[Patch v3] LocationHook with new Quadtree

newer
[Patch v4] LocationHook with new...

GerdP

12 Jan 2012 12 Jan '12

8:46 p.m.

Hi, now complete and based on r2167. Gerd http://gis.638310.n2.nabble.com/file/n7181669/locationHook_speedup_v3.patch locationHook_speedup_v3.patch -- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com.

Show replies by date

Hi Gerd, some ideas and questions: * BoundaryQuadTree.Node.add always performs a costly intersect of the area, also if the area is completely within the bbox. Maybe it's better first to check if the bbox contains the area (or area.getBounds()). * I don't understand the following part in Node.add: --- // optimization: don't add equal areas, only add the tags // we test only against the last element because that is likely // to match if (numNodes > 0 && a.equals(nodes.get(numNodes-1))){ addMissingTags(nodes.get(numNodes-1).tags, bTags); return; } --- a is an area and nodes.get returns a NodeElem so equals will always be false. I think you want to compare to nodes.get(..).area but I do not understand that either. Why should two areas be equal? * LocationHook.mkgmapTagsArray starts with an empty string element. I don't like that. * for (int i = 12; i >= 1; --i){ if (elem.getTag(mkgmapTagsArray[i] ) != null) res |= (1 << i); } => for (int i = 0; i mkgmapTagsArray.length; i++) { if (elem.getTag(mkgmapTagsArray[i] ) != null) res |= (1 << i); } Counting down is quite unusual and should only be done if there is a real reason for it. * I don't understand why you need a merge() method. Could you explain what you are doing in this method and why it's required? WanMil

...

Hi,

now complete and based on r2167.

Gerd

http://gis.638310.n2.nabble.com/file/n7181669/locationHook_speedup_v3.patch locationHook_speedup_v3.patch

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Gerd Petermann

9:41 p.m.

Hi WanMil, thanks for the quick response.

...

some ideas and questions:

* BoundaryQuadTree.Node.add always performs a costly intersect of the area, also if the area is completely within the bbox. Maybe it's better first to check if the bbox contains the area (or area.getBounds()). Good point. I'll try that.

...

* I don't understand the following part in Node.add: --- // optimization: don't add equal areas, only add the tags // we test only against the last element because that is likely // to match if (numNodes > 0 && a.equals(nodes.get(numNodes-1))){ addMissingTags(nodes.get(numNodes-1).tags, bTags); return; } --- a is an area and nodes.get returns a NodeElem so equals will always be false. I think you want to compare to nodes.get(..).area but I do not understand that either. Why should two areas be equal?

argh! This error was introduced while I cleaned the code :-( Of course I want to compare the areas. We add all boundaries to the tree, even those with level=2. Since the boundary list is sorted so that they appear last, it is very likely that the area completely covers the bbox of the tree. In this case the area will be the bbox.

...

* LocationHook.mkgmapTagsArray starts with an empty string element. I don't like that.

* for (int i = 12; i >= 1; --i){ if (elem.getTag(mkgmapTagsArray[i] ) != null) res |= (1 << i); }

=>

for (int i = 0; i mkgmapTagsArray.length; i++) { if (elem.getTag(mkgmapTagsArray[i] ) != null) res |= (1 << i); }

Counting down is quite unusual and should only be done if there is a real reason for it.

You are right, there is no longer a reason for it. I'll change that. I used this loop together with a bitmask and a call to Integer.highestOneBit()

...

* I don't understand why you need a merge() method. Could you explain what you are doing in this method and why it's required?

The get method() of the tree returns the data for the first area that contains the coord. This area should contain all tags of the area itself plus those from areas intersecting it. Maybe this is not correct? Ciao, Gerd

WanMil

9:52 p.m.

...

Hi WanMil,

thanks for the quick response.

...
some ideas and questions:

* BoundaryQuadTree.Node.add always performs a costly intersect of the area, also if the area is completely within the bbox. Maybe it's better first to check if the bbox contains the area (or area.getBounds()). Good point. I'll try that.

...
* I don't understand the following part in Node.add: --- // optimization: don't add equal areas, only add the tags // we test only against the last element because that is likely // to match if (numNodes > 0 && a.equals(nodes.get(numNodes-1))){ addMissingTags(nodes.get(numNodes-1).tags, bTags); return; } --- a is an area and nodes.get returns a NodeElem so equals will always be false. I think you want to compare to nodes.get(..).area but I do not understand that either. Why should two areas be equal?

argh! This error was introduced while I cleaned the code :-( Of course I want to compare the areas. We add all boundaries to the tree, even those with level=2. Since the boundary list is sorted so that they appear last, it is very likely that the area completely covers the bbox of the tree. In this case the area will be the bbox.

Is that really a performance improvement? admin_level=2 boundaries are splitted anyhow during creation of the tree. An area that is equal to a rectangle can perform a very easy and quick contains test. So checking for equality probably will be more costly than leaving it as it is?

...

...
* LocationHook.mkgmapTagsArray starts with an empty string element. I don't like that.

* for (int i = 12; i >= 1; --i){ if (elem.getTag(mkgmapTagsArray[i] ) != null) res |= (1 << i); }

=>

for (int i = 0; i mkgmapTagsArray.length; i++) { if (elem.getTag(mkgmapTagsArray[i] ) != null) res |= (1 << i); }

Counting down is quite unusual and should only be done if there is a real reason for it.

You are right, there is no longer a reason for it. I'll change that. I used this loop together with a bitmask and a call to Integer.highestOneBit()

...
* I don't understand why you need a merge() method. Could you explain what you are doing in this method and why it's required?

The get method() of the tree returns the data for the first area that contains the coord. This area should contain all tags of the area itself plus those from areas intersecting it. Maybe this is not correct?

Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area then an element located in the 90% that does not intersect would be tagged with a1 and a2? WanMil

...

Ciao, Gerd

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Gerd Petermann

10:13 p.m.

...

Date: Thu, 12 Jan 2012 22:52:18 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

...
Hi WanMil,

thanks for the quick response.

...
some ideas and questions:

* BoundaryQuadTree.Node.add always performs a costly intersect of the area, also if the area is completely within the bbox. Maybe it's better first to check if the bbox contains the area (or area.getBounds()). Good point. I'll try that.

...
* I don't understand the following part in Node.add: --- // optimization: don't add equal areas, only add the tags // we test only against the last element because that is likely // to match if (numNodes > 0 && a.equals(nodes.get(numNodes-1))){ addMissingTags(nodes.get(numNodes-1).tags, bTags); return; } --- a is an area and nodes.get returns a NodeElem so equals will always be false. I think you want to compare to nodes.get(..).area but I do not understand that either. Why should two areas be equal?

argh! This error was introduced while I cleaned the code :-( Of course I want to compare the areas. We add all boundaries to the tree, even those with level=2. Since the boundary list is sorted so that they appear last, it is very likely that the area completely covers the bbox of the tree. In this case the area will be the bbox.

Is that really a performance improvement? admin_level=2 boundaries are splitted anyhow during creation of the tree. An area that is equal to a rectangle can perform a very easy and quick contains test. So checking for equality probably will be more costly than leaving it as it is?

Hmm, it was an improvement at some stage before I optimized other parts, now it seems to decrease performance quite a lot. Remove it for now.

...

...
...
* LocationHook.mkgmapTagsArray starts with an empty string element. I don't like that.

Yes, looks strange, but without it the level value would not match the position in the array. I'll add a comment for this, okay? I also want to avoid calling getTag with an empty key, so I think we need for (int i=1;i < ...)

...

...
...
* for (int i = 12; i >= 1; --i){ if (elem.getTag(mkgmapTagsArray[i] ) != null) res |= (1 << i); }

=>

for (int i = 0; i mkgmapTagsArray.length; i++) { if (elem.getTag(mkgmapTagsArray[i] ) != null) res |= (1 << i); }

Counting down is quite unusual and should only be done if there is a real reason for it.

You are right, there is no longer a reason for it. I'll change that. I used this loop together with a bitmask and a call to Integer.highestOneBit()

...
* I don't understand why you need a merge() method. Could you explain what you are doing in this method and why it's required?

The get method() of the tree returns the data for the first area that contains the coord. This area should contain all tags of the area itself plus those from areas intersecting it. Maybe this is not correct?

Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area then an element located in the 90% that does not intersect would be tagged with a1 and a2?

The merge() first creates a new nodeElem with the intersection, it merges the tags into this and subtracts the intersection from the others, so only the 10% part gets more tags. Thinking again about this I might not need both subtract() calls since I move the intersection part before the others .

WanMil

10:23 p.m.

...

...
...
...
* LocationHook.mkgmapTagsArray starts with an empty string element. I don't like that.

Yes, looks strange, but without it the level value would not match the position in the array. I'll add a comment for this, okay?

I also want to avoid calling getTag with an empty key, so I think we need for (int i=1;i < ...)

Every time I see an for (int i=1; i< ... my first impression is: Oh, this is wrong... So if the only reason is that otherwise the adminlevel does match then it sounds better to use adminlevel-1. Also note that admin_level=1 is not used in OSM. So you can remove that either. WanMil

Gerd Petermann

10:48 p.m.

Hi WanMil,

...

Date: Thu, 12 Jan 2012 23:23:54 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

...
...
...
...
* LocationHook.mkgmapTagsArray starts with an empty string element. I don't like that.

Yes, looks strange, but without it the level value would not match the position in the array. I'll add a comment for this, okay?

I also want to avoid calling getTag with an empty key, so I think we need for (int i=1;i < ...)

Every time I see an for (int i=1; i< ... my first impression is: Oh, this is wrong... So if the only reason is that otherwise the adminlevel does match then it sounds better to use adminlevel-1.

Indeed this was the reason why I kept the from-12-downto-1 loop. Nobody thinks this is a typo ;-) Anyway, I agree that it is best to remove the empty string from the array, start the loops with 0 and use xyz[admLevel-1]

...

Also note that admin_level=1 is not used in OSM. So you can remove that either.

Ok, I did not know this. Anyway, admLevel-2 is a bit too confusing in my eyes. By the way: I thought very long about the idea of the HashTable mkgmapTags in the trunk version. I guess it was used to allow easy removing of selected admin_levels or postal_code from processing ? Ciao, Gerd

...

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

WanMil

13 Jan 13 Jan

4:03 p.m.

...

...
Also note that admin_level=1 is not used in OSM. So you can remove that either.

Ok, I did not know this. Anyway, admLevel-2 is a bit too confusing in my eyes.

I wonder a bit: You try to remove as much as possible but leave the admin_level=1 tag which is definetly never used? In the end the array with tags must be well known at any place when mapping the original admin_level/postcode to the tags in the array. I think the array can be removed completely. It seems to confuse more than it helps in the LocationHook. It is used in the BoundaryQuadTree but it could be easily created just before calling the constructor.

...

By the way: I thought very long about the idea of the HashTable mkgmapTags in the trunk version. I guess it was used to allow easy removing of selected admin_levels or postal_code from processing ?

The original idea was that multiple tags might be used to map to the mkgmap internal tags. So not only admin_level=2 could map to mkgmap:admin_level2. But that was a nice idea which was never used. Maybe postal code tagging will require such a n:1 mapping because that tagging is not very consistent. WanMil

Gerd Petermann

5:11 p.m.

Hi WanMil, Date: Fri, 13 Jan 2012 17:03:58 +0100

...

From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

...
...
Also note that admin_level=1 is not used in OSM. So you can remove that either.

Ok, I did not know this. Anyway, admLevel-2 is a bit too confusing in my eyes.

I wonder a bit: You try to remove as much as possible but leave the admin_level=1 tag which is definetly never used? In the end the array with tags must be well known at any place when mapping the original admin_level/postcode to the tags in the array.

I think the array can be removed completely. It seems to confuse more than it helps in the LocationHook. It is used in the BoundaryQuadTree but it could be easily created just before calling the constructor.

I did not think too much about that. I just found the original HashSet too confusing. The array was needed in an interims solution, and I used it also for printing debugging info. The content of the array is no longer performance critical. I also thought about placing it only in the quadtree source, what do you think about that?

...

...
By the way: I thought very long about the idea of the HashTable mkgmapTags in the trunk version. I guess it was used to allow easy removing of selected admin_levels or postal_code from processing ?

The original idea was that multiple tags might be used to map to the mkgmap internal tags. So not only admin_level=2 could map to mkgmap:admin_level2. But that was a nice idea which was never used. Maybe postal code tagging will require such a n:1 mapping because that tagging is not very consistent.

OK, that's understood. Maybe most of the code in LocationHook can be moved to BoundaryPreparer, but none of it is performance critical. Ciao, Gerd

WanMil

5:23 p.m.

...

...
...
...
* I don't understand why you need a merge() method. Could you explain what you are doing in this method and why it's required?

The get method() of the tree returns the data for the first area that contains the coord. This area should contain all tags of the area itself plus those from areas intersecting it. Maybe this is not correct?

Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area then an element located in the 90% that does not intersect would be tagged with a1 and a2?

The merge() first creates a new nodeElem with the intersection, it merges the tags into this and subtracts the intersection from the others, so only the 10% part gets more tags. Thinking again about this I might not need both subtract() calls since I move the intersection part before the others .

I doubt that merge improves the performance. It breaks two areas into three probably with intersecting bounding boxes. So I would expect that you don't save much contains checks. But don't mind about my doubts: Did you measure a noticeable performance difference? WanMil

GerdP

6:08 p.m.

Hi WanMil, yes, it improves performance because the merge is done only once, and without the merge it would be needed to do more or less the same action for each single search. I see no better way to handle the problem with those level=7 areas. Ciao, Gerd WanMil wrote

...

...
...
...
...
* I don't understand why you need a merge() method. Could you

explain

...
what you are doing in this method and why it's required? The get method() of the tree returns the data for the first area that contains the coord. This area should contain all tags of the area itself plus those from areas intersecting it. Maybe this is not correct?

Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area then an element located in the 90% that does not intersect would be tagged with a1 and a2?

The merge() first creates a new nodeElem with the intersection, it merges the tags into this and subtracts the intersection from the others, so only the 10% part gets more tags. Thinking again about this I might not need both subtract() calls since I move the intersection part before the others .

I doubt that merge improves the performance. It breaks two areas into three probably with intersecting bounding boxes. So I would expect that you don't save much contains checks. But don't mind about my doubts: Did you measure a noticeable performance difference?

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com.

WanMil

7:22 p.m.

Mmmh, I tested with an without merge using 66 tiles. LocationHook time with merge: 72,7s LocationHook time without merge: 55,6s The times are very different. So I assumed that the GC slowed down the merge variant so I compared the timings of each tile. One tile was 8s slower with the merge variant. So maybe the GC slowed it down. But most of the tiles were quicker with the non merge variant. Some are quicker with merging and some are rather equal. Maybe timings can be improved by changing the depth and the nodeNum constraint in the split method but merging does not seem to be an overall improvement. WanMil

...

Hi WanMil,

yes, it improves performance because the merge is done only once, and without the merge it would be needed to do more or less the same action for each single search. I see no better way to handle the problem with those level=7 areas.

Ciao, Gerd

WanMil wrote

...
...
...
...
...
* I don't understand why you need a merge() method. Could you

explain

...
what you are doing in this method and why it's required? The get method() of the tree returns the data for the first area that contains the coord. This area should contain all tags of the area itself plus those from areas intersecting it. Maybe this is not correct?

Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area then an element located in the 90% that does not intersect would be tagged with a1 and a2?

The merge() first creates a new nodeElem with the intersection, it merges the tags into this and subtracts the intersection from the others, so only the 10% part gets more tags. Thinking again about this I might not need both subtract() calls since I move the intersection part before the others .

I doubt that merge improves the performance. It breaks two areas into three probably with intersecting bounding boxes. So I would expect that you don't save much contains checks. But don't mind about my doubts: Did you measure a noticeable performance difference?

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

GerdP

7:37 p.m.

Hi WanMil, of course it is faster, but how do you solve the problem that the information is incomplete? Maybe I did not make that clear: Without the merge step, you have to travel through the nodeElems until information from all areas in combined. If you don't do this, you have the same result as in trunk when you remove an element from the list after it was first processed. Ciao, Gerd WanMil wrote

...

Mmmh, I tested with an without merge using 66 tiles.

LocationHook time with merge: 72,7s LocationHook time without merge: 55,6s

The times are very different. So I assumed that the GC slowed down the merge variant so I compared the timings of each tile. One tile was 8s slower with the merge variant. So maybe the GC slowed it down.

But most of the tiles were quicker with the non merge variant. Some are quicker with merging and some are rather equal.

Maybe timings can be improved by changing the depth and the nodeNum constraint in the split method but merging does not seem to be an overall improvement.

WanMil

...
Hi WanMil,

yes, it improves performance because the merge is done only once, and without the merge it would be needed to do more or less the same action for each single search. I see no better way to handle the problem with those level=7 areas.

Ciao, Gerd

WanMil wrote

...
...
...
...
> > * I don't understand why you need a merge() method. Could you explain > what you are doing in this method and why it's required? The get method() of the tree returns the data for the first area that contains the coord. This area should contain all tags of the area itself plus those from areas intersecting it. Maybe this is not correct?

Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area then an element located in the 90% that does not intersect would be tagged with a1 and a2?

The merge() first creates a new nodeElem with the intersection, it merges the tags into this and subtracts the intersection from the others, so only the 10% part gets more tags. Thinking again about this I might not need both subtract() calls since I move the intersection part before the others .

I doubt that merge improves the performance. It breaks two areas into three probably with intersecting bounding boxes. So I would expect that you don't save much contains checks. But don't mind about my doubts: Did you measure a noticeable performance difference?

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com.

WanMil

9:19 p.m.

Ok, I think I didn't understand your concept: The basis principle of the tree is that there are non intersecting areas only so that only one area can be found for one point. And each area contains all tags of all intersecting boundaries. I thought merging would be a performance thing only. The advantage of this is that you only need one match to get all. If we implement cutting ways by boundaries there is the disadvantage that we don't know in the LocationHook which tags are relevant and therefore ways might be splitted too much. WanMil

...

Hi WanMil,

of course it is faster, but how do you solve the problem that the information is incomplete? Maybe I did not make that clear: Without the merge step, you have to travel through the nodeElems until information from all areas in combined. If you don't do this, you have the same result as in trunk when you remove an element from the list after it was first processed.

Ciao, Gerd

WanMil wrote

...
Mmmh, I tested with an without merge using 66 tiles.

LocationHook time with merge: 72,7s LocationHook time without merge: 55,6s

The times are very different. So I assumed that the GC slowed down the merge variant so I compared the timings of each tile. One tile was 8s slower with the merge variant. So maybe the GC slowed it down.

But most of the tiles were quicker with the non merge variant. Some are quicker with merging and some are rather equal.

Maybe timings can be improved by changing the depth and the nodeNum constraint in the split method but merging does not seem to be an overall improvement.

WanMil

...
Hi WanMil,

yes, it improves performance because the merge is done only once, and without the merge it would be needed to do more or less the same action for each single search. I see no better way to handle the problem with those level=7 areas.

Ciao, Gerd

WanMil wrote

...
...
...
> > > > * I don't understand why you need a merge() method. Could you explain > > what you are doing in this method and why it's required? > The get method() of the tree returns the data for the first area that > contains the coord. > This area should contain all tags of the area itself plus those from > areas intersecting it. > Maybe this is not correct?

Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area then an element located in the 90% that does not intersect would be tagged with a1 and a2?

The merge() first creates a new nodeElem with the intersection, it merges the tags into this and subtracts the intersection from the others, so only the 10% part gets more tags. Thinking again about this I might not need both subtract() calls since I move the intersection part before the others .

I doubt that merge improves the performance. It breaks two areas into three probably with intersecting bounding boxes. So I would expect that you don't save much contains checks. But don't mind about my doubts: Did you measure a noticeable performance difference?

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Gerd Petermann

9:47 p.m.

Hi WanMil, okay, that's what it does and I think that's why it's much faster. The merge requires a few more intersect+substract calls, but it saves many area.contains() calls. Probably this should be added to the javadoc. Ciao, Gerd

...

Date: Fri, 13 Jan 2012 22:19:08 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

Ok, I think I didn't understand your concept: The basis principle of the tree is that there are non intersecting areas only so that only one area can be found for one point. And each area contains all tags of all intersecting boundaries. I thought merging would be a performance thing only.

The advantage of this is that you only need one match to get all. If we implement cutting ways by boundaries there is the disadvantage that we don't know in the LocationHook which tags are relevant and therefore ways might be splitted too much.

WanMil

...
Hi WanMil,

of course it is faster, but how do you solve the problem that the information is incomplete? Maybe I did not make that clear: Without the merge step, you have to travel through the nodeElems until information from all areas in combined. If you don't do this, you have the same result as in trunk when you remove an element from the list after it was first processed.

Ciao, Gerd

WanMil wrote

...
Mmmh, I tested with an without merge using 66 tiles.

LocationHook time with merge: 72,7s LocationHook time without merge: 55,6s

The times are very different. So I assumed that the GC slowed down the merge variant so I compared the timings of each tile. One tile was 8s slower with the merge variant. So maybe the GC slowed it down.

But most of the tiles were quicker with the non merge variant. Some are quicker with merging and some are rather equal.

Maybe timings can be improved by changing the depth and the nodeNum constraint in the split method but merging does not seem to be an overall improvement.

WanMil

...
Hi WanMil,

yes, it improves performance because the merge is done only once, and without the merge it would be needed to do more or less the same action for each single search. I see no better way to handle the problem with those level=7 areas.

Ciao, Gerd

WanMil wrote

...
...
> > > > > > * I don't understand why you need a merge() method. Could you explain > > > what you are doing in this method and why it's required? > > The get method() of the tree returns the data for the first area that > > contains the coord. > > This area should contain all tags of the area itself plus those from > > areas intersecting it. > > Maybe this is not correct? > > Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area > then an element located in the 90% that does not intersect would be > tagged with a1 and a2? > The merge() first creates a new nodeElem with the intersection, it merges the tags into this and subtracts the intersection from the others, so only the 10% part gets more tags. Thinking again about this I might not need both subtract() calls since I move the intersection part before the others .

I doubt that merge improves the performance. It breaks two areas into three probably with intersecting bounding boxes. So I would expect that you don't save much contains checks. But don't mind about my doubts: Did you measure a noticeable performance difference?

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Gerd Petermann

14 Jan 14 Jan

8:36 a.m.

Hi WanMil, regarding further optimization of the quadtree (presuming we can correct remaining errors): I think we have two options: 1) Create and calculate it at runtime (as it is now) . In this case the goal must be to keep the creation time and answering time short 2) Calculate it in BoundaryPreparer and save it in the .bnd files (with a new or extended format). In this case we can avoid higher calculation time if that means reducing redundancy. With precompiled quadtrees the LocationHook just has to create a small master-quadtree and load required parts into it. The quadtree in my patch probably contains a lot of redundant information because tiles with admin-level=2 or =4 will cover big parts of the tree, and the information is (almost always) already contained in the smaller areas because I use the information from the lies_in tags. I did not try it, but I expect that the merge() process result should be the same if we don't use the lies_in information. If we move the quadtree to the preparer, we can omit the calculation of the lies_in tags (I think they are not needed elsewhere), but we should try to avoid saving redundant information. If we can save the quadtree in a format that can be loaded faster than the time that is now needed to load the .bnd files plus calculating the quadtree, I'd vote for optimizing it for the preparer. Does that make sense? Ciao, Gerd

...

Date: Fri, 13 Jan 2012 22:19:08 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

Ok, I think I didn't understand your concept: The basis principle of the tree is that there are non intersecting areas only so that only one area can be found for one point. And each area contains all tags of all intersecting boundaries. I thought merging would be a performance thing only.

The advantage of this is that you only need one match to get all. If we implement cutting ways by boundaries there is the disadvantage that we don't know in the LocationHook which tags are relevant and therefore ways might be splitted too much.

WanMil

...
Hi WanMil,

of course it is faster, but how do you solve the problem that the information is incomplete? Maybe I did not make that clear: Without the merge step, you have to travel through the nodeElems until information from all areas in combined. If you don't do this, you have the same result as in trunk when you remove an element from the list after it was first processed.

Ciao, Gerd

WanMil wrote

...
Mmmh, I tested with an without merge using 66 tiles.

LocationHook time with merge: 72,7s LocationHook time without merge: 55,6s

The times are very different. So I assumed that the GC slowed down the merge variant so I compared the timings of each tile. One tile was 8s slower with the merge variant. So maybe the GC slowed it down.

But most of the tiles were quicker with the non merge variant. Some are quicker with merging and some are rather equal.

Maybe timings can be improved by changing the depth and the nodeNum constraint in the split method but merging does not seem to be an overall improvement.

WanMil

...
Hi WanMil,

yes, it improves performance because the merge is done only once, and without the merge it would be needed to do more or less the same action for each single search. I see no better way to handle the problem with those level=7 areas.

Ciao, Gerd

WanMil wrote

...
...
> > > > > > * I don't understand why you need a merge() method. Could you explain > > > what you are doing in this method and why it's required? > > The get method() of the tree returns the data for the first area that > > contains the coord. > > This area should contain all tags of the area itself plus those from > > areas intersecting it. > > Maybe this is not correct? > > Sounds wrong. If Area a1 is intersected by a2 only in 10% of its area > then an element located in the 90% that does not intersect would be > tagged with a1 and a2? > The merge() first creates a new nodeElem with the intersection, it merges the tags into this and subtracts the intersection from the others, so only the 10% part gets more tags. Thinking again about this I might not need both subtract() calls since I move the intersection part before the others .

I doubt that merge improves the performance. It breaks two areas into three probably with intersecting bounding boxes. So I would expect that you don't save much contains checks. But don't mind about my doubts: Did you measure a noticeable performance difference?

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

WanMil

15 Jan 15 Jan

9:10 a.m.

Hi Gerd,

...

Hi WanMil,

regarding further optimization of the quadtree (presuming we can correct remaining errors):

I think we have two options: 1) Create and calculate it at runtime (as it is now) . In this case the goal must be to keep the creation time and answering time short

From my point of view the timings of the LocationHook are great now. Improvements can be micro improvements only. Investing the same time to other parts of mkgmap would give more benefit.

...

2) Calculate it in BoundaryPreparer and save it in the .bnd files (with a new or extended format). In this case we can avoid higher calculation time if that means reducing redundancy. With precompiled quadtrees the LocationHook just has to create a small master-quadtree and load required parts into it.

The quadtree in my patch probably contains a lot of redundant information because tiles with admin-level=2 or =4 will cover big parts of the tree, and the information is (almost always) already contained in the smaller areas because I use the information from the lies_in tags.

Mmh, I though the merge step will create non overlapping areas and each area contains *all* of its information. So there should be no large admin_level=2 area left but lots small areas all containing the admin_level=2,3,4,5,6,7,8,9,10,11 information.

...

I did not try it, but I expect that the merge() process result should be the same if we don't use the lies_in information.

If not something is wrong?!

...

If we move the quadtree to the preparer, we can omit the calculation of the lies_in tags (I think they are not needed elsewhere), but we should try to avoid saving redundant information.

The lies_in information is only 80% of the information required for the new quadtree. So it might help but is not sufficient to remove the merge step.

...

If we can save the quadtree in a format that can be loaded faster than the time that is now needed to load the .bnd files plus calculating the quadtree, I'd vote for optimizing it for the preparer.

The preparer would have to do the merge step: From the given boundaries create not overlapping areas with all location information. I would not try to store additional quadtree information. You can try but I don't expect any measurable improvements by that. All in all: Please keep the format of the bnd files as simple as possible and think about using it for other purposes. e.g. the current format could easily be used for coastline processing - one can precompile the coastlines and mgkmap could load the sea areas from the precompiled sea file. This would be great to have (a big!! performance and quality improvement). But someone needs to implement that.

...

Does that make sense?

Yes. Also see my first point: Measure how much time it takes and decide yourself if you want to invest a lot of time for micro improvements. WanMil

...

Ciao, Gerd

...
Date: Fri, 13 Jan 2012 22:19:08 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

Ok, I think I didn't understand your concept: The basis principle of the tree is that there are non intersecting areas only so that only one area can be found for one point. And each area contains all tags of all intersecting boundaries. I thought merging would be a performance thing only.

The advantage of this is that you only need one match to get all. If we implement cutting ways by boundaries there is the disadvantage that we don't know in the LocationHook which tags are relevant and therefore ways might be splitted too much.

WanMil

...
Hi WanMil,

of course it is faster, but how do you solve the problem that the information is incomplete? Maybe I did not make that clear: Without the merge step, you have to travel through the nodeElems until information from all areas in combined. If you don't do this, you have the same result as in trunk when you remove an element from the list after it was first processed.

Ciao, Gerd

WanMil wrote

...
Mmmh, I tested with an without merge using 66 tiles.

LocationHook time with merge: 72,7s LocationHook time without merge: 55,6s

The times are very different. So I assumed that the GC slowed down the merge variant so I compared the timings of each tile. One tile was 8s slower with the merge variant. So maybe the GC slowed it down.

But most of the tiles were quicker with the non merge variant.

Some are

...
quicker with merging and some are rather equal.

Maybe timings can be improved by changing the depth and the nodeNum constraint in the split method but merging does not seem to be an overall improvement.

WanMil

...
Hi WanMil,

yes, it improves performance because the merge is done only once, and without the merge it would be needed to do more or less the same action for each single search. I see no better way to handle the problem with those level=7 areas.

Ciao, Gerd

WanMil wrote

...
> > > > > > > > * I don't understand why you need a merge() method. Could > you > explain > > > > what you are doing in this method and why it's required? > > > The get method() of the tree returns the data for the first > area > that > > > contains the coord. > > > This area should contain all tags of the area itself plus those > from > > > areas intersecting it. > > > Maybe this is not correct? > > > > Sounds wrong. If Area a1 is intersected by a2 only in 10% of its > area > > then an element located in the 90% that does not intersect would > be > > tagged with a1 and a2? > > > The merge() first creates a new nodeElem with the intersection, it > merges the tags into this > and subtracts the intersection from the others, so only the 10%

part

...
> gets more tags. > Thinking again about this I might not need both subtract() calls since > I > move the intersection part > before the others . >

I doubt that merge improves the performance. It breaks two areas into three probably with intersecting bounding boxes. So I would expect that you don't save much contains checks. But don't mind about my doubts: Did you measure a noticeable performance difference?

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context:

http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71...

...
Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Gerd Petermann

16 Jan 16 Jan

7:45 a.m.

Hi WanMil,

...

Date: Sun, 15 Jan 2012 10:10:42 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

Hi Gerd,

...
Hi WanMil,

regarding further optimization of the quadtree (presuming we can correct remaining errors):

I think we have two options: 1) Create and calculate it at runtime (as it is now) . In this case the goal must be to keep the creation time and answering time short

From my point of view the timings of the LocationHook are great now. Improvements can be micro improvements only. Investing the same time to other parts of mkgmap would give more benefit.

Good to hear. I think the merge() still offers room for improvements, but first I have to find an error in it, see below.

...

...
2) Calculate it in BoundaryPreparer and save it in the .bnd files (with a new or extended format). In this case we can avoid higher calculation time if that means reducing redundancy. With precompiled quadtrees the LocationHook just has to create a small master-quadtree and load required parts into it.

The quadtree in my patch probably contains a lot of redundant information because tiles with admin-level=2 or =4 will cover big parts of the tree, and the information is (almost always) already contained in the smaller areas because I use the information from the lies_in tags.

Mmh, I though the merge step will create non overlapping areas and each area contains *all* of its information. So there should be no large admin_level=2 area left but lots small areas all containing the admin_level=2,3,4,5,6,7,8,9,10,11 information.

The current implementation uses the infos of the lies_in tag and stops merging when larger areas are not adding information. So, the larger areas are kept untouched (and typically unused). They are just eating up some kb heap memory.

...

...
I did not try it, but I expect that the merge() process result should be the same if we don't use the lies_in information.

If not something is wrong?!

I think so, and yes, the results are very different. So, I first have to analyse why this doesn't work as expected.

...

...
If we move the quadtree to the preparer, we can omit the calculation of the lies_in tags (I think they are not needed elsewhere), but we should try to avoid saving redundant information.

The lies_in information is only 80% of the information required for the new quadtree. So it might help but is not sufficient to remove the merge step.

Hmm, I don't fully understand what you mean. I thought about this: My idea is to replace the algorithm in BoundaryPreparer.workoutBoundaryRelations() by a call of BoundaryQuadTree and to save the quadtree.

...

...
If we can save the quadtree in a format that can be loaded faster than the time that is now needed to load the .bnd files plus calculating the quadtree, I'd vote for optimizing it for the preparer.

The preparer would have to do the merge step: From the given boundaries create not overlapping areas with all location information. I would not try to store additional quadtree information. You can try but I don't expect any measurable improvements by that.

Okay, that sound's like a good alternative with the advantage that we can keep the format of the bnd files.

...

All in all: Please keep the format of the bnd files as simple as possible and think about using it for other purposes. e.g. the current format could easily be used for coastline processing - one can precompile the coastlines and mgkmap could load the sea areas from the precompiled sea file. This would be great to have (a big!! performance and quality improvement). But someone needs to implement that.

I did not look at coastline processing until now, but if that requires optimization than I will do after fixing the remaining problems.

...

...
Does that make sense?

Yes. Also see my first point: Measure how much time it takes and decide yourself if you want to invest a lot of time for micro improvements.

Okay, I agree.

...

WanMil

...
Ciao, Gerd

...
Date: Fri, 13 Jan 2012 22:19:08 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

Ok, I think I didn't understand your concept: The basis principle of the tree is that there are non intersecting areas only so that only one area can be found for one point. And each area contains all tags of all intersecting boundaries. I thought merging would be a performance thing only.

The advantage of this is that you only need one match to get all. If we implement cutting ways by boundaries there is the disadvantage that we don't know in the LocationHook which tags are relevant and therefore ways might be splitted too much.

WanMil

...
Hi WanMil,

of course it is faster, but how do you solve the problem that the information is incomplete? Maybe I did not make that clear: Without the merge step, you have to travel through the nodeElems until information from all areas in combined. If you don't do this, you have the same result as in trunk when you remove an element from the list after it was first processed.

Ciao, Gerd

WanMil wrote

...
Mmmh, I tested with an without merge using 66 tiles.

LocationHook time with merge: 72,7s LocationHook time without merge: 55,6s

The times are very different. So I assumed that the GC slowed down the merge variant so I compared the timings of each tile. One tile was 8s slower with the merge variant. So maybe the GC slowed it down.

But most of the tiles were quicker with the non merge variant.

Some are

...
quicker with merging and some are rather equal.

Maybe timings can be improved by changing the depth and the nodeNum constraint in the split method but merging does not seem to be an overall improvement.

WanMil

...
Hi WanMil,

yes, it improves performance because the merge is done only once, and without the merge it would be needed to do more or less the same action for each single search. I see no better way to handle the problem with those level=7 areas.

Ciao, Gerd

WanMil wrote > >> > > > >> > > > * I don't understand why you need a merge() method. Could >> you >> explain >> > > > what you are doing in this method and why it's required? >> > > The get method() of the tree returns the data for the first >> area >> that >> > > contains the coord. >> > > This area should contain all tags of the area itself plus those >> from >> > > areas intersecting it. >> > > Maybe this is not correct? >> > >> > Sounds wrong. If Area a1 is intersected by a2 only in 10% of its >> area >> > then an element located in the 90% that does not intersect would >> be >> > tagged with a1 and a2? >> > >> The merge() first creates a new nodeElem with the intersection, it >> merges the tags into this >> and subtracts the intersection from the others, so only the 10% part >> gets more tags. >> Thinking again about this I might not need both subtract() calls since >> I >> move the intersection part >> before the others . >> > > I doubt that merge improves the performance. It breaks two areas into > three probably with intersecting bounding boxes. So I would expect that > you don't save much contains checks. > But don't mind about my doubts: Did you measure a noticeable performance > difference? > > WanMil > _______________________________________________ > mkgmap-dev mailing list > mkgmap-dev@.org > http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev >

-- View this message in context:

http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71...

...
Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@.org http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

WanMil

9:28 p.m.

...

...
...
2) Calculate it in BoundaryPreparer and save it in the .bnd files (with a new or extended format). In this case we can avoid higher calculation time if that means reducing redundancy. With precompiled quadtrees the LocationHook just has to create a small master-quadtree and load required parts into it.

The quadtree in my patch probably contains a lot of redundant information because tiles with admin-level=2 or =4 will cover big parts of the tree, and the information is (almost always) already contained in the smaller areas because I use the information from the lies_in tags.

Mmh, I though the merge step will create non overlapping areas and each area contains *all* of its information. So there should be no large admin_level=2 area left but lots small areas all containing the admin_level=2,3,4,5,6,7,8,9,10,11 information.

The current implementation uses the infos of the lies_in tag and stops merging when larger areas are not adding information. So, the larger areas are kept untouched (and typically unused). They are just eating up some kb heap memory.

Ok, I haven't gone into deep in the merge() method. But I think you are going the right way!

...

...
...
I did not try it, but I expect that the merge() process result should be the same if we don't use the lies_in information.

If not something is wrong?!

I think so, and yes, the results are very different. So, I first have to analyse why this doesn't work as expected.

...
...
If we move the quadtree to the preparer, we can omit the calculation of the lies_in tags (I think they are not needed elsewhere), but we should try to avoid saving redundant information.

The lies_in information is only 80% of the information required for the new quadtree. So it might help but is not sufficient to remove the merge step.

Hmm, I don't fully understand what you mean. I thought about this: My idea is to replace the algorithm in BoundaryPreparer.workoutBoundaryRelations() by a call of BoundaryQuadTree and to save the quadtree.

Maybe we want the same but we are talking in a different way about that. Let me try in other words: The preparer creates *non overlapping* areas containing all boundary information. So each point lies only in one (or none) precompiled area (ignoring the fact that there is a problem when a point lies exactly on the edge of an area). The area can be used to tag the point (which also mean a way using one point) with all available boundary information. In the end this would be the same result as your intended merge step (if we ignore the quadtree structure). This precompiled information is stored to the bnd files and loaded by mkgmaps LocationHook. The LocationHook creates the BoundaryQuadTree from this information and because of the precompilation no more merge step is required. Do you aggree?

...

...
...
If we can save the quadtree in a format that can be loaded faster than the time that is now needed to load the .bnd files plus calculating the quadtree, I'd vote for optimizing it for the preparer.

The preparer would have to do the merge step: From the given boundaries create not overlapping areas with all location information. I would not try to store additional quadtree information. You can try but I don't expect any measurable improvements by that.

Okay, that sound's like a good alternative with the advantage that we can keep the format of the bnd files.

...
All in all: Please keep the format of the bnd files as simple as possible and think about using it for other purposes. e.g. the current format could easily be used for coastline processing - one can precompile the coastlines and mgkmap could load the sea areas from the precompiled sea file. This would be great to have (a big!! performance and quality improvement). But someone needs to implement that.

I did not look at coastline processing until now, but if that requires optimization than I will do after fixing the remaining problems.

Yes, it's not a very big deal when you know the system of the precompiled bounds. But it would be a great improvement (also for people with fewer GBs of memory). WanMil

Gerd Petermann

9:54 p.m.

Hi WanMil,

...

Date: Mon, 16 Jan 2012 22:28:04 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

...
...
...
2) Calculate it in BoundaryPreparer and save it in the .bnd files (with a new or extended format). In this case we can avoid higher calculation time if that means reducing redundancy. With precompiled quadtrees the LocationHook just has to create a small master-quadtree and load required parts into it.

The quadtree in my patch probably contains a lot of redundant information because tiles with admin-level=2 or =4 will cover big parts of the tree, and the information is (almost always) already contained in the smaller areas because I use the information from the lies_in tags.

Mmh, I though the merge step will create non overlapping areas and each area contains *all* of its information. So there should be no large admin_level=2 area left but lots small areas all containing the admin_level=2,3,4,5,6,7,8,9,10,11 information.

The current implementation uses the infos of the lies_in tag and stops merging when larger areas are not adding information. So, the larger areas are kept untouched (and typically unused). They are just eating up some kb heap memory.

Ok, I haven't gone into deep in the merge() method. But I think you are going the right way!

Ok, fine.

...

...
...
...
I did not try it, but I expect that the merge() process result should be the same if we don't use the lies_in information.

If not something is wrong?!

I think so, and yes, the results are very different. So, I first have to analyse why this doesn't work as expected.

...
...
If we move the quadtree to the preparer, we can omit the calculation of the lies_in tags (I think they are not needed elsewhere), but we should try to avoid saving redundant information.

The lies_in information is only 80% of the information required for the new quadtree. So it might help but is not sufficient to remove the merge step.

Hmm, I don't fully understand what you mean. I thought about this: My idea is to replace the algorithm in BoundaryPreparer.workoutBoundaryRelations() by a call of BoundaryQuadTree and to save the quadtree.

Maybe we want the same but we are talking in a different way about that. Let me try in other words:

The preparer creates *non overlapping* areas containing all boundary information. So each point lies only in one (or none) precompiled area (ignoring the fact that there is a problem when a point lies exactly on the edge of an area). The area can be used to tag the point (which also mean a way using one point) with all available boundary information.

In the end this would be the same result as your intended merge step (if we ignore the quadtree structure). This precompiled information is stored to the bnd files and loaded by mkgmaps LocationHook. The LocationHook creates the BoundaryQuadTree from this information and because of the precompilation no more merge step is required.

Do you aggree?

Yes, that sounds good to me. If preparer makes sure that no areas do overlap, than we can omit the merge step. Ciao, Gerd

...

...
...
...
If we can save the quadtree in a format that can be loaded faster than the time that is now needed to load the .bnd files plus calculating the quadtree, I'd vote for optimizing it for the preparer.

The preparer would have to do the merge step: From the given boundaries create not overlapping areas with all location information. I would not try to store additional quadtree information. You can try but I don't expect any measurable improvements by that.

Okay, that sound's like a good alternative with the advantage that we can keep the format of the bnd files.

...
All in all: Please keep the format of the bnd files as simple as possible and think about using it for other purposes. e.g. the current format could easily be used for coastline processing - one can precompile the coastlines and mgkmap could load the sea areas from the precompiled sea file. This would be great to have (a big!! performance and quality improvement). But someone needs to implement that.

I did not look at coastline processing until now, but if that requires optimization than I will do after fixing the remaining problems.

Yes, it's not a very big deal when you know the system of the precompiled bounds. But it would be a great improvement (also for people with fewer GBs of memory).

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

WanMil

13 Jan 13 Jan

6:20 p.m.

Now some performance facts using my standard 15 tile map: r2168: LocationHook time: 44,8s patch: LocationHook time: 11,5s So the time for the LocationHook is reduced to 1/4. Great! But: The tiles differ very much between r2168 and the patched version. So there is some investigation work required to check if the patched version does what it should do. WanMil

...

Hi,

now complete and based on r2167.

Gerd

http://gis.638310.n2.nabble.com/file/n7181669/locationHook_speedup_v3.patch locationHook_speedup_v3.patch

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Gerd Petermann

6:31 p.m.

Hi WanMil, yes, I got similar timing values. The testing is quite difficult. The results are almost equal when I change both versions to search only for one point of a way, so I am positive that it works, but there are still differences. I can't prove it, but I assume that both versions are wrong, because both don't find info for some points which lie inside the quadtree. I assume those are points that lie exactly on a boundary AND on a quadtree bbox. I'd try the following: if a point is within the quadtree , but no info is found, the algorithm should try again with a point next to the original. Would that be an option? Ciao, Gerd

...

Date: Fri, 13 Jan 2012 19:20:16 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

Now some performance facts using my standard 15 tile map:

r2168: LocationHook time: 44,8s patch: LocationHook time: 11,5s

So the time for the LocationHook is reduced to 1/4. Great!

But: The tiles differ very much between r2168 and the patched version. So there is some investigation work required to check if the patched version does what it should do.

WanMil

...
Hi,

now complete and based on r2167.

Gerd

http://gis.638310.n2.nabble.com/file/n7181669/locationHook_speedup_v3.patch locationHook_speedup_v3.patch

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

GerdP

6:57 p.m.

Hi WanMil, If I see this right, we would see only a few changes if we split the ways on boundaries first. If you have an idea how to do it, please let me know (if you don't want to do it). The remaining differences should be errors caused by the "insideness" problem of contains(). I will be offline most of the weekend. Ciao, Gerd WanMil wrote

...

The tiles differ very much between r2168 and the patched version. So there is some investigation work required to check if the patched version does what it should do.

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com.

WanMil

15 Jan 15 Jan

8:45 a.m.

...

Hi WanMil,

If I see this right, we would see only a few changes if we split the ways on boundaries first. If you have an idea how to do it, please let me know (if you don't want to do it).

There are some detail problems left: 1. Up to now there is no clipper that splits lines at polygon borders. (or I haven't found it in the mkgmap source code?) 2. In case we have a closed way it is not clear to the LocationHook if it is a line or a shape. This is decided by the styles. A workaround would be to cut this way twice - one as line and second as shape. The line variant would be applied to the line style whereas the shape variant is only applied to the polygons style. This is not 100% correct (the polygon style would not apply if the line style has a hit) but it will solve most cases. ?? By the way: I never checked if it makes sense to add location information to shapes. Are they used gy garmin? 3. I expect that most streets do not end exactly at a city border but lots will end some meters after it. Cutting them will increase the number of objects much and we will have a lot of short streets. So maybe an overlapping threshold is required for the decision not to split a line.

...

The remaining differences should be errors caused by the "insideness" problem of contains().

Problems with the bounding box of the quadtree should be solved by adding +1 to maxlat/maxlong of the java area object.

...

I will be offline most of the weekend.

Have fun! WanMil

...

Ciao, Gerd

WanMil wrote

...
The tiles differ very much between r2168 and the patched version. So there is some investigation work required to check if the patched version does what it should do.

-- View this message in context: http://gis.638310.n2.nabble.com/Patch-v3-LocationHook-with-new-Quadtree-tp71... Sent from the Mkgmap Development mailing list archive at Nabble.com. _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Gerd Petermann

16 Jan 16 Jan

7:26 a.m.

Hi WanMil,

...

Date: Sun, 15 Jan 2012 09:45:33 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

...
Hi WanMil,

If I see this right, we would see only a few changes if we split the ways on boundaries first. If you have an idea how to do it, please let me know (if you don't want to do it).

There are some detail problems left: 1. Up to now there is no clipper that splits lines at polygon borders. (or I haven't found it in the mkgmap source code?) 2. In case we have a closed way it is not clear to the LocationHook if it is a line or a shape. This is decided by the styles. A workaround would be to cut this way twice - one as line and second as shape. The line variant would be applied to the line style whereas the shape variant is only applied to the polygons style. This is not 100% correct (the polygon style would not apply if the line style has a hit) but it will solve most cases. ?? By the way: I never checked if it makes sense to add location information to shapes. Are they used gy garmin?

3. I expect that most streets do not end exactly at a city border but lots will end some meters after it. Cutting them will increase the number of objects much and we will have a lot of short streets. So maybe an overlapping threshold is required for the decision not to split a line.

Okay, since I am not yet familiar with the part of the program that uses the location info, I'll leave that for now.

...

...
The remaining differences should be errors caused by the "insideness" problem of contains().

Problems with the bounding box of the quadtree should be solved by adding +1 to maxlat/maxlong of the java area object.

Hmm, does that mean you want to shift the whole area or only selected points? Which ones? I thought about using the Area.transform() method to blow up the area a little bit, but did not yet find something useful. I think searching the shifted point is easy to implement and this double (or multiple) search will not happen very often. Ciao, Gerd

WanMil

9:16 p.m.

Hi Gerd,

...

...
3. I expect that most streets do not end exactly at a city border but lots will end some meters after it. Cutting them will increase the number of objects much and we will have a lot of short streets. So maybe an overlapping threshold is required for the decision not to split a line.

Okay, since I am not yet familiar with the part of the program that uses the location info, I'll leave that for now.

I don't want you hold off from changing that and playing around a bit. I only want to list up things that come into my mind that have to be solved so you can be aware of that :-)

...

...
...
The remaining differences should be errors caused by the "insideness" problem of contains().

Problems with the bounding box of the quadtree should be solved by adding +1 to maxlat/maxlong of the java area object.

Hmm, does that mean you want to shift the whole area or only selected points? Which ones? I thought about using the Area.transform() method to blow up the area a little bit, but did not yet find something useful. I think searching the shifted point is easy to implement and this double (or multiple) search will not happen very often.

I think it's much easier: In the constructor of BoundaryQuadTree.Node use the following line: this.bbox = new Rectangle(bbox.getMinLong(), bbox.getMinLat(), bbox.getMaxLong() - bbox.getMinLong() + 1, bbox.getMaxLat() - bbox.getMinLat() + 1); So just increase the width and height of the (java.awt.geom.Area) bounding box by 1. That should do it(?). WanMil

...

Ciao, Gerd

Gerd Petermann

9:35 p.m.

Hi WanMil,

...

Date: Mon, 16 Jan 2012 22:16:21 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

Hi Gerd,

...
...
3. I expect that most streets do not end exactly at a city border but lots will end some meters after it. Cutting them will increase the number of objects much and we will have a lot of short streets. So maybe an overlapping threshold is required for the decision not to split a line.

Okay, since I am not yet familiar with the part of the program that uses the location info, I'll leave that for now.

I don't want you hold off from changing that and playing around a bit. I only want to list up things that come into my mind that have to be solved so you can be aware of that :-)

Don't worry, I'll keep on searching for optimizations. When I say I leave it for now that means that I let my brain do its work without forcing it. Sometimes I wake up with an idea, and than I start coding.

...

...
...
...
The remaining differences should be errors caused by the "insideness" problem of contains().

Problems with the bounding box of the quadtree should be solved by adding +1 to maxlat/maxlong of the java area object.

Hmm, does that mean you want to shift the whole area or only selected points? Which ones? I thought about using the Area.transform() method to blow up the area a little bit, but did not yet find something useful. I think searching the shifted point is easy to implement and this double (or multiple) search will not happen very often.

I think it's much easier: In the constructor of BoundaryQuadTree.Node use the following line: this.bbox = new Rectangle(bbox.getMinLong(), bbox.getMinLat(), bbox.getMaxLong() - bbox.getMinLong() + 1, bbox.getMaxLat() - bbox.getMinLat() + 1);

So just increase the width and height of the (java.awt.geom.Area) bounding box by 1. That should do it(?). I don't think so. The area keeps it's own internal bbox in field cachedBounds, and that is tested first when contains() is called . Anyway, while searching for the discrepancies with and without using the lies_in info I found a problem with rounding errors that can result in endless loops. I've added a check for this, but it slows down processing a little bit. Also, some of the postal_code boundaries are intersecting (probably errors in OSM, but the algorithm went amok when it happened).

I have compared the remaining differences and found them only for ways that are crossing boundaries, and for those both results looked good to me. I'll post a new patch tomorrow. ciao, Gerd

...

WanMil

...
Ciao, Gerd

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

WanMil

9:40 p.m.

...

...
...
...
...
The remaining differences should be errors caused by the

"insideness"

...
...
...
problem of contains().

Problems with the bounding box of the quadtree should be solved by adding +1 to maxlat/maxlong of the java area object.

Hmm, does that mean you want to shift the whole area or only selected points? Which ones? I thought about using the Area.transform() method to blow up the area a little bit, but did not yet find something useful. I think searching the shifted point is easy to implement and this double (or multiple) search will not happen very often.

I think it's much easier: In the constructor of BoundaryQuadTree.Node use the following line: this.bbox = new Rectangle(bbox.getMinLong(), bbox.getMinLat(), bbox.getMaxLong() - bbox.getMinLong() + 1, bbox.getMaxLat() - bbox.getMinLat() + 1);

So just increase the width and height of the (java.awt.geom.Area) bounding box by 1. That should do it(?). I don't think so. The area keeps it's own internal bbox in field cachedBounds, and that is tested first when contains() is called .

I am only talking about the java.awt.geom.Area bbox in the Node class. I am not talking about the area objects of each boundary because you were initially talking about the bounding box of the quadtree (which is a rectangle). WanMil

Gerd Petermann

9:47 p.m.

Hi WanMil,

...

Date: Mon, 16 Jan 2012 22:40:58 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [Patch v3] LocationHook with new Quadtree

...
...
...
...
...
The remaining differences should be errors caused by the

"insideness"

...
...
...
problem of contains().

Problems with the bounding box of the quadtree should be solved by adding +1 to maxlat/maxlong of the java area object.

Hmm, does that mean you want to shift the whole area or only selected points? Which ones? I thought about using the Area.transform() method to blow up the area a little bit, but did not yet find something useful. I think searching the shifted point is easy to implement and this double (or multiple) search will not happen very often.

I think it's much easier: In the constructor of BoundaryQuadTree.Node use the following line: this.bbox = new Rectangle(bbox.getMinLong(), bbox.getMinLat(), bbox.getMaxLong() - bbox.getMinLong() + 1, bbox.getMaxLat() - bbox.getMinLat() + 1);

So just increase the width and height of the (java.awt.geom.Area) bounding box by 1. That should do it(?). I don't think so. The area keeps it's own internal bbox in field cachedBounds, and that is tested first when contains() is called .

I am only talking about the java.awt.geom.Area bbox in the Node class. I am not talking about the area objects of each boundary because you were initially talking about the bounding box of the quadtree (which is a rectangle).

In fact, the problem occures also with the java.awt.geom.Area.contains(co.getLongitude(), co.getLatitude()) Gerd

...

WanMil _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

4787

Age (days ago)

4791

Last active (days ago)

List overview

28 comments

3 participants

participants (3)

Gerd Petermann
GerdP
WanMil