
I noticed that mkgmap does not intern any strings. In particular, the tile below, generated by the splitter, fails to build with -Xmx3000m on a 64-bit JDK under Linux. With my patch, mkgmap generates the tile with -Xmx1000m.
<bounds minlat='55.1953125' minlon='9.4921875' maxlat='56.6015625' maxlon='11.513671875'/>
This tile has 1 million nodes. Across the nodes and ways on this tile there are 12 million tags, yet only 100k distinct tag key/value pairs, so on average each pair occurs 120 times.
I deliberately do not use normal string interning because String.intern() strings are kept forever, and I want these strings to be GC'able after the tile is done. Instead I flush the interning table every 10k unique strings; the price for GCability is that an occasional string ends up duplicated in memory.
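To illustrate the flush-on-threshold idea, here is a minimal sketch. It is not the code from the patch: the class name LossyInternSketch, its methods, and the choice to drop the whole table at once are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lossy intern table: forgets everything once it grows too large. */
final class LossyInternSketch {
    private final int maxEntries;
    private Map<String, String> table = new HashMap<String, String>();

    LossyInternSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Returns the cached instance equal to s, or caches and returns s itself. */
    String intern(String s) {
        if (s == null) {
            return null;
        }
        String canonical = table.get(s);
        if (canonical != null) {
            return canonical;
        }
        if (table.size() >= maxEntries) {
            // Flush: replace the table so the old entries are no longer
            // referenced from here and can be collected once nothing else uses them.
            table = new HashMap<String, String>();
        }
        table.put(s, s);
        return s;
    }

    /** Number of strings currently cached (exposed for inspection only). */
    int size() {
        return table.size();
    }
}
```

A string seen again after a flush is simply cached a second time, which is where the occasional duplicate comes from.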
This code is not presently thread-safe; ideally there should be one string interning table for each parser/thread.
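One way to get a table per thread without locking might be a ThreadLocal. This is only a sketch under that assumption: PerThreadInternSketch and MAX_ENTRIES are hypothetical names, and it needs Java 8 or later for ThreadLocal.withInitial.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-thread lossy intern table; each thread sees only its own map. */
final class PerThreadInternSketch {
    // Assumed flush threshold, matching the 10k figure mentioned above.
    private static final int MAX_ENTRIES = 10000;

    private static final ThreadLocal<Map<String, String>> TABLE =
            ThreadLocal.withInitial(HashMap::new);

    private PerThreadInternSketch() {
    }

    /** Returns this thread's cached instance equal to s, or caches and returns s. */
    static String intern(String s) {
        if (s == null) {
            return null;
        }
        Map<String, String> table = TABLE.get();
        String canonical = table.get(s);
        if (canonical != null) {
            return canonical;
        }
        if (table.size() >= MAX_ENTRIES) {
            // Flush only the calling thread's table.
            table.clear();
        }
        table.put(s, s);
        return s;
    }
}
```

The cost of this design is that the same string can be held once per thread, so the memory saving shrinks as the number of parser threads grows.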
Scott
Hi Scott!
I think it's a good idea to intern the strings. As far as I know, the LossyIntern class is not needed: the .intern() method of String does exactly the same.
Some time ago I sent a very similar patch to the mailing list, which has not been committed yet. Could you please test whether it gives a similar memory reduction for your use case? The patch is thread-safe and does not intern all strings. In my opinion the value of a name tag should not be interned, because there is a high probability that it is used only once.
WanMil