
On Wed, 31 Mar 2010 22:12:12 +0000 (UTC), Chris Miller <chris_overseas@hotmail.com> writes:
Note that Java's String.intern() method can be pretty slow, so while you'll save a fair chunk of memory, you'll potentially suffer a noticeable performance hit too if you're calling it a lot. By adding a barrier-free caching layer in front of the String.intern() calls you can gain a reasonable performance boost in this situation. As an example of how this can be implemented, take a look at Lucene's SimpleStringInterner, which does exactly this:
http://github.com/apache/lucene/blob/1c5c409241a2b8b9e64dc8c253791b497a66c36...
It's thread-safe in that it guarantees just enough visibility to never return an invalid result, yet it also avoids any blocking. Might be worth benchmarking something like this against the plain String.intern() with mkgmap.
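A minimal sketch of the kind of caching layer described above, assuming a fixed-size, direct-mapped table. This is not Lucene's SimpleStringInterner (its source sits behind the truncated link), and the class name InternCache and the `bits` sizing parameter are hypothetical. It shows why no locks are needed: only strings that have already been through String.intern() are ever cached, and entries are immutable objects with final fields, so a racy read either finds a valid canonical string or falls through to String.intern().

```java
/**
 * Hypothetical sketch (not Lucene's code): a lock-free, direct-mapped cache
 * placed in front of String.intern().
 */
final class InternCache {
    /** Immutable entry; final fields make it safe to publish through a racy write. */
    private static final class Entry {
        final String str; // always the canonical instance returned by String.intern()
        final int hash;

        Entry(String str, int hash) {
            this.str = str;
            this.hash = hash;
        }
    }

    private final Entry[] table; // plain array: no volatile, no synchronized
    private final int mask;

    /** @param bits log2 of the table size, e.g. 12 for 4096 slots (assumed tuning knob) */
    InternCache(int bits) {
        table = new Entry[1 << bits];
        mask = table.length - 1;
    }

    /** Returns the same instance s.intern() would, calling it only on a cache miss. */
    String intern(String s) {
        int h = s.hashCode();
        int slot = (h ^ (h >>> 16)) & mask;     // spread high bits into the index
        Entry e = table[slot];                  // racy read: may be null or stale
        if (e != null && e.hash == h && e.str.equals(s)) {
            return e.str;                       // hit: skips the JVM string pool
        }
        String canonical = s.intern();          // miss: fall back to the real pool
        table[slot] = new Entry(canonical, h);  // racy write: last writer wins
        return canonical;
    }
}
```

A thread that sees a stale or empty slot just takes the slower String.intern() path, so it can never be handed a non-canonical instance. Whether this actually beats plain String.intern() for mkgmap is what the suggested benchmark would have to show.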
We don't have to get rid of every duplicate string, only most of them, so approximation techniques such as a per-thread or per-parser FuzzyIntern will work fine and require no concurrent access. Now that I know that String.intern() uses weak references, it seems to be the most trivial way to reduce the memory usage of that tile of planet.osm by at least 3x.

Scott
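FuzzyIntern isn't defined in this thread; one hypothetical way to build such an approximate interner is a small, lossy, direct-mapped table owned by a single parser or thread. The class below is an illustrative sketch under that assumption, not an existing API. It does not call String.intern() at all: on a hit it returns the previously seen equal instance, and on a collision it simply overwrites the slot, so some duplicates survive but the table's memory stays bounded.

```java
/**
 * Hypothetical "fuzzy" interner: a lossy, direct-mapped cache meant to be
 * owned by one parser or thread. Deliberately not thread-safe.
 */
final class FuzzyIntern {
    private final String[] slots;
    private final int mask;

    /** @param bits log2 of the number of slots (assumed tuning knob) */
    FuzzyIntern(int bits) {
        slots = new String[1 << bits];
        mask = slots.length - 1;
    }

    /** Returns an equal, previously seen instance if still cached; otherwise caches and returns s. */
    String intern(String s) {
        int h = s.hashCode();
        int i = (h ^ (h >>> 16)) & mask;   // spread high bits into the index
        String cached = slots[i];
        if (cached != null && cached.equals(s)) {
            return cached;                 // duplicate: share the existing instance
        }
        slots[i] = s;                      // first sighting or collision: overwrite (lossy)
        return s;
    }
}
```

Giving each parser its own instance keeps it free of synchronization. The trade-off is that equal strings seen by different parsers, or evicted by a hash collision, stay duplicated, which fits the goal of removing most duplicates rather than all of them.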