[PATCH v1] Reduce memory footprint of tags

The discussions around "Commit: r1566: Drop all tags from the osm file that are not used" inspired me to address the memory footprint of the tags.

The patch reduces the memory footprint by using the String.intern() method. This ensures that the String "highway" exists only once when stored as a tag. Additionally, the values of the most common tags with a limited number of different values are also interned. (It makes sense to intern the value of a highway tag, but it does not make sense to intern the value of a name tag because the share rate of the name tag values is too low.)

This all might sound embarrassing, but the String objects that are returned from the XML reader are not interned. You can test this easily by replacing the put method in class Tags:

    public String put(String key, String value) {
        // The literal "highway" is interned, so a reference mismatch
        // means the incoming key is not.
        if ("highway".equals(key) && "highway" != key) {
            log.error("Tag is not interned");
        }
        ensureSpace();
        Integer ind = keyPos(key);
        if (ind == null)
            assert false : "keyPos(" + key + ") returns null - size = " + size + ", capacity = " + capacity;
        keys[ind] = key;

        String old = values[ind];
        if (old == null)
            size++;
        values[ind] = value;

        return old;
    }

You will see lots of "Tag is not interned" errors.

I have seen memory reductions of > 10%.

Please test this patch against your well-known tiles. It would be perfect if someone has a tile and knows its minimum memory requirement for mkgmap. This patch should lower it.

WanMil
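For readers who want to see the idea in isolation, here is a minimal sketch of selective interning. It is not the code from the patch: the class InternDemo, the helpers internKey and internValue, and the set of keys whose values get interned are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class InternDemo {
    // Hypothetical list of keys whose values come from a small vocabulary.
    private static final Set<String> INTERN_VALUE_KEYS =
            new HashSet<>(Arrays.asList("highway", "oneway", "surface"));

    // Keys repeat on almost every element, so always share one instance.
    static String internKey(String key) {
        return key.intern();
    }

    // Intern a value only if its key is in the hypothetical list above;
    // values of other keys (e.g. name) are returned unchanged.
    static String internValue(String key, String value) {
        return INTERN_VALUE_KEYS.contains(key) ? value.intern() : value;
    }

    public static void main(String[] args) {
        // Strings built at runtime, as a parser would produce them.
        String k1 = new String("highway".toCharArray());
        String k2 = new String("highway".toCharArray());

        System.out.println(k1 == k2);                        // false: two objects
        System.out.println(internKey(k1) == internKey(k2));  // true: one shared object

        String name = new String("Main Street".toCharArray());
        System.out.println(internValue("name", name) == name); // true: left as-is
    }
}
```

Whether a given key belongs in such a list depends on how often its values repeat in real data, which is exactly the share rate argument above.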

Hi
> The patch reduces the memory footprint by using the String.intern() method. This ensures that the String "highway" exists only once when stored as tag. Additionally the values of the most common tags with a limited number of different values are also interned. (It makes sense to intern the value of a highway tag but it does not make sense to intern the value of a name tag because the share rate of the name tag values is too low).
> This all might sound embarrassing but the String objects that are returned from the XML reader are not interned. You can test this easily by exchanging the put method in class tag:
> I have seen memory reductions of > 10%.
This is all very true. In fact, the patch on the style-speed branch sneakily does the equivalent of interning the keys, which does indeed save a lot of memory. So we do not need to add intern() inside Tags.put.

The intern on the values has less effect but might still be worth doing.

..Steve

Interning all the key names could also be useful for speed by allowing a '==' comparison instead of calling String.equals(). Though I suspect the latter tests object identity as its first move, so perhaps your patch improves this enough as a side effect (and it's safer not to have to think about which strings need .equals()!)
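A small sketch of the difference, assuming nothing about the mkgmap code itself (the class KeyCompareDemo and both helper methods are hypothetical). The '==' test only works if every key reaching it has been interned, while equals() is correct either way. In the OpenJDK sources String.equals() does begin with an identity check, though that is an implementation detail rather than something the API documentation promises.

```java
class KeyCompareDemo {
    // Fast, but only correct when the caller guarantees an interned key.
    static boolean isHighwayByIdentity(String key) {
        return key == "highway";
    }

    // Always correct, whether or not the key was interned.
    static boolean isHighwayByEquals(String key) {
        return "highway".equals(key);
    }

    public static void main(String[] args) {
        // A key built at runtime, as a parser would produce it.
        String raw = new String("highway".toCharArray());
        String interned = raw.intern();

        System.out.println(isHighwayByIdentity(raw));      // false: the pitfall
        System.out.println(isHighwayByEquals(raw));        // true
        System.out.println(isHighwayByIdentity(interned)); // true
        System.out.println(isHighwayByEquals(interned));   // true
    }
}
```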
participants (3): Steve Ratcliffe, Toby Speight, WanMil