Splitting the entire planet in one pass.

16 Sep 2010

      Well, technically two passes; the first pass is needed to figure out
where to split.

A new set of patches and line of development in my tree.

Features:
   About 1/3 of the RAM usage. An entire planet can be split into 1200
areas in one pass with -Xmx5000m --max-nodes=1000000 in one hour.
   No upper bound on how many areas a node/way/relation can occur in.
I've tested large overlaps with nodes in >10 regions.
   Generate up to 6700 areas in a single pass.  (Yes, -Xmx5000m
--max-nodes=50000 --max-regions=6700 on a 8gb machine generates 20,000
regions in three passes in 4 hours!)
   If areas.list says so, areas can overlap. The underlying algorithm
can also handle non-rectangular areas.
   Quite a bit faster
   RAM usage proportional to the total number of nodes in the output
files, at about 4 bytes per output node.

Issues:
  1. Two passes may be needed for a whole planet with 4gb of physical RAM.

  2. Scaling to thousands of areas/output files in one pass can cause problems.
      a. There may be a system limit on number of open files.
      b. Per-file memory overhead of Java's IO layer and
GzipOutputStream can cause unexpectedly high memory usage. On my
machine, this limits me from doing >=10,000 areas per pass.
      c. The current threading code behaves badly, causing >300k
context switches/second. My previous threading code also behaves
badly, requiring one thread per output stream. I have new threading
code that overcomes both problems and is 15% faster.

  3.  Dependencies on fastutils and dsiutils jars.

  4. Depends on the refactoring I did to support the binary format.

  5. Splitter does not find good splits with --max-nodes =< 50000 in
the default resolution.

Changes: The main memory overhead in the splitter is to track which
nodes are in which area. It formerly used a custom IntIntMap. I
refactored it to use an Int2ShortMultiMap, then implemented that
MutiMap by composing two underlying Int2ShortMap implementations with
different space efficiency tradeoffs, a custom sparsearray
implementation based on
http://google-sparsehash.googlecode.com/svn/trunk/doc/implementation.html,
and one imported from a library of java collections specialized to
primitive types. The hybrid has a memory overhead of about 4 bytes per
output, storing 750m keys with 800m vals in 3.2gb of heap.

Testing: I split the cloudmade extract of NY into 160 areas of
--max-nodes=100000 and compared my integration branch 'dev' versus SVN
trunk. I then compared 3 of the area files. Files have minor
differences.  My branch reorders the tags within each node compared to
trunk. Also, my XML output routines *always* output 7 degrees of
precision, even if the data has only 6 degrees of precision (i.e.
'123.1234560' instead of '123.123456').

Other notes: I just found out about a curious option
-XX:+UseCompressedOops for recent Java VM's. It lets 32-bit pointers
be used in a 64-bit VM, which can, for pointer-heavy uses, cut down on
memory usage, cache pressure, memory bandwidth, etc. My
SparseInt2ShortMap is fairly heavy on pointers and that option appears
to offer a 10% memory reduction.

Code is in branch 'perf/multiregion'.

Code integrating all of my patches is currently available in branch
'dev' in my git repository on github and will be in SVN within a few
weeks.

Scott

Scott Crosby

Chris Miller

Scott Crosby

Jeffrey C. Ollie

Scott Crosby

Jeffrey C. Ollie

Chris Miller

tags

participants (3)