Is there an optimal quad tree out there?

Tl;dr: Testing quad tree performance with different setups confirms that there is no single winner–takes–all solution that outperforms all others. Parameterising leaf bucket size based on data size and making the decision dynamic appear to be valid optimization techniques.

So this was basically for fun. Quad tree is a data structure that is useful for maps, game worlds, and other continuous 2D things. It is a rooted tree and a special case of k-d tree, which is related to R-tree.

A quad tree like all logical trees consists of branch nodes and leaf nodes.

Each branch has four tabularly aligned children (hence the quad; see picture below). A leaf node contains some items. Implementations abound: 1, 2, 3…. This java-quadtree has two additional features.

  • a tree autoexpands both inwards and outwards;
  • a configurable bucket size, which you can manually or automatically adjust (helps fine-tune the performance of trees of different size).

In my quad tree, dynamic bucket size means that buckets grow as a function time, specifically as some power of tree size. When the exponent is to cubic root (^0.333333), for instance, buckets are smaller than with the square root (^0.5). From the autoexpansion follows that dynamic trees are more “fruitful” than static ones in the sense that their smaller leaves bear more items than root-attending leaves. A static tree with massive buckets, on the other hand, is relatively flat – but it might also be suboptimal in searches.

Quad tree structuring A visual representation of a quad tree

Quad tree nodes are called Quads in the code. The basic node structure is like this:

public class Quad {
    public Quad
            parent = null,
            UL = null, // upper left corner child ...
            UR = null,
            LL = null,
            LR = null
            ;
    public double
            x1,y1,
            x2,y2
            ;

Coordinates (x1, y1) and (x2, y2) create a 2D bounding box within which either child quads (UL, UR, LL, LR), or a number of items are stored. When an item is placed in a leaf, that leaf is checked for overflow and possibly replaced by four child quads:

if (items.size() == LEAF_MAX_OBJECTS)
{
    expand();
}
if (UL != null)
{
    return place_(h, this);
}
else
{
    h.quad = this;
    items.add(h);
    return h;
}

A parent reference, which comes extra to a minimal implementation, enables intra-tree replace operations and growing the tree outwards.

In this implementation each item sits in a wrapper object. This creates some memory overhead, but the convenience is worth it, because an item and its coordinates now keep company. It also enables moving items in the tree with ease. The idea is simple: go upwards in the hierarchy until you find a node that contains the replaced item, then narrow down to a new leaf.

Quad tree structuring Moving an item in the quad tree

There is, of course, no single tree structure that would always outperform all others. Generally speaking search operations should be as fast as possible, and the number and distribution of items stored in a tree have effect on the outcome. I plotted some histograms to show how different settings contribute to the growth of a tree in this implementation (leaf depth on x axis; item count on y axis).

quadtree_1e6_hist.png quadtree_1e5_hist.png quadtree_1e4_hist.png quadtree_1e3_hist.png

The deeper the tree, the slower the searches will be, but the same is also true of a big bucket size. So the performance boils down to how much data you have to crunch. It will depend on application how big buckets are tolerable, but a dynamic bucket size helps to make the tree grow evenly on all fronts, reducing cluttering as compared to a big static bucket size.

I did a little performance test with JMH. Java performance varies due to garbage collection, JIT optimization, and possible other dynamic assets. So it is a good idea to use an established testing tool. There are also extrinsic factors, like my machine could run faster one day and slower another or slow down during use, so these statistics don’t measure up to scientific scrutiny here. I feel reasonably confident with them though.

Bucket         setup Data distribution Tree size (items) Bucket   size Insert (M-item/sec) Replace † (M-item/sec) findAll() 100th †† (K-srch/s) findAll() 16th †† (K-srch/s) findAll() 4th †† (K-srch/s)
Dyn. ^1/2 Uniform 1e6 7–1000 5.0 22.7 5.1 1.2 0.3
Static 100 Uniform 1e6 100 3.7 22.5 5.0 1.0 0.3
Static 10 Uniform 1e6 10 2.3 22.8 3.2 0.6 0.2
Dyn. ^3/4 Uniform 1e6 7–31622 8.0 22.7 1.6 0.7 0.3
Dyn. ^1/3 Uniform 1e6 7–100 4.8 22.0 1.8 0.6 0.3
Static 100 Uniform 1e5 100 6.0 22.6 46.9 11.8 3.6
Dyn. ^1/2 Uniform 1e5 7–316 6.8 22.4 44.3 11.6 3.8
Static 10 Uniform 1e5 10 3.9 22.7 43.0 9.7 3.1
Dyn. ^1/3 Uniform 1e5 7–46 6.6 22.5 25.0 7.0 3.1
Dyn. ^3/4 Uniform 1e5 7–5623 9.5 22.5 13.5 5.7 2.7
Static 100 Uniform 1e4 100 6.5 22.8 371.0 101.0 35.0
Static 10 Uniform 1e4 10 8.5 22.6 331.5 106.7 40.3
Dyn. ^1/2 Uniform 1e4 7–100 8.2 22.5 326.9 106.7 40.5
Dyn. ^1/3 Uniform 1e4 7–21 8.0 22.7 224.5 90.3 32.7
Dyn. ^3/4 Uniform 1e4 7–1000 10.6 22.8 124.2 60.1 29.4
Dyn. ^1/2 U, shifting frame ††† 1e6 7–1000 5.7 22.6 5.2 1.2 0.35
Static 100 U, shifting frame 1e6 100 3.7 22.7 5.0 1.0 0.3
Static 100 U, shifting frame 1e5 100 5.9 22.5 48.0 11.7 3.9
Dyn. ^1/2 U, shifting frame 1e5 7–316 7.2 22.8 44.2 11.6 4.0
Static 100 U, shifting frame 1e4 100 8.4 22.4 342.6 104.8 40.7
Dyn. ^1/2 U, shifting frame 1e4 7–100 8.8 22.6 326.4 108.3 41.3

(†) Replace here simulates a game world transition (at maximum 100th of world size at time). (††) Get all items from (x)th part of total area. (†††) A uniform distribution with a shifting frame that moves 100% along x-axis.

##Conclusion

Quad tree is a fun tree structure that is relatively easy to implement. Tree size and bucket size can affect tree performance, so optimizing the latter might pay off in critical contexts.

With a relatively large tree (1000.000 items) in typical searches, I found the best tested parameters over three times more performant than the worst. Insert and replace times varied too, but not as significantly.

Tree performance can be summarized by the following negative correlations.

  • faster inserts correlate with slower searches;
  • faster searches on restricted subareas slightly correlate with slower searches on larger subareas.

If your domain size is predictable, I recommend doing a binary search over static bucket sizes to determine an optimized solution (these probably follow a polynomial curve which could be induced from measurements, or deduced from tree properties). For a quick general solution, making bucket size dynamic with an exponent of 0.5 (square root of tree size) seems a good go.

##Benchmark details

Static 10
---------
Benchmark                             Mode  Cnt       Score      Error  Units
MyBenchmark.testStatic1000           thrpt   89       2.251 ±    0.097  ops/s
MyBenchmark.testStatic1000Gets100th  thrpt  200    3215.602 ±   29.730  ops/s
MyBenchmark.testStatic1000Gets16th   thrpt  200     610.941 ±    5.751  ops/s
MyBenchmark.testStatic1000Gets4th    thrpt  200     178.086 ±    1.830  ops/s
MyBenchmark.testStatic1000Replace    thrpt  200      22.774 ±    0.035  ops/s

MyBenchmark.testStatic100            thrpt  200      39.301 ±    1.126  ops/s
MyBenchmark.testStatic100Gets100th   thrpt  200   42985.199 ±  439.966  ops/s
MyBenchmark.testStatic100Gets16th    thrpt  200    9658.681 ±   91.600  ops/s
MyBenchmark.testStatic100Gets4th     thrpt  200    3087.886 ±   33.221  ops/s
MyBenchmark.testStatic100Replace     thrpt  200      22.748 ±    0.029  ops/s

MyBenchmark.testStatic10             thrpt  200     651.525 ±    6.232  ops/s
MyBenchmark.testStatic10Gets100th    thrpt  200  370994.830 ± 1628.220  ops/s
MyBenchmark.testStatic10Gets16th     thrpt  200  101044.828 ±  756.473  ops/s
MyBenchmark.testStatic10Gets4th      thrpt  200   34990.222 ±  258.951  ops/s
MyBenchmark.testStatic10Replace      thrpt  200      22.753 ±    0.026  ops/s


Static 100
----------
Benchmark                             Mode  Cnt       Score      Error  Units
MyBenchmark.testStatic1000           thrpt  200       3.674 ±    0.073  ops/s
MyBenchmark.testStatic1000Replace    thrpt  200      22.538 ±    0.069  ops/s
MyBenchmark.testStatic1000Gets100th  thrpt  200    5004.158 ±   66.833  ops/s
MyBenchmark.testStatic1000Gets16th   thrpt  200    1002.217 ±    8.836  ops/s
MyBenchmark.testStatic1000Gets4th    thrpt  200     289.583 ±    3.994  ops/s

MyBenchmark.testStatic100            thrpt  200      59.924 ±    0.296  ops/s
MyBenchmark.testStatic100Replace     thrpt  200      22.634 ±    0.075  ops/s
MyBenchmark.testStatic100Gets100th   thrpt  200   46923.773 ± 1634.403  ops/s
MyBenchmark.testStatic100Gets16th    thrpt  200   11842.317 ±  170.995  ops/s
MyBenchmark.testStatic100Gets4th     thrpt  200    3626.730 ±  151.959  ops/s

MyBenchmark.testStatic10             thrpt  200     852.848 ±    3.855  ops/s
MyBenchmark.testStatic10Replace      thrpt  200      22.551 ±    0.096  ops/s
MyBenchmark.testStatic10Gets100th    thrpt  200  331462.994 ± 6594.212  ops/s
MyBenchmark.testStatic10Gets16th     thrpt  200  106654.868 ±  946.438  ops/s
MyBenchmark.testStatic10Gets4th      thrpt  200   40281.885 ±  497.629  ops/s


Dynamic (TSIZE ^1/3)
-----------
MyBenchmark.testDynamic1000          thrpt  200       4.788 ±     0.111  ops/s
MyBenchmark.testDynamic1000Replace   thrpt  200      22.018 ±     0.473  ops/s
MyBenchmark.testDynamic1000Gets100th thrpt  200    1773.459 ±   124.443  ops/s
MyBenchmark.testDynamic1000Gets16th  thrpt  200     641.420 ±    24.462  ops/s
MyBenchmark.testDynamic1000Gets4th   thrpt  200     251.728 ±     4.703  ops/s

MyBenchmark.testDynamic100           thrpt  200      65.980 ±     0.591  ops/s
MyBenchmark.testDynamic100Replace    thrpt  200      22.450 ±     0.075  ops/s
MyBenchmark.testDynamic100Gets100th  thrpt  200   25044.196 ±  2141.868  ops/s
MyBenchmark.testDynamic100Gets16th   thrpt  200    7152.725 ±   418.036  ops/s
MyBenchmark.testDynamic100Gets4th    thrpt  200    3052.622 ±    72.834  ops/s

//Missing data
MyBenchmark.testDynamic10Replace     thrpt  200      22.668 ±     0.053  ops/s
MyBenchmark.testDynamic10Gets100th   thrpt  200  224504.656 ± 14537.053  ops/s
MyBenchmark.testDynamic10Gets16th    thrpt  200   90306.673 ±  2800.908  ops/s
MyBenchmark.testDynamic10Gets4th     thrpt  200   32746.688 ±   444.270  ops/s


Dynamic (TSIZE ^1/2)
-----------
Benchmark                            Mode  Cnt       Score      Error  Units
MyBenchmark.testDynamic1000         thrpt  200       4.962 ±    0.104  ops/s
MyBenchmark.testDynamic1000Replace  thrpt  200      22.688 ±    0.050  ops/s
MyBenchmark.testDynamic1000Gets100ththrpt  200    5103.998 ±   34.889  ops/s
MyBenchmark.testDynamic1000Gets16th thrpt  200    1165.960 ±    9.314  ops/s
MyBenchmark.testDynamic1000Gets4th  thrpt  200     347.509 ±    3.607  ops/s

MyBenchmark.testDynamic100          thrpt  200      67.917 ±    0.385  ops/s
MyBenchmark.testDynamic100Replace   thrpt  200      22.434 ±    0.157  ops/s
MyBenchmark.testDynamic100Gets100th thrpt  200   44350.620 ±  760.833  ops/s
MyBenchmark.testDynamic100Gets16th  thrpt  200   11577.383 ±   90.534  ops/s
MyBenchmark.testDynamic100Gets4th   thrpt  200    3843.733 ±   42.209  ops/s

//missing data
MyBenchmark.testDynamic10Replace    thrpt  200      22.549 ±    0.055  ops/s
MyBenchmark.testDynamic10Gets100th  thrpt  200  326871.228 ± 5853.849  ops/s
MyBenchmark.testDynamic10Gets16th   thrpt  200  106694.737 ± 1094.796  ops/s
MyBenchmark.testDynamic10Gets4th    thrpt  200   40501.948 ±  529.682  ops/s


Dynamic 0.75
------------
Benchmark                                  Mode  Cnt       Score      Error  Units
MyBenchmark.testDynamic1000_75_           thrpt  200       7.970 ±    0.131  ops/s
MyBenchmark.testDynamic1000_75_Replace    thrpt  200      22.678 ±    0.054  ops/s
MyBenchmark.testDynamic1000_75_Gets100th  thrpt  200    1561.178 ±   62.218  ops/s
MyBenchmark.testDynamic1000_75_Gets16th   thrpt  200     651.231 ±   14.533  ops/s
MyBenchmark.testDynamic1000_75_Gets4th    thrpt  200     270.326 ±    3.476  ops/s
MyBenchmark.testDynamic100_75_            thrpt  200      95.468 ±    0.571  ops/s
MyBenchmark.testDynamic100_75_Replace     thrpt  200      22.471 ±    0.133  ops/s
MyBenchmark.testDynamic100_75_Gets100th   thrpt  200   13502.331 ±  523.206  ops/s
MyBenchmark.testDynamic100_75_Gets16th    thrpt  200    5680.445 ±  128.139  ops/s
MyBenchmark.testDynamic100_75_Gets4th     thrpt  200    2692.851 ±   40.381  ops/s
MyBenchmark.testDynamic10_75_             thrpt  200    1057.870 ±    3.713  ops/s
MyBenchmark.testDynamic10_75_Replace      thrpt  200      22.761 ±    0.030  ops/s
MyBenchmark.testDynamic10_75_Gets100th    thrpt  200  124274.483 ± 3685.147  ops/s
MyBenchmark.testDynamic10_75_Gets16th     thrpt  200   60084.731 ±  803.888  ops/s
MyBenchmark.testDynamic10_75_Gets4th      thrpt  200   29446.770 ±  294.976  ops/s


Static 100, uniform distribution with a shifting frame
------------
Benchmark                                      Mode  Cnt     Score    Error  Units
MyBenchmark.testWindStatic1000_100_           thrpt  200     3.681 ±  0.062  ops/s
MyBenchmark.testWindStatic1000_100_Replace    thrpt  200    22.679 ±  0.036  ops/s
MyBenchmark.testWindStatic1000_100_Gets100th  thrpt  200  4971.401 ± 96.417  ops/s
MyBenchmark.testWindStatic1000_100_Gets16th   thrpt  200  1040.246 ±  7.080  ops/s
MyBenchmark.testWindStatic1000_100_Gets4th    thrpt  200   300.385 ±  2.343  ops/s

Benchmark                                     Mode  Cnt       Score      Error  Units
MyBenchmark.testWindStatic100_100_           thrpt  180      58.603 ±    0.351  ops/s
MyBenchmark.testWindStatic100_100_Replace    thrpt  200      22.544 ±    0.071  ops/s
MyBenchmark.testWindStatic100_100_Gets100th  thrpt  200   48023.528 ± 1055.287  ops/s
MyBenchmark.testWindStatic100_100_Gets16th   thrpt  200   11705.878 ±  196.446  ops/s
MyBenchmark.testWindStatic100_100_Gets4th    thrpt  200    3920.593 ±   62.509  ops/s

MyBenchmark.testWindStatic10_100_            thrpt  158     835.297 ±    3.547  ops/s
MyBenchmark.testWindStatic10_100_Replace     thrpt  200      22.427 ±    0.117  ops/s
MyBenchmark.testWindStatic10_100_Gets100th   thrpt  200  342671.618 ± 3990.980  ops/s
MyBenchmark.testWindStatic10_100_Gets16th    thrpt  200  104766.418 ±  715.201  ops/s
MyBenchmark.testWindStatic10_100_Gets4th     thrpt  200   40702.244 ±  173.719  ops/s

Dynamic ^0.5, U with shifting frame
------------

Benchmark                                  Mode  Cnt       Score      Error  Units
MyBenchmark.testWindDyn1000_05_           thrpt  200       5.727 ±    0.094  ops/s
MyBenchmark.testWindDyn1000_05_Replace    thrpt  200      22.574 ±    0.132  ops/s
MyBenchmark.testWindDyn1000_05_Gets100th  thrpt  200    5219.104 ±   17.831  ops/s
MyBenchmark.testWindDyn1000_05_Gets16th   thrpt  200    1176.705 ±    3.574  ops/s
MyBenchmark.testWindDyn1000_05_Gets4th    thrpt  200     350.920 ±    3.357  ops/s

MyBenchmark.testWindDyn100_05_            thrpt  180      72.104 ±    0.467  ops/s
MyBenchmark.testWindDyn100_05_Replace     thrpt  200      22.774 ±    0.022  ops/s
MyBenchmark.testWindDyn100_05_Gets100th   thrpt  200   44163.428 ±  746.761  ops/s
MyBenchmark.testWindDyn100_05_Gets16th    thrpt  200   11647.066 ±  326.001  ops/s
MyBenchmark.testWindDyn100_05_Gets4th     thrpt  200    4035.522 ±   22.628  ops/s

MyBenchmark.testWindDyn10_05_             thrpt  145     879.953 ±    3.049  ops/s
MyBenchmark.testWindDyn10_05_Replace      thrpt  200      22.587 ±    0.211  ops/s
MyBenchmark.testWindDyn10_05_Gets100th    thrpt  200  326384.871 ± 5617.785  ops/s
MyBenchmark.testWindDyn10_05_Gets16th     thrpt  200  108267.396 ±  919.802  ops/s
MyBenchmark.testWindDyn10_05_Gets4th      thrpt  200   41262.699 ±  243.515  ops/s