Tl;dr: Testing quad tree performance with different setups confirms that there is no single winner–takes–all solution that outperforms all others. Parameterising leaf bucket size based on data size and making the decision dynamic appear to be valid optimization techniques.
So this was basically for fun. Quad tree is a data structure that is useful for maps, game worlds, and other continuous 2D things. It is a rooted tree and a special case of k-d tree, which is related to R-tree.
A quad tree like all logical trees consists of branch nodes and leaf nodes.
Each branch has four tabularly aligned children (hence the quad; see picture below). A leaf node contains some items. Implementations abound: 1, 2, 3…. This java-quadtree has two additional features.
- a tree autoexpands both inwards and outwards;
- a configurable bucket size, which you can manually or automatically adjust (helps fine-tune the performance of trees of different size).
In my quad tree, dynamic bucket size means that buckets grow as a function time, specifically as some power of tree size. When the exponent is to cubic root (^0.333333), for instance, buckets are smaller than with the square root (^0.5). From the autoexpansion follows that dynamic trees are more “fruitful” than static ones in the sense that their smaller leaves bear more items than root-attending leaves. A static tree with massive buckets, on the other hand, is relatively flat – but it might also be suboptimal in searches.
A visual representation of a quad tree
Quad tree nodes are called Quads in the code. The basic node structure is like this:
public class Quad {
public Quad
parent = null,
UL = null, // upper left corner child ...
UR = null,
LL = null,
LR = null
;
public double
x1,y1,
x2,y2
;
Coordinates (x1
, y1
) and (x2
, y2
) create a 2D bounding box within which either child quads (UL, UR, LL, LR), or a number of items are stored. When an item is placed in a leaf, that leaf is checked for overflow and possibly replaced by four child quads:
if (items.size() == LEAF_MAX_OBJECTS)
{
expand();
}
if (UL != null)
{
return place_(h, this);
}
else
{
h.quad = this;
items.add(h);
return h;
}
A parent
reference, which comes extra to a minimal implementation, enables intra-tree replace operations and growing the tree outwards.
In this implementation each item sits in a wrapper object. This creates some memory overhead, but the convenience is worth it, because an item and its coordinates now keep company. It also enables moving items in the tree with ease. The idea is simple: go upwards in the hierarchy until you find a node that contains the replaced item, then narrow down to a new leaf.
Moving an item in the quad tree
There is, of course, no single tree structure that would always outperform all others. Generally speaking search operations should be as fast as possible, and the number and distribution of items stored in a tree have effect on the outcome. I plotted some histograms to show how different settings contribute to the growth of a tree in this implementation (leaf depth on x axis; item count on y axis).
The deeper the tree, the slower the searches will be, but the same is also true of a big bucket size. So the performance boils down to how much data you have to crunch. It will depend on application how big buckets are tolerable, but a dynamic bucket size helps to make the tree grow evenly on all fronts, reducing cluttering as compared to a big static bucket size.
I did a little performance test with JMH. Java performance varies due to garbage collection, JIT optimization, and possible other dynamic assets. So it is a good idea to use an established testing tool. There are also extrinsic factors, like my machine could run faster one day and slower another or slow down during use, so these statistics don’t measure up to scientific scrutiny here. I feel reasonably confident with them though.
Bucket setup | Data distribution | Tree size (items) | Bucket size | Insert (M-item/sec) | Replace † (M-item/sec) | findAll() 100th †† (K-srch/s) | findAll() 16th †† (K-srch/s) | findAll() 4th †† (K-srch/s) |
Dyn. ^1/2 | Uniform | 1e6 | 7–1000 | 5.0 | 22.7 | 5.1 | 1.2 | 0.3 |
Static 100 | Uniform | 1e6 | 100 | 3.7 | 22.5 | 5.0 | 1.0 | 0.3 |
Static 10 | Uniform | 1e6 | 10 | 2.3 | 22.8 | 3.2 | 0.6 | 0.2 |
Dyn. ^3/4 | Uniform | 1e6 | 7–31622 | 8.0 | 22.7 | 1.6 | 0.7 | 0.3 |
Dyn. ^1/3 | Uniform | 1e6 | 7–100 | 4.8 | 22.0 | 1.8 | 0.6 | 0.3 |
Static 100 | Uniform | 1e5 | 100 | 6.0 | 22.6 | 46.9 | 11.8 | 3.6 |
Dyn. ^1/2 | Uniform | 1e5 | 7–316 | 6.8 | 22.4 | 44.3 | 11.6 | 3.8 |
Static 10 | Uniform | 1e5 | 10 | 3.9 | 22.7 | 43.0 | 9.7 | 3.1 |
Dyn. ^1/3 | Uniform | 1e5 | 7–46 | 6.6 | 22.5 | 25.0 | 7.0 | 3.1 |
Dyn. ^3/4 | Uniform | 1e5 | 7–5623 | 9.5 | 22.5 | 13.5 | 5.7 | 2.7 |
Static 100 | Uniform | 1e4 | 100 | 6.5 | 22.8 | 371.0 | 101.0 | 35.0 |
Static 10 | Uniform | 1e4 | 10 | 8.5 | 22.6 | 331.5 | 106.7 | 40.3 |
Dyn. ^1/2 | Uniform | 1e4 | 7–100 | 8.2 | 22.5 | 326.9 | 106.7 | 40.5 |
Dyn. ^1/3 | Uniform | 1e4 | 7–21 | 8.0 | 22.7 | 224.5 | 90.3 | 32.7 |
Dyn. ^3/4 | Uniform | 1e4 | 7–1000 | 10.6 | 22.8 | 124.2 | 60.1 | 29.4 |
Dyn. ^1/2 | U, shifting frame ††† | 1e6 | 7–1000 | 5.7 | 22.6 | 5.2 | 1.2 | 0.35 |
Static 100 | U, shifting frame | 1e6 | 100 | 3.7 | 22.7 | 5.0 | 1.0 | 0.3 |
Static 100 | U, shifting frame | 1e5 | 100 | 5.9 | 22.5 | 48.0 | 11.7 | 3.9 |
Dyn. ^1/2 | U, shifting frame | 1e5 | 7–316 | 7.2 | 22.8 | 44.2 | 11.6 | 4.0 |
Static 100 | U, shifting frame | 1e4 | 100 | 8.4 | 22.4 | 342.6 | 104.8 | 40.7 |
Dyn. ^1/2 | U, shifting frame | 1e4 | 7–100 | 8.8 | 22.6 | 326.4 | 108.3 | 41.3 |
(†) Replace here simulates a game world transition (at maximum 100th of world size at time). (††) Get all items from (x)th part of total area. (†††) A uniform distribution with a shifting frame that moves 100% along x-axis.
##Conclusion
Quad tree is a fun tree structure that is relatively easy to implement. Tree size and bucket size can affect tree performance, so optimizing the latter might pay off in critical contexts.
With a relatively large tree (1000.000 items) in typical searches, I found the best tested parameters over three times more performant than the worst. Insert and replace times varied too, but not as significantly.
Tree performance can be summarized by the following negative correlations.
- faster inserts correlate with slower searches;
- faster searches on restricted subareas slightly correlate with slower searches on larger subareas.
If your domain size is predictable, I recommend doing a binary search over static bucket sizes to determine an optimized solution (these probably follow a polynomial curve which could be induced from measurements, or deduced from tree properties). For a quick general solution, making bucket size dynamic with an exponent of 0.5 (square root of tree size) seems a good go.
##Benchmark details
Static 10
---------
Benchmark Mode Cnt Score Error Units
MyBenchmark.testStatic1000 thrpt 89 2.251 ± 0.097 ops/s
MyBenchmark.testStatic1000Gets100th thrpt 200 3215.602 ± 29.730 ops/s
MyBenchmark.testStatic1000Gets16th thrpt 200 610.941 ± 5.751 ops/s
MyBenchmark.testStatic1000Gets4th thrpt 200 178.086 ± 1.830 ops/s
MyBenchmark.testStatic1000Replace thrpt 200 22.774 ± 0.035 ops/s
MyBenchmark.testStatic100 thrpt 200 39.301 ± 1.126 ops/s
MyBenchmark.testStatic100Gets100th thrpt 200 42985.199 ± 439.966 ops/s
MyBenchmark.testStatic100Gets16th thrpt 200 9658.681 ± 91.600 ops/s
MyBenchmark.testStatic100Gets4th thrpt 200 3087.886 ± 33.221 ops/s
MyBenchmark.testStatic100Replace thrpt 200 22.748 ± 0.029 ops/s
MyBenchmark.testStatic10 thrpt 200 651.525 ± 6.232 ops/s
MyBenchmark.testStatic10Gets100th thrpt 200 370994.830 ± 1628.220 ops/s
MyBenchmark.testStatic10Gets16th thrpt 200 101044.828 ± 756.473 ops/s
MyBenchmark.testStatic10Gets4th thrpt 200 34990.222 ± 258.951 ops/s
MyBenchmark.testStatic10Replace thrpt 200 22.753 ± 0.026 ops/s
Static 100
----------
Benchmark Mode Cnt Score Error Units
MyBenchmark.testStatic1000 thrpt 200 3.674 ± 0.073 ops/s
MyBenchmark.testStatic1000Replace thrpt 200 22.538 ± 0.069 ops/s
MyBenchmark.testStatic1000Gets100th thrpt 200 5004.158 ± 66.833 ops/s
MyBenchmark.testStatic1000Gets16th thrpt 200 1002.217 ± 8.836 ops/s
MyBenchmark.testStatic1000Gets4th thrpt 200 289.583 ± 3.994 ops/s
MyBenchmark.testStatic100 thrpt 200 59.924 ± 0.296 ops/s
MyBenchmark.testStatic100Replace thrpt 200 22.634 ± 0.075 ops/s
MyBenchmark.testStatic100Gets100th thrpt 200 46923.773 ± 1634.403 ops/s
MyBenchmark.testStatic100Gets16th thrpt 200 11842.317 ± 170.995 ops/s
MyBenchmark.testStatic100Gets4th thrpt 200 3626.730 ± 151.959 ops/s
MyBenchmark.testStatic10 thrpt 200 852.848 ± 3.855 ops/s
MyBenchmark.testStatic10Replace thrpt 200 22.551 ± 0.096 ops/s
MyBenchmark.testStatic10Gets100th thrpt 200 331462.994 ± 6594.212 ops/s
MyBenchmark.testStatic10Gets16th thrpt 200 106654.868 ± 946.438 ops/s
MyBenchmark.testStatic10Gets4th thrpt 200 40281.885 ± 497.629 ops/s
Dynamic (TSIZE ^1/3)
-----------
MyBenchmark.testDynamic1000 thrpt 200 4.788 ± 0.111 ops/s
MyBenchmark.testDynamic1000Replace thrpt 200 22.018 ± 0.473 ops/s
MyBenchmark.testDynamic1000Gets100th thrpt 200 1773.459 ± 124.443 ops/s
MyBenchmark.testDynamic1000Gets16th thrpt 200 641.420 ± 24.462 ops/s
MyBenchmark.testDynamic1000Gets4th thrpt 200 251.728 ± 4.703 ops/s
MyBenchmark.testDynamic100 thrpt 200 65.980 ± 0.591 ops/s
MyBenchmark.testDynamic100Replace thrpt 200 22.450 ± 0.075 ops/s
MyBenchmark.testDynamic100Gets100th thrpt 200 25044.196 ± 2141.868 ops/s
MyBenchmark.testDynamic100Gets16th thrpt 200 7152.725 ± 418.036 ops/s
MyBenchmark.testDynamic100Gets4th thrpt 200 3052.622 ± 72.834 ops/s
//Missing data
MyBenchmark.testDynamic10Replace thrpt 200 22.668 ± 0.053 ops/s
MyBenchmark.testDynamic10Gets100th thrpt 200 224504.656 ± 14537.053 ops/s
MyBenchmark.testDynamic10Gets16th thrpt 200 90306.673 ± 2800.908 ops/s
MyBenchmark.testDynamic10Gets4th thrpt 200 32746.688 ± 444.270 ops/s
Dynamic (TSIZE ^1/2)
-----------
Benchmark Mode Cnt Score Error Units
MyBenchmark.testDynamic1000 thrpt 200 4.962 ± 0.104 ops/s
MyBenchmark.testDynamic1000Replace thrpt 200 22.688 ± 0.050 ops/s
MyBenchmark.testDynamic1000Gets100ththrpt 200 5103.998 ± 34.889 ops/s
MyBenchmark.testDynamic1000Gets16th thrpt 200 1165.960 ± 9.314 ops/s
MyBenchmark.testDynamic1000Gets4th thrpt 200 347.509 ± 3.607 ops/s
MyBenchmark.testDynamic100 thrpt 200 67.917 ± 0.385 ops/s
MyBenchmark.testDynamic100Replace thrpt 200 22.434 ± 0.157 ops/s
MyBenchmark.testDynamic100Gets100th thrpt 200 44350.620 ± 760.833 ops/s
MyBenchmark.testDynamic100Gets16th thrpt 200 11577.383 ± 90.534 ops/s
MyBenchmark.testDynamic100Gets4th thrpt 200 3843.733 ± 42.209 ops/s
//missing data
MyBenchmark.testDynamic10Replace thrpt 200 22.549 ± 0.055 ops/s
MyBenchmark.testDynamic10Gets100th thrpt 200 326871.228 ± 5853.849 ops/s
MyBenchmark.testDynamic10Gets16th thrpt 200 106694.737 ± 1094.796 ops/s
MyBenchmark.testDynamic10Gets4th thrpt 200 40501.948 ± 529.682 ops/s
Dynamic 0.75
------------
Benchmark Mode Cnt Score Error Units
MyBenchmark.testDynamic1000_75_ thrpt 200 7.970 ± 0.131 ops/s
MyBenchmark.testDynamic1000_75_Replace thrpt 200 22.678 ± 0.054 ops/s
MyBenchmark.testDynamic1000_75_Gets100th thrpt 200 1561.178 ± 62.218 ops/s
MyBenchmark.testDynamic1000_75_Gets16th thrpt 200 651.231 ± 14.533 ops/s
MyBenchmark.testDynamic1000_75_Gets4th thrpt 200 270.326 ± 3.476 ops/s
MyBenchmark.testDynamic100_75_ thrpt 200 95.468 ± 0.571 ops/s
MyBenchmark.testDynamic100_75_Replace thrpt 200 22.471 ± 0.133 ops/s
MyBenchmark.testDynamic100_75_Gets100th thrpt 200 13502.331 ± 523.206 ops/s
MyBenchmark.testDynamic100_75_Gets16th thrpt 200 5680.445 ± 128.139 ops/s
MyBenchmark.testDynamic100_75_Gets4th thrpt 200 2692.851 ± 40.381 ops/s
MyBenchmark.testDynamic10_75_ thrpt 200 1057.870 ± 3.713 ops/s
MyBenchmark.testDynamic10_75_Replace thrpt 200 22.761 ± 0.030 ops/s
MyBenchmark.testDynamic10_75_Gets100th thrpt 200 124274.483 ± 3685.147 ops/s
MyBenchmark.testDynamic10_75_Gets16th thrpt 200 60084.731 ± 803.888 ops/s
MyBenchmark.testDynamic10_75_Gets4th thrpt 200 29446.770 ± 294.976 ops/s
Static 100, uniform distribution with a shifting frame
------------
Benchmark Mode Cnt Score Error Units
MyBenchmark.testWindStatic1000_100_ thrpt 200 3.681 ± 0.062 ops/s
MyBenchmark.testWindStatic1000_100_Replace thrpt 200 22.679 ± 0.036 ops/s
MyBenchmark.testWindStatic1000_100_Gets100th thrpt 200 4971.401 ± 96.417 ops/s
MyBenchmark.testWindStatic1000_100_Gets16th thrpt 200 1040.246 ± 7.080 ops/s
MyBenchmark.testWindStatic1000_100_Gets4th thrpt 200 300.385 ± 2.343 ops/s
Benchmark Mode Cnt Score Error Units
MyBenchmark.testWindStatic100_100_ thrpt 180 58.603 ± 0.351 ops/s
MyBenchmark.testWindStatic100_100_Replace thrpt 200 22.544 ± 0.071 ops/s
MyBenchmark.testWindStatic100_100_Gets100th thrpt 200 48023.528 ± 1055.287 ops/s
MyBenchmark.testWindStatic100_100_Gets16th thrpt 200 11705.878 ± 196.446 ops/s
MyBenchmark.testWindStatic100_100_Gets4th thrpt 200 3920.593 ± 62.509 ops/s
MyBenchmark.testWindStatic10_100_ thrpt 158 835.297 ± 3.547 ops/s
MyBenchmark.testWindStatic10_100_Replace thrpt 200 22.427 ± 0.117 ops/s
MyBenchmark.testWindStatic10_100_Gets100th thrpt 200 342671.618 ± 3990.980 ops/s
MyBenchmark.testWindStatic10_100_Gets16th thrpt 200 104766.418 ± 715.201 ops/s
MyBenchmark.testWindStatic10_100_Gets4th thrpt 200 40702.244 ± 173.719 ops/s
Dynamic ^0.5, U with shifting frame
------------
Benchmark Mode Cnt Score Error Units
MyBenchmark.testWindDyn1000_05_ thrpt 200 5.727 ± 0.094 ops/s
MyBenchmark.testWindDyn1000_05_Replace thrpt 200 22.574 ± 0.132 ops/s
MyBenchmark.testWindDyn1000_05_Gets100th thrpt 200 5219.104 ± 17.831 ops/s
MyBenchmark.testWindDyn1000_05_Gets16th thrpt 200 1176.705 ± 3.574 ops/s
MyBenchmark.testWindDyn1000_05_Gets4th thrpt 200 350.920 ± 3.357 ops/s
MyBenchmark.testWindDyn100_05_ thrpt 180 72.104 ± 0.467 ops/s
MyBenchmark.testWindDyn100_05_Replace thrpt 200 22.774 ± 0.022 ops/s
MyBenchmark.testWindDyn100_05_Gets100th thrpt 200 44163.428 ± 746.761 ops/s
MyBenchmark.testWindDyn100_05_Gets16th thrpt 200 11647.066 ± 326.001 ops/s
MyBenchmark.testWindDyn100_05_Gets4th thrpt 200 4035.522 ± 22.628 ops/s
MyBenchmark.testWindDyn10_05_ thrpt 145 879.953 ± 3.049 ops/s
MyBenchmark.testWindDyn10_05_Replace thrpt 200 22.587 ± 0.211 ops/s
MyBenchmark.testWindDyn10_05_Gets100th thrpt 200 326384.871 ± 5617.785 ops/s
MyBenchmark.testWindDyn10_05_Gets16th thrpt 200 108267.396 ± 919.802 ops/s
MyBenchmark.testWindDyn10_05_Gets4th thrpt 200 41262.699 ± 243.515 ops/s