[EN] The Jarvis March
I am working on a small project that involves a bunch of geometry algorithms. Since it's something I'm doing for fun, I'm rolling my own implementations of every algorithm I use, in modern C++.
Recently, I needed a convex hull algorithm to be used on a small number of vertices (think up to 30 or so).
A convex hull is the smallest convex polygon that contains every point in a set. You can think of it as the shape traced by a rubber band stretched around a bunch of nails hammered into a board. I don't have a picture of that but, in a more abstract setting, it looks like this:
There are a few different algorithms with different performance characteristics that will compute the convex hull for us. The two major ones are the Jarvis March (very simple, O(N^2)) and the Graham Scan (more complicated, O(N log N)).
Since my use case involves very few vertices, I went with the Jarvis March.
The Jarvis March
This is the algorithm that anyone could come up with when tasked with solving this problem. It is very simple and intuitive. The algorithm works by keeping a set of points that we know are consecutive on the convex hull, and then finding the one that goes next.
This means we start by finding a single point that is definitely on the convex hull and then work from there. A simple way to do this is to grab a point that is farthest to the left, to the right, up, or down. For this example I'll take the rightmost one.
After that, we'll just run through all the other points and pick the one that is farthest along clockwise (or counter-clockwise; the direction we choose is not very important, but we need to be consistent) from the point we just found.
Then, we just repeat the procedure for each point until we find that we have gone the whole way around.
Something to note is that it is not necessary to check the points that we already know are part of the convex hull, with the exception of the first element, which will be our condition for stopping the loop.
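Before getting into complexity and implementation details, here is a rough, self-contained sketch of the whole march. This is just an illustration (with a hypothetical Point type and an index-based loop; my actual implementation further down is structured quite differently):

```cpp
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// Cross product of (b - a) and (c - a): positive when c lies
// counter-clockwise from b, as seen from a.
float cross(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Gift wrapping: returns the hull vertices in clockwise order.
std::vector<Point> jarvis_march(const std::vector<Point>& pts) {
    if (pts.size() < 3) return pts;

    // Start from a point guaranteed to be on the hull: the leftmost one.
    std::size_t start = 0;
    for (std::size_t i = 1; i < pts.size(); ++i)
        if (pts[i].x < pts[start].x) start = i;

    std::vector<Point> hull;
    std::size_t current = start;
    do {
        hull.push_back(pts[current]);
        // Pick the point that is "farthest along" from the current one:
        // whenever some pts[i] lies counter-clockwise from the current
        // candidate, it becomes the new candidate.
        std::size_t next = (current + 1) % pts.size();
        for (std::size_t i = 0; i < pts.size(); ++i)
            if (cross(pts[current], pts[next], pts[i]) > 0.0f)
                next = i;
        current = next;
    } while (current != start); // Stop once we wrap around to the start.
    return hull;
}
```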
Asymptotic Complexity
This algorithm runs through our data once per point on the convex hull. This gives a runtime complexity of O(N*H), where H is the number of points on the convex hull. In the worst case, every point is on the convex hull, meaning the complexity is actually O(N^2). However, it is worth noting that for random-looking data the expected size of the convex hull grows far slower than N (for example, it is O(log N) for points sampled uniformly from a square, and O(N^(1/3)) for points sampled uniformly from a disk), so in practice it should be way faster. Also, structured data (like grids) tends to have even fewer points on the convex hull.
We can also get a constant factor improvement by not checking points that we already know are on the convex hull. Since the k-th pass can then skip the k points already placed, this roughly halves the total work, making our algorithm's worst case approximately twice as fast.
Implementing The Jarvis March
There are many ways to implement this algorithm and many decisions to be made. I decided to implement mine as an in-place algorithm with an iterator-like interface. This means that the signature looks like this:
```cpp
// Arranges chull between begin and return value
vec* inplace_chull (vec* begin, vec* end) {
```
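To illustrate the contract, a hypothetical caller could look like this (the points vector and the brace-initialization of vec are assumptions for illustration, not code from my project):

```cpp
// Hypothetical usage:
std::vector<vec> points = {{0, 0}, {2, 0}, {1, 1}, {1, 3}, {0, 2}};
vec* hull_end = inplace_chull(points.data(), points.data() + points.size());
// The hull vertices now sit, in order, in [points.data(), hull_end);
// everything else was shuffled into [hull_end, points.data() + points.size()).
```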
Something else I wanted to do was to try to use STL algorithms as much as possible, within reason. This can be seen in the following snippet, where I use std::min_element. The Jarvis March is very structurally similar to selection sort, so it makes sense that this algorithm would naturally show up.
```cpp
{
    // The leftmost point is guaranteed to be on the convex hull,
    // so move it to the front.
    auto by_x_coord = [](vec a, vec b) -> bool { return a.x < b.x; };
    auto leftmost = std::min_element(begin, end, by_x_coord);
    std::iter_swap(begin, leftmost);
}
```
Since I was using the STL and lambdas, I figured I would make myself a little helper that makes comparator lambdas for sorting and searching by angle around a particular point.
```cpp
// Returns a comparator that orders points by angle around v.
auto by_angle_around = [](vec v) {
    return [=](vec a, vec b) -> bool {
        return ((a - v) ^ (b - v)) > 0.0f;
    };
};
```
Explaining why this code works mathematically is outside the scope of this article, but I should at least explain the notation: the vec class models a 2D vector and has various operators overloaded, including + for addition, - for subtraction, * for dot product, and ^ for cross product (the determinant of the 2x2 matrix made by putting the first operand as a row vector on top of the second operand as a row vector). For this to work correctly, v must be on the convex hull of the set of points.
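I won't show the full vec class here, but a minimal sketch consistent with that description could look something like this (the exact definition is an assumption):

```cpp
struct vec {
    float x, y;

    vec operator+ (vec o) const { return {x + o.x, y + o.y}; }
    vec operator- (vec o) const { return {x - o.x, y - o.y}; }
    // Dot product.
    float operator* (vec o) const { return x * o.x + y * o.y; }
    // 2D cross product: det of the 2x2 matrix with *this and o as rows.
    float operator^ (vec o) const { return x * o.y - y * o.x; }
};
```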
The usage of this helper is quite intuitive: you give it a vertex and it returns a comparator that does the right thing.
```cpp
{
    auto second_vertex = std::min_element(begin + 1, end, by_angle_around(*begin));
    std::iter_swap(begin + 1, second_vertex);
}
```
With those two lines we just found the second point on the convex hull. All we need to do now is write a loop that finds the next point on the chull through the same means and stops once it hits the first point.
```cpp
{
    auto* last = end - 1;
    for (auto* it = begin + 1; it != last; ++it) {
        auto next_vertex = std::min_element(begin, end, by_angle_around(*it));
        if (next_vertex == begin)
            return it + 1; // We wrapped around: the hull is complete.
        else
            std::iter_swap(next_vertex, it + 1);
    }
    return end;
}
```
This is good, right? Not really. The problem here is that we don't want the current vertex to be accidentally selected as the next vertex, so we need some way to exclude it from our search.
There is a very simple way to do this: we write the search ourselves and just add a condition that the inner loop iterator is not the same as the outer loop one:
```cpp
{
    auto* last = end - 1;
    for (auto* it = begin + 1; it != last; ++it) {
        auto comparator = by_angle_around(*it);
        auto next_vertex = begin;
        for (auto inner = begin + 1; inner != end; ++inner)
            if (inner != it && comparator(*inner, *next_vertex))
                next_vertex = inner;
        if (next_vertex == begin) {
            return it + 1;
        } else {
            std::iter_swap(next_vertex, it + 1);
        }
    }
    return end;
}
```
While this works, we are now suddenly left with a messy loop in the middle of our otherwise clean code.
More importantly, we are now forced to look through the points which are already part of the convex hull, adding a whole bunch of unnecessary work.
I came up with two solutions.
Solution 1: swap the current element with the first element, then swap them back once we are done searching.
```cpp
{
    auto last = end - 1;
    for (auto it = begin + 1; it != last; ++it) {
        // Park the current vertex at the front so that the search
        // range [it, end) no longer contains it.
        std::iter_swap(begin, it);
        auto next_vertex = std::min_element(it, end, by_angle_around(*begin));
        std::iter_swap(begin, it);
        if (next_vertex == it) {
            return it + 1; // The first point won: we wrapped around.
        } else {
            std::iter_swap(next_vertex, it + 1);
        }
    }
    return end;
}
```
This has a few nice properties:
- The extra branch is eliminated.
- We get to encapsulate our search in a standard algorithm.
- We get to skip over all of the points which we already know are part of the convex hull.
Solution 2: take the min between begin and the min of the rest of the array.
```cpp
{
    auto* last = end - 1;
    for (auto* it = begin + 1; it != last; ++it) {
        auto comp = by_angle_around(*it);
        // Search [it + 1, end) with std::min_element, then compare the
        // winner against begin; the two pointers are compared through a
        // dereferencing adapter lambda.
        auto* next_vertex = std::min(begin, std::min_element(it + 1, end, comp),
                                     [=](auto a, auto b) { return comp(*a, *b); });
        if (next_vertex == begin) {
            return it + 1;
        } else {
            std::iter_swap(next_vertex, it + 1);
        }
    }
    return end;
}
```
This also has a few nice properties:
- The extra branch is eliminated.
- We get to encapsulate our search in a standard algorithm.
- We get to skip over all of the points which we already know are part of the convex hull.
Hmmmm... seem familiar? Both solutions address the same issues with the original code. So... which one is better?
Well, to be honest, I have no idea. This is a very common situation to be in when tuning performance, and it's one you must be able to identify quickly, because only then will you realize that you need to measure. I realized that, so I did: I measured.
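Before getting to the numbers, here is roughly how the snippets above fit together into the complete function, with Solution 1 plugged in (a compilable sketch; the stripped-down vec is again an assumption):

```cpp
#include <algorithm>

// Minimal vec with just the operators this function needs
// (an assumption; see the sketch earlier).
struct vec {
    float x, y;
    vec operator- (vec o) const { return {x - o.x, y - o.y}; }
    float operator^ (vec o) const { return x * o.y - y * o.x; }
};

// Arranges the convex hull in [begin, return value).
vec* inplace_chull(vec* begin, vec* end) {
    if (end - begin < 3) return end;

    auto by_angle_around = [](vec v) {
        return [=](vec a, vec b) -> bool {
            return ((a - v) ^ (b - v)) > 0.0f;
        };
    };

    // Step 1: the leftmost point is on the hull; move it to the front.
    auto by_x_coord = [](vec a, vec b) -> bool { return a.x < b.x; };
    std::iter_swap(begin, std::min_element(begin, end, by_x_coord));

    // Step 2: find the second hull point.
    std::iter_swap(begin + 1, std::min_element(begin + 1, end, by_angle_around(*begin)));

    // Step 3: march around, excluding the current vertex via the swap trick.
    auto last = end - 1;
    for (auto it = begin + 1; it != last; ++it) {
        std::iter_swap(begin, it);
        auto next_vertex = std::min_element(it, end, by_angle_around(*begin));
        std::iter_swap(begin, it);
        if (next_vertex == it)
            return it + 1;
        std::iter_swap(next_vertex, it + 1);
    }
    return end;
}
```

Solution 2 would slot into step 3 in exactly the same way.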
Performance & Measuring
When we have to choose between two implementations, we don't really care about absolute time. We care about which one is faster. So I'm not going to include absolute measurements, only relative, and quantized pretty aggressively.
| Compiler | -O level | S1 / S2 (relative time) |
|----------|----------|-------------------------|
| Clang | -O0 | 1.0 / 0.95 |
| Clang | -O3 | 1.0 / 1.2 |
| GCC | -O0 | 1.0 / 0.95 |
| GCC | -O3 | 1.0 / 0.45 |
Well, you might expect one to be consistently faster than the other (I know I did). But that didn't really happen... so... what's the takeaway here?
Optimization is very sensitive to context. If you're going to measure the speed of your program, you should measure it in the context of your real build target. Measuring in a debug build, or on a different compiler or platform than the intended environment, will not produce accurate results and can lead you to inadvertently pessimize your code, sometimes producing a program that runs OVER TWICE AS SLOW. Keep this in mind when writing performance-critical code.
About me
My name is Sebastian. I'm a competitive programmer (ICPC World Finalist) and coach. I also enjoy learning about programming language theory and computer graphics.