A* Pathfinding + Speed Contest

THE PROJECT

For this project I was tasked with programming the A* algorithm for an agent. As an incentive to produce the most efficient solution possible, a class-wide contest (approximately 60 students) was held, and I jumped at the opportunity.

MY APPROACH

The most common operation in A* is finding the next cheapest node to explore in the open list, which means pushing and popping nodes dominate the work. These are therefore the operations you want to make cheap.
 
The most popular way of implementing the open list is to use a heap for the nodes. This lets you push and pop in O(log n) time. While that may be good for general solutions, our project was guaranteed to be a square 2D grid with a known maximum number of nodes. This allowed me to use a more customized data structure for the open list, which I will refer to as a bucket list. It is essentially a 2D array where you define how nodes are placed into buckets and keep track of the current cheapest bucket to pop from.
 
A bucket list has the added benefit of not needing to sort the individual buckets: since there are generally only a few nodes (<20) per bucket, it is faster to simply scan the bucket for its cheapest node.
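
To make the idea concrete, here is a minimal sketch of what such a bucket list might look like. The bucket count, the bucketing-by-f-cost scheme, and the names are illustrative assumptions rather than the exact contest code:

    #include <cstddef>
    #include <vector>

    // Minimal node stand-in for illustration; the real node stores more.
    struct Node { float fCost; };

    class BucketOpenList
    {
        static const int BUCKET_COUNT = 256;          // assumed f-cost range
        std::vector<Node*> buckets[BUCKET_COUNT];
        int cheapest = BUCKET_COUNT;                  // index of the cheapest non-empty bucket

        // Map an f-cost to a bucket index; simple truncation, clamped to the range.
        static int BucketIndex(float fCost)
        {
            int i = static_cast<int>(fCost);
            return i < 0 ? 0 : (i >= BUCKET_COUNT ? BUCKET_COUNT - 1 : i);
        }

    public:
        void Push(Node* node)
        {
            int i = BucketIndex(node->fCost);
            buckets[i].push_back(node);
            if (i < cheapest) cheapest = i;
        }

        Node* PopCheapest()
        {
            // Skip empty buckets to find the current cheapest one.
            while (cheapest < BUCKET_COUNT && buckets[cheapest].empty()) ++cheapest;
            if (cheapest == BUCKET_COUNT) return nullptr;   // open list is empty

            // Buckets only hold a handful of nodes, so a linear scan for the
            // cheapest node is faster than keeping the bucket sorted.
            std::vector<Node*>& bucket = buckets[cheapest];
            std::size_t best = 0;
            for (std::size_t i = 1; i < bucket.size(); ++i)
                if (bucket[i]->fCost < bucket[best]->fCost) best = i;

            Node* result = bucket[best];
            bucket[best] = bucket.back();   // swap-and-pop removal
            bucket.pop_back();
            return result;
        }
    };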
 
Another important factor is the size of the nodes. Ideally you want each node to be as small as possible. To achieve this, I packed neighbor and parent information into bit flags, adding only a single byte per node while still representing all 256 possible combinations.
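
A sketch of how that packing might look. The exact field layout is an assumption on my part here, but the key point is that a single byte of neighbor flags covers all 256 combinations:

    #include <cstdint>

    // Illustrative compact node layout (not the exact contest layout).
    enum Direction : uint8_t
    {
        DIR_N = 0, DIR_NE, DIR_E, DIR_SE, DIR_S, DIR_SW, DIR_W, DIR_NW
    };

    struct Node
    {
        float   gCost;          // cost from the start node
        float   fCost;          // gCost + heuristic
        uint8_t neighbors;      // one bit per direction: which neighbors are walkable (256 possibilities)
        uint8_t parentDir : 3;  // direction back to the parent node (0-7)
        uint8_t onOpen    : 1;  // list-membership flags
        uint8_t onClosed  : 1;

        bool HasNeighbor(Direction d) const { return (neighbors >> d) & 1; }
        void SetNeighbor(Direction d)       { neighbors |= uint8_t(1u << d); }
    };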
 
After implementing the above, the average A* search took approximately 2000 microseconds (μs), or 2 milliseconds (ms), to complete. While I was extremely happy with this time, I wanted to see if I could push it even further.
 
After profiling the tests, I realized that resetting nodes was a costly operation. To improve on this, I added visited nodes to a list and reset only the nodes in that list. The time this saved was impressive, cutting the average A* search time in half to approximately 1000 μs.
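
A rough sketch of that change, with illustrative names:

    #include <cstdint>
    #include <vector>

    // Track which nodes a search touched and reset only those afterwards,
    // instead of clearing the whole grid between searches.
    struct Node { float gCost; float fCost; bool onOpen; bool onClosed; };

    struct Grid
    {
        std::vector<Node>     nodes;     // one entry per grid cell
        std::vector<uint32_t> visited;   // indices of nodes touched this search

        Node& Touch(uint32_t index)
        {
            visited.push_back(index);    // remember it for the cheap reset
            return nodes[index];         // (duplicates are harmless; they just reset twice)
        }

        void ResetVisited()
        {
            for (uint32_t index : visited)
                nodes[index] = Node{};   // reset only what the last search used
            visited.clear();
        }
    };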
 
I wanted to see if I could take this even further by eliminating the need to reset nodes altogether. I did this by slightly increasing the size of each node, adding an iteration count to it. When visiting a node, I could then check the iteration count to determine whether the node had already been visited. I did see an improvement of about 50 μs in the test results, but overall the effect was minor.
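
Something like the following, where the node layout and counter width are assumptions:

    #include <cstdint>

    // Iteration stamping: instead of resetting nodes between searches, stamp
    // each node with the number of the search that last wrote to it.
    struct Node
    {
        uint32_t iteration = 0;   // search number this node's data belongs to
        float    gCost     = 0.0f;
    };

    struct Grid
    {
        uint32_t currentIteration = 0;

        void BeginSearch() { ++currentIteration; }   // the "reset" is now one increment

        // A node stamped with an older search number counts as unvisited, and
        // its stale data can simply be overwritten.
        bool VisitedThisSearch(const Node& n) const { return n.iteration == currentIteration; }
        void MarkVisited(Node& n)                   { n.iteration = currentIteration; }
    };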
 
Now that my data structures were optimized, it was time to turn to the search algorithm and see how I could optimize that. The first step I took was to eliminate backtracking. Once you move to a new node, you do not need to visit the previous node again. Normally you examine every neighbor of the node being expanded, but by always moving toward the goal and ruling out backtracking you can eliminate 3-5 neighbors from a single node's neighbor search. This surprised me with a 200 μs speed improvement.
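
The masks below are one illustrative way to express that pruning; the exact rules in my contest code may have differed:

    #include <cstdint>

    // Direction bits: bit i corresponds to direction i.
    enum Direction : uint8_t { DIR_N = 0, DIR_NE, DIR_E, DIR_SE, DIR_S, DIR_SW, DIR_W, DIR_NW };

    // For each direction of the move that reached the current node, a bitmask
    // of the neighbors still worth checking (the ones "in front of" the move).
    // A straight move skips 3 of the 8 neighbors; a diagonal move skips 5.
    static const uint8_t kForwardMask[8] = {
        0xC7,  // arrived moving N : keep W, NW, N, NE, E
        0x07,  // arrived moving NE: keep N, NE, E
        0x1F,  // arrived moving E : keep N, NE, E, SE, S
        0x1C,  // arrived moving SE: keep E, SE, S
        0x7C,  // arrived moving S : keep E, SE, S, SW, W
        0x70,  // arrived moving SW: keep S, SW, W
        0xF1,  // arrived moving W : keep S, SW, W, NW, N
        0xC1,  // arrived moving NW: keep W, NW, N
    };

    // Combine with the node's walkable-neighbor byte to get what actually gets expanded.
    inline uint8_t NeighborsToExpand(uint8_t walkableNeighbors, Direction moveDir)
    {
        return uint8_t(walkableNeighbors & kForwardMask[moveDir]);
    }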
 
The final improvement I made was creating a function lookup table. Instead of iterating over all neighbors and skipping the invalid ones, it turns out to be faster to jump directly to a function that evaluates only the valid neighbors. To do this I created a separate function that uses the concept of subsets to generate the 256 required functions for the lookup table. This improvement was even better than eliminating backtracking, cutting another 300 μs from the search time.
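
One way to generate those 256 functions in C++ is with templates, one instantiation per subset of the 8 neighbor bits. This is a sketch under that assumption (Context and ExpandNeighbor are stand-ins), not necessarily how my original generator worked:

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <utility>

    struct Context { /* open list, grid, current node, goal, ... */ };

    void ExpandNeighbor(Context& ctx, int dir)
    {
        (void)ctx; (void)dir;   // placeholder: score and push the neighbor in direction 'dir'
    }

    // One instantiation per neighbor bitmask. Only the directions whose bit is
    // set are expanded, so the per-neighbor "is it valid?" branches disappear.
    template <uint8_t Mask>
    void ExpandByMask(Context& ctx)
    {
        (void)ctx;  // silences the unused warning for the empty Mask == 0 case
        if constexpr (Mask & 0x01) ExpandNeighbor(ctx, 0);
        if constexpr (Mask & 0x02) ExpandNeighbor(ctx, 1);
        if constexpr (Mask & 0x04) ExpandNeighbor(ctx, 2);
        if constexpr (Mask & 0x08) ExpandNeighbor(ctx, 3);
        if constexpr (Mask & 0x10) ExpandNeighbor(ctx, 4);
        if constexpr (Mask & 0x20) ExpandNeighbor(ctx, 5);
        if constexpr (Mask & 0x40) ExpandNeighbor(ctx, 6);
        if constexpr (Mask & 0x80) ExpandNeighbor(ctx, 7);
    }

    using ExpandFn = void (*)(Context&);

    // Build the 256-entry table: entry i handles the neighbor subset with bitmask i.
    template <std::size_t... I>
    constexpr std::array<ExpandFn, 256> MakeExpandTable(std::index_sequence<I...>)
    {
        return { &ExpandByMask<static_cast<uint8_t>(I)>... };
    }
    static const std::array<ExpandFn, 256> kExpandTable =
        MakeExpandTable(std::make_index_sequence<256>{});

    // Dispatch during the search: kExpandTable[node.neighbors](ctx);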
 
At this point I had spent about 30 hours working on the project and had gone from a 2000 μs average down to a 500 μs average (0.5 ms), and I was satisfied with my accomplishment. The effort paid off, allowing me to win the speed contest.

TAKING THIS FURTHER

Here is a list of ideas to take this further and possibly see additional speed improvements:
  • Reduce the memory size of each node (see the sketch after this list). 
    • For example: using u16 fixed-point values for each node's costs. This may help performance by fitting more nodes in the cache and on a single cache line.
  • Finding a way to better distribute nodes in the bucket list. 
    • Currently, the number of nodes per bucket roughly follows a normal distribution. Flattening that curve would reduce the number of collisions and the per-bucket search time.
  • Don't even search the unsorted buckets. 
    • If the buckets are small enough, any node in the cheapest bucket would be a valid next node to search in the open list.
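
As a sketch of the first idea, a node with 16-bit fixed-point costs might look like this (purely hypothetical; the scale factor is an assumption):

    #include <cstdint>

    // Shrinking the cost fields to 16-bit fixed point (actual cost * 10).
    // At 6 bytes per node, roughly 10 nodes fit on a 64-byte cache line.
    struct SmallNode
    {
        uint16_t gCost;      // fixed-point cost from the start node
        uint16_t fCost;      // fixed-point gCost + heuristic
        uint8_t  neighbors;  // walkable-neighbor bitmask
        uint8_t  parentDir;  // direction back to the parent (0-7)
    };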

ADDING GOAL BOUNDING

Goal bounding is a pathfinding optimization technique that can speed up A* by roughly eight times on a grid. Goal bounding is not a search algorithm itself, but rather a method to prune the search space, thus radically reducing the number of nodes that need to be considered to find the goal. This is accomplished by preprocessing the search space offline and using the precomputed data to avoid exploring many nodes that do not lead to the goal.
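
The run-time side of goal bounding comes down to a cheap bounding-box test per outgoing edge. The sketch below assumes the per-node, per-direction boxes were already computed offline (for example by flood-filling from each node); only the lookup is shown, and the names are illustrative:

    #include <cstdint>

    struct BoundingBox
    {
        int16_t minX, minY, maxX, maxY;

        bool Contains(int x, int y) const
        {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
    };

    // boxes[node * 8 + dir] holds, for each of the 8 directions out of 'node',
    // a box around every cell whose optimal path starts by leaving in that direction.
    bool ShouldExpandNeighbor(const BoundingBox* boxes, int node, int dir,
                              int goalX, int goalY)
    {
        // If the goal lies outside the box for this edge, no optimal path to
        // the goal begins by leaving 'node' in direction 'dir', so skip it.
        return boxes[node * 8 + dir].Contains(goalX, goalY);
    }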

A* WITHOUT GOAL BOUNDING

A* pathfinding takes approximately 16000 μs or 16 milliseconds to find the path, even with all the optimizations mentioned above.

A* WITH GOAL BOUNDING

A* pathfinding plus Goal Bounding reduces the search time to approximately 180 μs, or 0.18 ms, for a similar search. This is a dramatic reduction in search time if you can afford the associated memory cost. With plain A* there are approximately 20 nodes per bucket while searching, but with the addition of Goal Bounding this drops to an average of fewer than 3 nodes per bucket.