Depth First Search, Breadth First Search and A* Search - a quick guide
What is the most efficient way for an army to traverse a battlefield given a river blocking one portion of the terrain, a hill blocking visibility in another portion and a wall in yet another? What is the quickest way to travel from London to Cambridge? Finding the shortest path from point A to point B given certain restrictions is a problem that humanity has faced over and over, throughout history and even throughout our daily lives.
There are a number of ways to traverse a graph of nodes.
The simplest, and least efficient, is called Depth First Search. As the name suggests, this algorithm involves starting from a root node, and then exploring adjacent nodes in order or depth (leftmost nodes are explored first). Depth First Search is best visualized with a stack of paper: a traveler starts with one page dictating a location to go to. As he discovers more pages, these get added to the top of the stack. The next location traveled to is taken from the top of the stack. If plotted on a map, this would result in an exploration that resembles outward rays from a starting point.
Pseudo-code for such a procedure would look like the following
def depthFirstSearch(problem): stack = util.Stack() stack.push((problem.getStartState(), )) visited = set() while not stack.isEmpty(): (node, path) = stack.pop() if problem.isGoalState(node): return path visited.add(node) for state, action, cost in problem.getSuccessors(node): if state not in visited: stack.push((state, path + [action]))
This methodology is called FIFO (first in first out). While conceptually simple, this strategy is not clever. Suppose, for instance, that the goal could have been reached after adding a single stack of 2 pages. Using the FIFO methodology, the traveler would keep piling pages on pages – potentially ad infinitum – only to discover that the goal was in the first stack. If one envisions a graph, this would be tantamount to starting at a root node and following the tree downwards all the way to the bottom before exploring any adjacent nodes to the right.
Breadth First Search solves this conundrum. Importantly, this graph traversal methodology uses a queue instead of a stack of pages. Contrary to a stack, a queue uses a LIFO (last in first out) methodology. Our traveler would now start with one page dictating a location to go to, discover more pages, add them to his queue, and determine his next location by visiting those places determined by the bottom of his list. If plotted on a map, this would result in an exploration that resembles concentric circles.
Pseudo-code would look like the following
def breadthFirstSearch(problem): queue = util.Queue() queue.push((problem.getStartState(), )) visited = set() while not queue.isEmpty(): (node, path) = queue.pop() if problem.isGoalState(node): return path visited.add(node) for state, action, cost in problem.getSuccessors(node): if state not in visited: queue.push((state, path + [action]))
Notice how the only difference between the two is that Depth First uses a stack object, while Breadth First uses a queue.
While this guarantees the traveler to find his goal, there is a cleverer way for him to find his path.
Enter A* Search (pronounced A-star search). This algorithm is a take-off of Dijkstra's algorithm with an additional piece of logic – the best-first heuristic. A* Search looks at adjacent nodes in a graph and determines the most promising, or as it were, the one with the least projected cost. Once a possible and cheap path is determined, it stays in a priority queue until all the best options have been explored first.
So now, if our traveler starts with one page dictating a location to go, he will organize the subsequent pages he finds. Page 1 tells him his next location is at a distance of 5km. He approximates that his goal is still 10km away, so he adds these values to get a total cost of 15. Page 2 indicates that the next location is at a distance of 2km. As he still approximates his end goal at a distance of 10km, the total cost for this route is only 12. This is a better path to go down as it would allow him to get to his destination more rapidly.
This algorithm he uses is called the cost function. Quite simply, the cost function is determined by the following addition:
f = g + h
As the parable suggests, g is the cost to get to the subsequent node, whereas h is the heuristic cost (estimated value) that the traveler believes is remaining until his destination. Due to this simple yet clever addition, A* is virtually guaranteed to find the best path within a short amount of time if the heuristic is correct.
The pseudo-code for A* would look like the following:
def aStarSearch(problem, heuristic=nullHeuristic): queue = util.PriorityQueue() start_state = problem.getStartState() queue.push((start_state, ), heuristic(start_state, problem)) visited = set() while not queue.isEmpty(): (node, path) = queue.pop() if problem.isGoalState(node): return path visited.add(node) for state, action, cost in problem.getSuccessors(node): if not state in visited: new_state = path + [action] f = problem.getCostOfActions(new_state) + heuristic(state, problem) queue.push((state, new_state), f)
In reality, the heuristic will not be perfect, but a good guess is far better than traversing the entire graph without rhyme or reason. Importantly, however, A* search will potentially explore the same node multiple times if the other explored paths were ultimately less appealing than initially estimated.