Minimize estimated cost to reach a goal: Greedy search
The simplest best-first strategy is to minimize the estimated cost
to reach the goal, i.e., always expand the node that appears to be closest
to the goal. A function that calculates such cost estimates is called a heuristic
function.
To get an idea of what a heuristic function looks like, let's look at a particular problem. Here we can use the straight-line distance to the goal as h. To do this, we need the map coordinates of each city. This heuristic works because roads tend to head in more or less of a straight line.
This figure shows the progress of a greedy search to find a path from
Arad to Bucharest.
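In code, greedy search is simply best-first search with a priority queue ordered by h alone. The sketch below uses a fragment of the road map and the straight-line distances to Bucharest from this example; the data subset and function names are illustrative.

```python
import heapq

# Fragment of the Romania road map and straight-line distances to
# Bucharest (standard figures for this example, trimmed to the
# relevant cities).
roads = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Timisoara": {"Arad": 118},
    "Zerind": {"Arad": 75},
    "Bucharest": {},
}
h_sld = {"Arad": 366, "Sibiu": 253, "Fagaras": 176, "Rimnicu Vilcea": 193,
         "Pitesti": 100, "Timisoara": 329, "Zerind": 374, "Bucharest": 0}

def greedy_search(start, goal):
    """Always expand the node with the smallest h-value."""
    frontier = [(h_sld[start], start, [start])]
    visited = set()
    while frontier:
        _, city, path = heapq.heappop(frontier)
        if city == goal:
            return path
        if city in visited:      # repeated-state check (avoids oscillation)
            continue
        visited.add(city)
        for nbr in roads[city]:
            heapq.heappush(frontier, (h_sld[nbr], nbr, path + [nbr]))
    return None

print(greedy_search("Arad", "Bucharest"))
# ['Arad', 'Sibiu', 'Fagaras', 'Bucharest']: 450 miles, 32 more than optimal
```

Note the repeated-state check: without it, the Iasi-to-Fagaras example below would oscillate forever.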
For this problem, greedy search leads to a minimal search cost because
no node off the solution path is expanded. However, it does not find the
optimal path: the path it finds via Sibiu and Fagaras to Bucharest is 32
miles longer than the path through Rimnicu Vilcea and Pitesti.
The algorithm always chooses what looks locally best, rather
than worrying about whether or not it will be best in the long run. (This
is why it's called greedy search.)
Greedy search is susceptible to false starts. Consider the problem
of getting from Iasi to Fagaras. h suggests that Neamt be expanded first,
but it is a dead end. The solution is to go first to Vaslui and then continue
to Urziceni, Bucharest and Fagaras.
Note that if we are not careful to detect repeated states, the solution
will never be found - the search will oscillate between Neamt and Iasi.
Greedy search resembles depth-first search in the way it prefers to follow a
single path to the goal, backing up only when a dead end is encountered.
It suffers from the same defects as depth-first search: it is not optimal, and it is incomplete
because it can start down an infinite path and never try other possibilities.
The worst-case time complexity for greedy search is O(b^{m}),
where b is the branching factor and m is the maximum depth of the search space. Its space complexity is the
same as its time complexity, but the worst case can be substantially reduced
with a good heuristic function.
Minimizing the total path cost: A^{*} search
Recall that uniform-cost search minimizes the cost of the path so far, g(n):
it is optimal and complete, but can be very inefficient. We can combine
the use of g(n) and h(n) simply by summing them: f(n) = g(n) + h(n), the estimated cost of the cheapest solution through n.
Hence, if we are trying to find the cheapest solution, a reasonable thing to try is the node with the lowest value for f. It is possible to show that this strategy is complete and optimal, given a simple restriction on the h function: h must never overestimate the cost to reach the goal. Such an h is called an admissible heuristic.
If h is admissible, f(n) never overestimates the actual cost of the best solution through n.
Best-first search using f and an admissible h is known as A^{*} search.
Notice that h_{sld} (straight-line distance) is an admissible heuristic because the shortest path between two points is a straight line. The following diagram shows the first few steps of the A^{*} search for Bucharest using h_{sld}.
Notice that A^{*} prefers to expand from Rimnicu Vilcea rather
than Fagaras. Even though Fagaras is closer to Bucharest, the path taken
to get to Fagaras is not as efficient in getting close to Bucharest
as the path taken to get to Rimnicu.
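The same best-first machinery, with the queue ordered by f = g + h instead of h alone, gives A^{*}. A minimal sketch (map fragment and straight-line distances as in this example; the data subset and names are illustrative):

```python
import heapq

# Fragment of the Romania road map and straight-line distances to
# Bucharest (standard figures for this example).
roads = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Timisoara": {"Arad": 118},
    "Zerind": {"Arad": 75},
    "Bucharest": {},
}
h_sld = {"Arad": 366, "Sibiu": 253, "Fagaras": 176, "Rimnicu Vilcea": 193,
         "Pitesti": 100, "Timisoara": 329, "Zerind": 374, "Bucharest": 0}

def a_star(start, goal):
    """Expand the node with the lowest f = g + h; goal-test on expansion."""
    frontier = [(h_sld[start], 0, start, [start])]  # (f, g, state, path)
    expanded = {}                                   # best g seen per state
    while frontier:
        f, g, city, path = heapq.heappop(frontier)
        if city == goal:
            return path, g
        if expanded.get(city, float("inf")) <= g:
            continue                                # already expanded cheaper
        expanded[city] = g
        for nbr, cost in roads[city].items():
            heapq.heappush(frontier,
                           (g + cost + h_sld[nbr], g + cost, nbr, path + [nbr]))
    return None, None

path, cost = a_star("Arad", "Bucharest")
print(path, cost)
# ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'] 418
```

With the admissible h_{sld}, the search expands Rimnicu Vilcea before following up Fagaras and returns the 418-mile optimal path, exactly as described above.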
The behaviour of A^{*} search
An important thing to notice about the example A^{*} search
above: along any path from the root, the f-cost never decreases. This fact
holds true for almost all admissible heuristics. A heuristic with this
property is said to exhibit monotonicity.
Monotonicity is not quite the same as admissibility: in a few cases, heuristics are admissible but not monotonic.
When our heuristics are not monotonic, we can make them so (for example, with the pathmax equation f(n') = max(f(n), g(n') + h(n'))), and so we can assume that if a heuristic is admissible, it is also monotonic.
Now, we can take a look at what is going on in A^{*}. If f never decreases along any path out of the root, we can conceptually draw contours in the state space.
Because A^{*} expands the leaf node of lowest f, an A^{*}
search fans out from the start node, adding nodes in concentric bands of
increasing f-cost.
With uniform-cost search (A^{*} using h=0), the bands will
be "circular" around the start node. With more accurate heuristics, the
bands will stretch towards the goal state and become narrowly focused around
the optimal path. If we define f^{*} to be the cost of the optimal
solution path, we can say:
A^{*} is optimally efficient for any given admissible heuristic function. This is because any algorithm that does not expand all nodes in the contours between the root and the goal runs the risk of missing the optimal solution.
Complexity of A^{*}
The catch with A^{*} is that even though it's complete, optimal
and optimally efficient, it still can't always be used, because for most
problems, the number of nodes within the goal contour search space is still
exponential in the length of the solution.
Similarly to breadth-first search, however, the major difficulty with A^{*} is the amount of space that it uses.
If we want to find shortest solutions for the 8-puzzle, we need a heuristic function that never overestimates the number of steps to the goal. Here are two possibilities: h1, the number of tiles that are in the wrong position, and h2, the sum of the city-block (Manhattan) distances of the tiles from their goal positions.
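For the 8-puzzle, the two usual candidate heuristics can be sketched directly. The tile layout below is one assumed example state, with the blank stored as 0 and the goal placing the blank first:

```python
# State: tuple of 9 tiles read row by row, 0 = blank.
# Goal configuration (assumed layout): blank first, then tiles 1..8.
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)

def h1(state):
    """Number of tiles out of place (the blank is not counted)."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def h2(state):
    """Sum of city-block (Manhattan) distances of tiles from their goals."""
    dist = 0
    for i, t in enumerate(state):
        if t == 0:
            continue
        g = GOAL.index(t)
        dist += abs(i // 3 - g // 3) + abs(i % 3 - g % 3)
    return dist

state = (7, 2, 4, 5, 0, 6, 8, 3, 1)   # example scrambled state
print(h1(state), h2(state))            # 8 18
```

Both functions never overestimate: a misplaced tile needs at least one move (h1), and at least its city-block distance in moves (h2).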
Usually, the effective branching factor of a heuristic (the branching factor b^{*} that a uniform tree of the solution depth would need in order to contain the number of nodes actually expanded) is fairly constant over problem instances and, hence, experimental measurements of b^{*} on a small set of problems provide a good approximation of the heuristic's usefulness.
A well-designed heuristic should have a value of b^{*} that
is close to 1. The following is a table of the performance of A^{*}
with h1 and h2 above on 100 randomly generated problems. It shows that
h2 is better than h1 and that both are much better than iterative deepening
search.
Note that h2 is always a better heuristic function than h1, since h2(n) >= h1(n) for every node n (h2 dominates h1).
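Concretely, if a search expands N nodes and the solution depth is d, b^{*} is defined by N + 1 = 1 + b^{*} + (b^{*})^{2} + ... + (b^{*})^{d}. A sketch that solves this equation numerically by bisection (the sample measurement at the end is made up for illustration):

```python
def effective_branching_factor(n_nodes, depth, tol=1e-9):
    """Solve N + 1 = 1 + b + b^2 + ... + b^d for b by bisection."""
    target = n_nodes + 1

    def tree_size(b):
        # number of nodes in a uniform tree of branching b, depth `depth`
        return sum(b ** i for i in range(depth + 1))

    lo, hi = 1.0, float(n_nodes) + 2.0   # tree_size is increasing in b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tree_size(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# hypothetical measurement: 52 nodes expanded, solution found at depth 5
print(round(effective_branching_factor(52, 5), 2))  # 1.92
```

Averaging such measurements over a set of problem instances gives the kind of b^{*} figures reported in the table.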
Inventing heuristic functions
Is it possible for heuristic functions to be invented mechanically?
In many cases, the answer is yes. In the 8-puzzle case, h1 and h2 can
be considered perfectly accurate path lengths for simplified versions of
the 8-puzzle problem. That is, if the rules of the puzzle were changed
so that a tile could be moved anywhere instead of just into an opening,
h1 would accurately give the number of steps in the shortest solution.
Is there a simplified problem in which h2 gives correct path lengths?
Yes: if a tile could move one square in any direction, even onto an occupied
square, h2 would give the exact number of steps in the shortest solution.
A problem with fewer restrictions on its operators is called a relaxed
problem. It is often the case that the cost of an exact solution to
a relaxed problem is a good heuristic for the original problem.
If a problem definition is written down in a formal language, it is
possible to automatically generate relaxations. Again in the 8-puzzle, the
operators can be described as: a tile can move from square A to square B if A is adjacent to B and B is blank. Dropping the condition that B is blank yields the relaxed problem solved exactly by h2; dropping both conditions yields the one solved by h1.
It turns out that if there are several choices of heuristic function with none clearly dominating the others, and they are all admissible, then they can be combined as h(n) = max(h1(n), ..., hm(n)).
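A sketch of this combination: the max of admissible heuristics never overestimates (each component is a lower bound on the true cost, so their maximum still is) and is at least as informed as any single component. The two component heuristics below are made up for illustration.

```python
def composite(*heuristics):
    """Combine admissible heuristics by taking their pointwise maximum.
    The result is still admissible and dominates every component."""
    return lambda state: max(h(state) for h in heuristics)

# toy example: two made-up heuristics on an integer "state"
h_a = lambda n: n // 2
h_b = lambda n: n - 3 if n > 3 else 0
h = composite(h_a, h_b)
print(h(10))  # max(5, 7) = 7
```

Whichever component is tighter on a given state, the composite uses it automatically.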
Often it is possible to pick out features of a state that contribute to its heuristic evaluation, even if it is hard to say what the contribution should be. For example, the goal in chess is to checkmate and the relevant features include number of pieces of each kind belonging to each side, the number of pieces that are attacked by opponent's pieces, etc.
Usually, the evaluation function is assumed to be a linear combination of the features and even if we don't know the weights, it is possible to use a learning algorithm to acquire them.
We have not considered the cost of computing the heuristic function in our discussion. So long as it is about the same as the cost of expanding a node, this is not an issue, but when the cost is as high as expanding several hundred nodes, it can no longer be ignored.
Heuristics for constraint satisfaction
We have so far examined variants of depth-first search for solving
CSPs. Now we extend the analysis by considering heuristics for selecting
a variable to instantiate and for choosing a value for the variable.
Suppose we are trying to colour the following map with three colours
so that no two adjacent regions have the same colour.
Clearly the best next move is to colour region E first because the
only possible colour for it is blue. All the other regions have a choice
and we might make the wrong one requiring us to backtrack.
This idea is called the most-constrained-variable heuristic. It is used with forward checking. At each point in the search, the variable with the fewest possible values is chosen to have a value assigned next. In this way the branching factor in the search tends to be minimized. When this heuristic is used in solving the n-queens problem, the largest solvable n goes from around 30 with forward checking alone to around 100.
The most-constraining-variable heuristic attempts to reduce the branching factor on future choices by assigning a value to the variable that is involved in the largest number of constraints on other unassigned variables.
Once a variable has been selected, a value must still be chosen for it. For example, suppose we decide to assign a value to region C after A and B. Red is a better choice than blue because it leaves more freedom for future choices. This is the least-constraining-value heuristic - choose a value that rules out the smallest number of values in variables connected to the current variable by constraints. When applied to the n-queens problem, it allows problems up to n=1000 to be solved.
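These ideas can be sketched together: backtracking with forward checking, choosing the most constrained variable next. The region names and adjacencies below are made up for illustration (a triangle of mutually adjacent regions plus one neighbour).

```python
# Made-up map-colouring instance: variables are regions, constraints say
# adjacent regions must differ.
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
colors = ["red", "green", "blue"]

def solve(assignment, domains):
    if len(assignment) == len(neighbors):
        return assignment
    # most-constrained-variable heuristic: fewest remaining legal colours
    var = min((v for v in neighbors if v not in assignment),
              key=lambda v: len(domains[v]))
    for color in domains[var]:
        # forward checking: prune this colour from the var's neighbours
        pruned = {v: [c for c in d if not (v in neighbors[var] and c == color)]
                  for v, d in domains.items()}
        # give up on this colour if some unassigned variable is wiped out
        if all(pruned[v] for v in neighbors if v not in assignment and v != var):
            result = solve({**assignment, var: color}, pruned)
            if result:
                return result
    return None

solution = solve({}, {v: list(colors) for v in neighbors})
print(solution)
```

The least-constraining-value heuristic could be added by sorting `domains[var]` by how few neighbour values each colour rules out.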
IDA^{*} search
IDA^{*} (iterative deepening A^{*}) is complete and optimal, but because it is depth-first it only requires space proportional to the longest path that it explores. The product bd (branching factor times maximum depth) is a good estimate of storage requirements.
The algorithm's time complexity depends on the number of different values that the h function can take on. The city-block distance function used in the 8-puzzle takes on one of a small number of integer values, so typically f only increases two or three times along any solution path. Hence, IDA^{*} only goes through two or three iterations. Because IDA^{*} does not need to maintain a queue of nodes, its overhead per node is smaller than A^{*}'s.
Unfortunately, IDA^{*} has difficulty in more complex domains. In the travelling salesperson problem, for example, the heuristic value is different for every state. As a result, the search is much more like iterative deepening: if A^{*} expands N nodes, IDA^{*} will have to go through N iterations and will therefore expand 1+2+...+N=O(N^{2}) nodes.
One way around this problem is to increase the f-cost limit by a fixed amount ε on each iteration, so that the total number of iterations is proportional to 1/ε. This can reduce the search cost at the expense of returning solutions that can be worse than optimal by at most ε. Such an algorithm is called ε-admissible.
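IDA^{*} itself can be sketched as a sequence of depth-first searches, each cut off when f = g + h exceeds the current limit; the next limit is the smallest f-value that exceeded the old one. The graph and (admissible) heuristic values below are made up for illustration.

```python
import math

# Made-up weighted graph and admissible heuristic (true cost S->G is 5).
graph = {"S": {"A": 1, "B": 4}, "A": {"C": 1}, "B": {"C": 1},
         "C": {"G": 3}, "G": {}}
h = {"S": 4, "A": 4, "B": 2, "C": 3, "G": 0}

def ida_star(start, goal):
    def dfs(node, g, limit, path):
        f = g + h[node]
        if f > limit:
            return f, None            # report the exceeding f upward
        if node == goal:
            return f, path
        next_limit = math.inf
        for nbr, cost in graph[node].items():
            if nbr in path:           # avoid cycles on the current path
                continue
            t, found = dfs(nbr, g + cost, limit, path + [nbr])
            if found:
                return t, found
            next_limit = min(next_limit, t)
        return next_limit, None

    limit = h[start]                  # first contour: f-cost of the root
    while True:
        limit, found = dfs(start, 0, limit, [start])
        if found:
            return found
        if limit == math.inf:
            return None               # no solution

print(ida_star("S", "G"))  # ['S', 'A', 'C', 'G']
```

Only the current path and one number (the limit) are kept between iterations, which is exactly the source of both its frugality and its difficulties below.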
SMA^{*} search
IDA^{*}'s difficulties in some problem spaces can be traced to using too little memory. Between iterations, it retains only a single number, the current f-cost limit.
SMA^{*} (Simplified Memory-bounded A^{*}) can make use of all available memory to carry out the search. SMA^{*} has the following properties: it will utilize whatever memory is made available to it; it avoids repeated states as far as its memory allows; it is complete if the available memory is sufficient to store the shallowest solution path; it is optimal if enough memory is available to store the shallowest optimal solution path, and otherwise it returns the best solution that can be reached with the available memory.
When SMA^{*} needs to make space on its queue, it drops a node from the queue. It prefers to drop nodes that are unpromising - that is nodes with high f-cost. To avoid reexploring subtrees that it has dropped from memory, it retains in the ancestor nodes information about the quality of the best path in the forgotten subtree. In this way, it only regenerates the subtree when all other paths have been shown to look worse than the path it has forgotten.
Example: the following hand simulation of SMA^{*} illustrates how it works.
In this case, it would return D which is a reachable, non-optimal solution.
Given a reasonable amount of memory, SMA^{*} can solve significantly more difficult problems than A^{*} without incurring significant overhead in terms of extra nodes generated. It performs well on problems with highly connected state spaces with real-valued heuristics, on which IDA^{*} has difficulty.
Iterative improvement algorithms
For an iterative improvement algorithm, picture a grid containing all of the problem states, where the height of any point corresponds to the value of the evaluation function on the state at that point.
The idea of iterative improvement algorithms is to move around the grid, trying to find the highest peaks, which are the optimal solutions.
These algorithms usually keep track only of the current state and do not look ahead beyond immediate neighbors.
There are two major classes of iterative improvement algorithms: Hill-climbing algorithms try to make changes that improve the current state. Simulated annealing algorithms sometimes make changes that make things worse.
Hill-Climbing Search
Always move in a direction of increasing quality. This simple policy has three well-known drawbacks:
Local maxima - a local maximum is a peak lower than the highest peak in the state space; once there, the algorithm halts even though the solution may be far from satisfactory.
Plateaux - a plateau is an area where the evaluation function is essentially flat; the search wanders at random.
Ridges - a ridge can have steeply sloping sides, so that the search reaches the top with ease, but the top may slope only gently toward a peak. Unless there happen to be operators that move directly along the top of the ridge, the search may oscillate from side to side, making little or no progress.
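The basic policy is only a few lines; the one-dimensional objective below is made up for illustration.

```python
def hill_climb(state, value, neighbors):
    """Move to the best neighbor until no neighbor improves.
    May halt on a local maximum, plateau, or ridge."""
    while True:
        best = max(neighbors(state), key=value)
        if value(best) <= value(state):
            return state
        state = best

# toy objective: a single peak at x = 7
peak = hill_climb(0, lambda x: -(x - 7) ** 2,
                  lambda x: [x - 1, x + 1])
print(peak)  # 7
```

On this unimodal toy landscape the policy always finds the peak; the drawbacks above appear only on landscapes with several peaks or flat regions.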
Simulated Annealing
Allow hill-climbing to take some downhill steps to escape local maxima.
The innermost loop of simulated annealing is quite similar to hill-climbing. Instead of picking the best move, however, it picks a random move. If the move actually improves the situation, it is executed. Otherwise, the algorithm makes the move with some probability less than 1. The probability decreases exponentially with the badness of the move, the amount ΔE by which the evaluation is worsened.
A second parameter T is also used to determine the probability. At higher values of T, bad moves are more likely to be allowed. As T tends towards 0, they become more and more unlikely, until the algorithm behaves like hill-climbing. The schedule input determines the value of T as a function of how many cycles already have been completed.
Simulated annealing was developed from an explicit analogy with annealing - the process of gradually cooling a liquid until it freezes. The value function corresponds to the total energy of the atoms in the material and T corresponds to the temperature. The schedule determines the rate at which the temperature is lowered. Individual moves in the state space correspond to random fluctuations due to thermal noise. One can prove that if the temperature is lowered sufficiently slowly, the material will attain a lowest-energy configuration. This corresponds to the statement that if schedule lowers T slowly enough, the algorithm will find a global optimum.
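The loop described above can be sketched as follows; the objective, neighbor function and cooling schedule are made up for illustration.

```python
import math
import random

def simulated_annealing(state, value, neighbor, schedule):
    """Maximize `value`: always accept an improving move; accept a
    worsening one with probability e^(ΔE/T), where ΔE < 0."""
    t = 0
    while True:
        T = schedule(t)
        if T <= 0:                       # schedule finished: freeze
            return state
        nxt = neighbor(state)
        delta = value(nxt) - value(state)
        if delta > 0 or random.random() < math.exp(delta / T):
            state = nxt
        t += 1

# toy problem: maximize -(x-3)^2 over the integers, linear cooling
random.seed(0)
best = simulated_annealing(
    0,
    value=lambda x: -(x - 3) ** 2,
    neighbor=lambda x: x + random.choice([-1, 1]),
    schedule=lambda t: max(0.0, 1.0 - t / 1000),
)
print(best)
```

Early on (T near 1) sideways and downhill steps are accepted fairly often; as T approaches 0 the loop degenerates into pure hill-climbing and settles on the peak.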
Applications to constraint satisfaction problems
CSPs can be solved by iterative improvement methods by first assigning values to all the variables and then applying modification operators to move the configuration towards a solution. The operators simply assign a different value to a variable, e.g., moving a queen to a different square in the 8-queens problem.
Algorithms that solve CSPs in this fashion are called heuristic repair methods. In choosing a new value for a variable, the most obvious heuristic is to select the value that results in the minimum number of conflicts with other variables. This is called the min-conflicts heuristic.
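A sketch of heuristic repair with min-conflicts on n-queens (one queen per column, so a state is a list of row values; ties among minimum-conflict rows are broken at random):

```python
import random

def conflicts(rows, col, row):
    """Conflicts a queen at (col, row) has: shared row or diagonal."""
    return sum(1 for c, r in enumerate(rows)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n, max_steps=10000):
    rows = [random.randrange(n) for _ in range(n)]       # full assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c])]
        if not conflicted:
            return rows                                  # solution found
        col = random.choice(conflicted)
        counts = [conflicts(rows, col, r) for r in range(n)]
        # min-conflicts heuristic: pick a value with the fewest conflicts
        rows[col] = random.choice(
            [r for r in range(n) if counts[r] == min(counts)])
    return None                                          # give up

random.seed(0)
print(min_conflicts(8))
```

Starting from a complete (conflicting) assignment and repairing one variable at a time is what makes this an iterative improvement method rather than a constructive search.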