# Informed Search Methods

Unfortunately, uninformed search methods are very inefficient in most cases. We now show how heuristic search strategies - ones that use problem-specific knowledge - can find solutions more efficiently.

## Best-first search

If we use the GENERAL-SEARCH algorithm of the previous chapter, the only place where knowledge can be applied is in the queuing function, which determines the node to expand next. The knowledge used to make this determination is provided by an evaluation function that returns a number that describes the desirability of expanding the node.
When the nodes are ordered so that the one with the best evaluation is expanded first, the resulting strategy is called best-first search. Note that the name best-first search is actually a misnomer: if the algorithm really always expanded the best node first, there would be no search at all.
What it in fact does is choose next the node that appears to be the best.
Just as there is a whole family of GENERAL-SEARCH algorithms with different ordering functions, there is a whole family of BEST-FIRST-SEARCH algorithms with different evaluation functions. These algorithms typically use some estimated measure of the cost of the solution and try to minimize it. One such measure is cost of the path so far. We have already used that one. In order to focus the search, the measure must incorporate some estimate of the cost of the path from a state to the goal.
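As a sketch of the scheme (the queue discipline and all function names here are illustrative, not from the text), best-first search can be implemented with a priority queue ordered by the evaluation function:

```python
import heapq

def best_first_search(start, successors, is_goal, evaluate):
    """Generic best-first search: repeatedly expand the node whose
    evaluation is best (lowest). `successors(state)` yields neighbouring
    states; `evaluate(path)` scores a candidate path (lower is better)."""
    counter = 0                      # tie-breaker so heapq never compares paths
    frontier = [(evaluate([start]), counter, [start])]
    visited = set()
    while frontier:
        _, _, path = heapq.heappop(frontier)
        state = path[-1]
        if is_goal(state):
            return path
        if state in visited:         # repeated-state check
            continue
        visited.add(state)
        for nxt in successors(state):
            counter += 1
            heapq.heappush(frontier, (evaluate(path + [nxt]), counter, path + [nxt]))
    return None                      # no solution found
```

Passing `lambda p: h[p[-1]]` as the evaluation function gives greedy search; passing the accumulated path cost gives uniform-cost search.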

### Minimize estimated cost to reach a goal: Greedy search
The simplest best-first strategy is to minimize the estimated cost to reach the goal, i.e., always expand the node that appears to be closest to the goal. A function that calculates such cost estimates is called a heuristic function.

• h(n) = estimated cost of the cheapest path from the state at n to a goal state
A best-first search that uses h to select the next node to expand is called a greedy search.

To get an idea of what a heuristic function looks like, let's look at a particular problem: route finding on a map of Romania. Here we can use as h the straight-line distance to the goal. To do this, we need the map coordinates of each city. This heuristic works because roads tend to head in more or less of a straight line. This figure shows the progress of a greedy search to find a path from Arad to Bucharest. For this problem, greedy search leads to a minimal search cost: no node off the solution path is expanded. However, it does not find the optimal path: the path it finds via Sibiu and Fagaras to Bucharest is 32 miles longer than the path through Rimnicu Vilcea and Pitesti.
Hence, the algorithm always chooses what looks locally best, rather than worrying about whether or not it will be best in the long run. (This is why it's called greedy search.)
Greedy search is susceptible to false starts. Consider the problem of getting from Iasi to Fagaras: h suggests that Neamt be expanded first, but it is a dead end. The solution is to go first to Vaslui and then continue to Urziceni, Bucharest and Fagaras.
Note that if we are not careful to detect repeated states, the solution will never be found - the search will oscillate between Neamt and Iasi.
Greedy search resembles depth-first search in the way that it prefers to follow a single path to the goal and back up only when a dead end is encountered. It suffers from the same defects as depth-first search: it is not optimal, and it is incomplete because it can start down an infinite path and never try other possibilities.
The worst-case time complexity for greedy search is O(b^m), where m is the maximum depth of the search space. Its space complexity is the same as its time complexity, but the worst case can be substantially reduced with a good heuristic function.

### Minimizing the total path cost: A* search
Recall that uniform-cost search minimizes the cost of the path so far, g(n): it is optimal and complete, but can be very inefficient. We can combine the use of g(n) and h(n) simply by summing them:

• f(n) = g(n) + h(n)
f(n) = the estimated cost of the cheapest solution through n

Hence, if we are trying to find the cheapest solution, a reasonable thing to try is the node with the lowest value for f. It is possible to show that this strategy is complete and optimal, given a simple restriction on the h function: h must never overestimate the cost to reach the goal. Such an h is called an admissible heuristic.

If h is admissible, f(n) never overestimates the actual cost of the best solution through n.

Best-first search using f and an admissible h is known as A* search.

Notice that h_SLD (straight-line distance) is an admissible heuristic because the shortest path between any two points is a straight line. The following diagram shows the first few steps of the A* search for Bucharest using h_SLD. Notice that A* prefers to expand from Rimnicu Vilcea rather than Fagaras. Even though Fagaras is closer to Bucharest, the path taken to get to Fagaras is not as efficient in getting close to Bucharest as the path taken to get to Rimnicu Vilcea.
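A minimal A* sketch over a toy graph may help make the f = g + h bookkeeping concrete. The graph, step costs, and heuristic table in the usage below are invented for illustration (they are not the Romania data), and the heuristic is admissible by construction:

```python
import heapq

def a_star(start, goal, neighbours, h):
    """A* search: expand the frontier node with the lowest
    f(n) = g(n) + h(n). `neighbours(state)` yields (next_state,
    step_cost) pairs; `h(state)` is an admissible estimate."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, state, path)
    best_g = {start: 0}                          # cheapest g found per state
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        for nxt, cost in neighbours(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float('inf')):   # found a cheaper path
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None
```

For example, with `graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('C', 5), ('G', 12)], 'B': [('C', 2)], 'C': [('G', 3)], 'G': []}` and `h = {'S': 7, 'A': 6, 'B': 2, 'C': 1, 'G': 0}`, the search returns the optimal cost 8 along S-A-B-C-G.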

### The behaviour of A* search
An important thing to notice about the example A* search above: along any path from the root, the f-cost never decreases. This fact holds true for almost all admissible heuristics. A heuristic with this property is said to exhibit monotonicity.

Monotonicity is not quite the same as admissibility. In a few cases, heuristics are admissible but not monotonic.

When our heuristics are not monotonic, we can make them so; hence we can assume that if a heuristic is admissible, it is also monotonic.
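One standard way to restore monotonicity (often called the pathmax equation; the name and exact form are an assumption here, since the text does not spell them out) is to never let a child's f-value fall below its parent's:

```python
def pathmax_f(parent_f, g_child, h_child):
    # Take the larger of the parent's f-value and the child's own
    # g + h, so that f never decreases along any path from the root.
    return max(parent_f, g_child + h_child)
```

If the child's own estimate already exceeds the parent's f, nothing changes; otherwise the parent's value is propagated down.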

Now, we can take a look at what is going on in A*. If f never decreases along any path out of the root, we can conceptually draw contours in the state space. Because A* expands the leaf node of lowest f, an A* search fans out from the start node, adding nodes in concentric bands of increasing f-cost.
With uniform-cost search (A* using h = 0), the bands will be "circular" around the start node. With more accurate heuristics, the bands will stretch towards the goal state and become narrowly focused around the optimal path. If we define f* to be the cost of the optimal solution path, we can say:

• A* expands all nodes with f(n) < f*.
• A* may then expand some of the nodes right on the "goal contour", for which f(n) = f*, before selecting the goal node.
Hence, the first solution found must be the optimal one because nodes in all subsequent contours will have higher f-cost and hence higher g-cost.
A* search is also complete because as we add bands of increasing f, we must eventually reach a band where f is equal to the cost of the path to a goal state.

A* is optimally efficient for any given admissible heuristic function. This is because any algorithm that does not expand all nodes in the contours between the root and the goal runs the risk of missing the optimal solution.

### Complexity of A*
The catch with A* is that, even though it is complete, optimal, and optimally efficient, it still can't always be used: for most problems, the number of nodes within the goal contour is still exponential in the length of the solution.

As with breadth-first search, however, the major practical difficulty with A* is the amount of space that it uses.

## Heuristic Functions

First, we will look at some different heuristics for the 8-puzzle problem. A typical solution to the puzzle has around 20 steps. The branching factor is about 3. Hence, an exhaustive search to depth 20 will look at about 3^20 ≈ 3.5 × 10^9 states. By keeping track of repeated states, this number can be cut down to 9! = 362,880 different arrangements of 9 squares. We need a heuristic to further reduce this number.

If we want to find shortest solutions, we need a function that never overestimates the number of steps to the goal. Here are two possibilities:

• h1= the number of tiles that are in the wrong position. This is admissible because any tile that is out of place must be moved at least once.
• h2 = the sum of the distances of the tiles from their goal positions. Since tiles cannot be moved diagonally, we use the city-block (Manhattan) distance. h2 is also admissible: each move can bring only one tile one square closer to its goal position, so the estimate never exceeds the true number of moves.
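Both heuristics are straightforward to compute. In this sketch, boards are 9-tuples read row by row with 0 marking the blank; the particular goal layout is an assumption for illustration:

```python
# Goal layout (an illustrative assumption): tiles 1-8 in order, blank last.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def h1(board):
    """Number of tiles out of place (the blank is not counted)."""
    return sum(1 for i, t in enumerate(board) if t != 0 and t != GOAL[i])

def h2(board):
    """Sum of city-block (Manhattan) distances of tiles from their goals."""
    total = 0
    for i, t in enumerate(board):
        if t == 0:
            continue
        gi = GOAL.index(t)                       # where tile t belongs
        total += abs(i // 3 - gi // 3) + abs(i % 3 - gi % 3)
    return total
```

On any board, h2 is at least as large as h1, since every misplaced tile contributes at least 1 to the city-block sum.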
### The effect of heuristic accuracy on performance
One way to characterize the quality of a heuristic is the effective branching factor b*. If the total number of nodes expanded by A* for a particular problem is N and the solution depth is d, then b* is the branching factor that a uniform tree of depth d would have to have in order to contain N nodes:
• N = 1 + b* + (b*)^2 + ... + (b*)^d
For example, if A* finds a solution at depth 5 using 52 nodes, the effective branching factor is about 1.91.
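The defining equation has no closed form, but it is easy to solve for b* numerically; a bisection sketch (illustrative, not from the text):

```python
def effective_branching_factor(n_nodes, depth, tol=1e-6):
    """Solve N = 1 + b* + (b*)**2 + ... + (b*)**d for b* by bisection."""
    def total(b):
        # Number of nodes in a uniform tree of branching factor b, depth d.
        return sum(b ** i for i in range(depth + 1))
    lo, hi = 1.0, float(n_nodes)          # b* lies between 1 and N
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < n_nodes:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

`effective_branching_factor(52, 5)` returns approximately 1.91, matching the example above.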

Usually, the effective branching factor of a heuristic is fairly constant over problem instances, so experimental measurements of b* on a small set of problems provide a good approximation of the heuristic's usefulness.

A well-designed heuristic should have a value of b* that is close to 1. The following is a table of the performance of A* with h1 and h2 above on 100 randomly generated problems. It shows that h2 is better than h1 and that both are much better than iterative deepening search. Note that h2 is always a better heuristic function than h1, since h2(n) ≥ h1(n) for every node.

### Inventing heuristic functions
Is it possible for heuristic functions to be invented mechanically?

In many cases, the answer is yes. In the 8-puzzle case, h1 and h2 can be considered perfectly accurate path lengths for simplified versions of the 8-puzzle problem. That is, if the rules of the puzzle were changed so that a tile could be moved anywhere instead of just into an opening, h1 would accurately give the number of steps in the shortest solution.
Is there a simplified problem in which h2 gives correct path lengths?
Yes. If a tile could move a square in any direction, even onto an occupied square.
A problem with fewer restrictions on its operators is called a relaxed problem. It is often the case that the cost of an exact solution to a relaxed problem is a good heuristic for the original problem.
If a problem definition is written down in a formal language, it is possible to generate relaxations automatically. In the 8-puzzle, for example, the operator can be described as:

• A tile can move from square A to square B if A is adjacent to B and B is blank.
One can generate three relaxed problems by removing conditions:
• A tile can move from square A to square B if A is adjacent to B.
• A tile can move from square A to square B if B is blank.
• A tile can move from square A to square B.
One problem with generating new heuristic functions is that there is often no single best heuristic.

It turns out that if there are several choices for heuristic functions with none clearly dominating the others, and they are all admissible, then they can be combined as

• h(n) = max(h1(n), h2(n), ..., hm(n))
This composite heuristic uses whichever function is most accurate on a node.
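A hedged one-liner shows the mechanics (function and argument names are illustrative):

```python
def composite(*heuristics):
    # The maximum of admissible heuristics is itself admissible, and
    # it is at least as accurate as any single component on every node.
    return lambda state: max(h(state) for h in heuristics)
```

For example, `composite(h1, h2)` returns a function that, on each state, reports whichever of the two estimates is larger.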

Often it is possible to pick out features of a state that contribute to its heuristic evaluation, even if it is hard to say what the contribution should be. For example, the goal in chess is to checkmate and the relevant features include number of pieces of each kind belonging to each side, the number of pieces that are attacked by opponent's pieces, etc.

Usually, the evaluation function is assumed to be a linear combination of the features and even if we don't know the weights, it is possible to use a learning algorithm to acquire them.

We have not considered the cost of computing the heuristic functions in our discussion. As long as computing the heuristic costs about the same as expanding a node, this is not an issue; but when the cost is as high as expanding several hundred nodes, it can no longer be ignored.

### Heuristics for constraint satisfaction
We have so far examined variants of depth-first search for solving CSPs. Now we extend the analysis by considering heuristics for selecting which variable to instantiate next and for choosing a value for that variable.

Suppose we are trying to colour the following map with three colours so that no two adjacent regions have the same colour. Clearly the best next move is to colour region E first because the only possible colour for it is blue. All the other regions have a choice and we might make the wrong one requiring us to backtrack.

This idea is called the most-constrained-variable heuristic. It is used with forward checking: at each point in the search, the variable with the fewest possible remaining values is chosen to have a value assigned next. In this way the branching factor of the search tends to be minimized. When this heuristic is used in solving the n-queens problem, the maximum practical value of n goes from around 30 for forward checking alone to around 100.

The most-constraining-variable heuristic attempts to reduce the branching factor on future choices by assigning a value to the variable that is involved in the largest number of constraints on other unassigned variables.

Once a variable has been selected, a value must still be chosen for it. For example, suppose we decide to assign a value to region C after A and B. Red is a better choice than blue because it leaves more freedom for future choices. This is the least-constraining-value heuristic - choose a value that rules out the smallest number of values in variables connected to the current variable by constraints. When applied to the n-queens problem, it allows problems up to n=1000 to be solved.
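Both ordering heuristics can be sketched over a table of remaining legal values maintained by forward checking. The variable names and domains in the usage below are hypothetical, not the map from the figure:

```python
def most_constrained_variable(domains, assigned):
    """Pick the unassigned variable with the fewest remaining legal
    values (most-constrained-variable). `domains` maps each variable to
    its current set of values after forward checking."""
    unassigned = [v for v in domains if v not in assigned]
    return min(unassigned, key=lambda v: len(domains[v]))

def least_constraining_value(var, domains, neighbours):
    """Order var's values so the one that rules out the fewest options
    in neighbouring variables comes first (least-constraining-value)."""
    def conflicts(value):
        # How many neighbouring domains still contain this value.
        return sum(value in domains[n] for n in neighbours[var])
    return sorted(domains[var], key=conflicts)
```

With domains `{'A': {red, green, blue}, 'B': {red, green}, 'E': {blue}}`, the first function picks E, mirroring the map-colouring example above.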

## Memory-Bounded Search

IDA* is a logical extension of iterative deepening search and SMA* is similar to A* but restricts the queue size to fit into the available memory.

### Iterative deepening A* search (IDA*)

IDA* uses the same trick we used to make breadth-first search memory efficient: iterative deepening.
In this algorithm each iteration is dfs, just as in regular iterative deepening. However, the search is modified to use an f-cost limit rather than a depth limit.
Hence, each iteration expands all nodes inside the contour for the current f-cost, peeping over the contour to find out where the next contour lies.
Once the search inside a contour is completed, a new iteration is started using a new f-cost for the next contour.

IDA* is complete and optimal, but because it is depth-first it only requires space proportional to the longest path that it explores; b × d (branching factor times solution depth) is a good estimate of its storage requirements.

The algorithm's time complexity depends on the number of different values that the h function can take on. The city-block distance used in the 8-puzzle takes on only a small number of integer values, so typically f only increases two or three times along any solution path, and IDA* only goes through two or three iterations. Because IDA* does not need to maintain a queue of nodes, its overhead per node is lower than that of A*.
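A contour-by-contour sketch of IDA* (the graph and heuristic in the usage are invented for illustration; `neighbours(state)` is assumed to yield (state, cost) pairs):

```python
def ida_star(start, goal, neighbours, h):
    """IDA*: repeated depth-first probes, each bounded by an f-cost
    limit; the next limit is the smallest f-value that exceeded the
    current one (the next contour "peeped over")."""
    def probe(path, g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return f                     # report where the next contour lies
        if state == goal:
            return path
        smallest = float('inf')
        for nxt, cost in neighbours(state):
            if nxt in path:              # avoid cycles along this path
                continue
            result = probe(path + [nxt], g + cost, bound)
            if isinstance(result, list): # a solution path was found
                return result
            smallest = min(smallest, result)
        return smallest

    bound = h(start)
    while True:
        result = probe([start], 0, bound)
        if isinstance(result, list):
            return result
        if result == float('inf'):
            return None                  # no solution at any bound
        bound = result                   # start next iteration at next contour
```

Note how the only state carried between iterations is the single number `bound`, which is exactly the weakness discussed for SMA* below.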

Unfortunately, IDA* has difficulty in more complex domains. In the travelling salesperson problem, for example, the heuristic value is different for every state, so each new f-cost limit admits only one more node: if A* expands N nodes, IDA* will have to go through N iterations and will therefore expand 1 + 2 + ... + N = O(N^2) nodes.

One way around this problem is to increase the f-cost limit by a fixed amount ε on each iteration, so that the total number of iterations is proportional to 1/ε. This can reduce the search cost at the expense of returning solutions that can be worse than optimal by at most ε. Such an algorithm is called ε-admissible.

### SMA* search

IDA*'s difficulties in some problem spaces can be traced to its using too little memory. Between iterations, it retains only a single number: the current f-cost limit.

SMA*(Simplified memory-bounded A*) can make use of all available memory to carry out the search. SMA* has the following properties:

• It will utilize whatever memory is made available.
• It avoids repeated states as far as its memory allows.
• It is complete if the available memory is sufficient to store the shallowest solution path.
• It is optimal if enough memory is available to store the shallowest optimal solution path. Otherwise it returns the best solution that can be reached with the available memory.
• When enough memory is available for the entire search tree, the search is optimally efficient.

When SMA* needs to make space on its queue, it drops a node from the queue. It prefers to drop unpromising nodes - that is, nodes with high f-cost. To avoid re-exploring subtrees that it has dropped from memory, it retains in the ancestor nodes information about the quality of the best path in the forgotten subtree. In this way, it only regenerates a forgotten subtree when all other paths have been shown to look worse than the forgotten path.

Example: a hand simulation of SMA* on a small tree illustrates how it works.

• 1. At each stage, one successor is added to the deepest lowest-f-cost node that has some successors not currently in the tree. The left child B is added to the root A.
• 2. Now f(A) is still 12, so we add the right child G (f=13). Now that we have seen all the children of A, we can update its f-cost to the minimum of its children, that is, 13. The memory is now full.
• 3. G is now designated for expansion, but we must first drop a node to make room. We drop the shallowest highest-f-cost leaf, that is, B. When we have done this, we note that A's best forgotten descendant has f=15, as shown in parentheses. We then add H, with f(H)=18. H is not a goal node, but the path to H uses up all the available memory. Hence, there is no way to find a solution through H, so we set f(H)=∞.
• 4. G is expanded again. We drop H and add I, with f(I)=24. Now we have seen both successors of G, with values ∞ and 24, so f(G) becomes 24. f(A) becomes 15, the minimum of 15 (the forgotten successor value) and 24. Notice that I is a goal node, but it might not be the best solution because A's f-cost is only 15.
• 5. A is once again the most promising node, so B is generated for a second time. We have found that the path through G was not so great after all.
• 6. C, the first successor of B, is a non-goal node at the maximum depth, so f(C)=∞.
• 7. To look at the second successor, D, we first drop C. Then f(D)=20, and this value is inherited by B and A.
• 8. Now the deepest, lowest-f-cost node is D. D is therefore selected, and because it is a goal node, the search terminates.
In this case, there is enough memory for the shallowest optimal solution path. If J had had a cost of 19 instead of 24, however, SMA* still would not have been able to find it because the solution path contains four nodes.

In this case, it would return D which is a reachable, non-optimal solution.

Given a reasonable amount of memory, SMA* can solve significantly more difficult problems than A* can, without incurring significant overhead in terms of extra nodes generated. It performs well on problems with highly connected state spaces and real-valued heuristics, on which IDA* has difficulty.

## Iterative Improvement Algorithms

Start with a complete configuration and make modifications to improve quality. Example: n-queens

For this kind of algorithm, consider a grid containing all of the problem states. The height of any point corresponds to the value of the evaluation function on the state at that point.

The idea of iterative improvement algorithms is to move around the grid, trying to find the highest peaks, which are optimal solutions.

These algorithms usually keep track only of the current state and do not look ahead beyond immediate neighbors.

There are two major classes of iterative improvement algorithms. Hill-climbing algorithms try to make changes that improve the current state. Simulated annealing algorithms sometimes make changes that make things worse.

### Hill-Climbing Search

Always move in a direction of increasing quality. This simple policy has three well-known drawbacks:

• Local Maxima - a local maximum is a peak that is lower than the highest peak in the state space. Once on a local maximum, hill-climbing will halt, even though there is a better solution.
• Plateaux - a plateau is an area of the state space where the evaluation function is nearly flat. Hill-climbing will do a random walk in such an area.
• Ridges - a ridge can have steeply sloping sides, so that the search reaches the top with ease, but the top may slope only gently toward a peak. Unless there happen to be operators that move directly along the top of the ridge, the search may oscillate from side to side, making little or no progress.

### Simulated Annealing

Allow hill-climbing to take some downhill steps to escape local maxima.

The innermost loop of simulated annealing is quite similar to hill-climbing. Instead of picking the best move, however, it picks a random move. If the move actually improves the situation, it is executed. Otherwise, the algorithm makes the move with some probability less than 1. The probability decreases exponentially with the badness of the move - the amount ΔE by which the evaluation is worsened.

A second parameter T is also used to determine the probability. At higher values of T, bad moves are more likely to be allowed. As T tends towards 0, they become more and more unlikely, until the algorithm behaves like hill-climbing. The schedule input determines the value of T as a function of how many cycles have already been completed.
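The loop described above can be sketched as follows (the iteration cap, and the schedule and neighbour functions in the usage, are illustrative assumptions):

```python
import math
import random

def simulated_annealing(state, value, random_neighbour, schedule):
    """Pick a random move; take it if it improves `value`, otherwise
    take it with probability exp(delta_e / T), where delta_e < 0 is the
    amount by which the move worsens the evaluation and T = schedule(t).
    Returns the best state seen."""
    best = state
    for t in range(1, 10_000):           # iteration cap is an assumption
        T = schedule(t)
        if T <= 0:
            break
        nxt = random_neighbour(state)
        delta_e = value(nxt) - value(state)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            state = nxt                  # accept uphill, or downhill by chance
        if value(state) > value(best):
            best = state
    return best
```

For example, maximizing `value(x) = -(x - 3)**2` over the integers with ±1 moves and an exponentially decaying schedule reliably settles at x = 3.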

Simulated annealing was developed from an explicit analogy with annealing - the process of gradually cooling a liquid until it freezes. The value function corresponds to the total energy of the atoms in the material, and T corresponds to the temperature. The schedule determines the rate at which the temperature is lowered. Individual moves in the state space correspond to random fluctuations due to thermal noise. One can prove that if the temperature is lowered sufficiently slowly, the material will attain a lowest-energy configuration. This corresponds to the statement that if the schedule lowers T slowly enough, the algorithm will find a global optimum.

### Applications to constraint satisfaction problems

CSPs can be solved by iterative improvement methods by first assigning values to all the variables and then applying modification operators to move the configuration towards a solution. These operators simply assign a different value to a variable, e.g., 8-queens.

Algorithms that solve CSPs in this fashion are called heuristic repair methods. In choosing a new value for a variable, the most obvious heuristic is to select the value that results in the minimum number of conflicts with other variables. This is called the min-conflicts heuristic.
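A heuristic-repair sketch for n-queens using min-conflicts (the board representation and step limit are illustrative assumptions):

```python
import random

def min_conflicts_queens(n, max_steps=10_000):
    """Heuristic repair for n-queens: start from a complete random
    assignment (one queen per column), then repeatedly move the queen
    of a conflicted column to the row minimizing its conflicts."""
    cols = [random.randrange(n) for _ in range(n)]   # cols[c] = row of column c's queen

    def conflicts(col, row):
        # Queens clash on the same row or the same diagonal.
        return sum(1 for c in range(n) if c != col and
                   (cols[c] == row or abs(cols[c] - row) == abs(c - col)))

    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(c, cols[c]) > 0]
        if not conflicted:
            return cols                              # a complete solution
        col = random.choice(conflicted)
        cols[col] = min(range(n), key=lambda r: conflicts(col, r))
    return None                                      # stuck within the step limit
```

Because the method can occasionally stall on a plateau, callers typically restart from a fresh random assignment when `None` is returned.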