Computer Science 2 CSCI 142 Backtracking Lecture

10/17/2015

1 Problem

You are a cartographer. Your latest project is to develop a map for the “lower 48” states of the United States of America. Each state in the map should be made easily distinguishable from its neighbors through the use of distinct colors. However, the more colors you use, the more your costs go up. What is the minimum number of colors needed to ensure that each state can be colored a different color than any of its neighbors?

2 Analysis and Solution Design

2.1 Data Representation

When one thinks of maps one often thinks of coordinates, latitude, and longitude. This in turn leads to a representation in terms of a grid, where states have coordinates. Unfortunately, this solution does not lend itself to the primary need of knowing which states are neighbors, since states come in all shapes and sizes, with differing numbers of neighbors.


Looking at the above figure’s “exploded” view, it is actually more natural to represent the situation with an adjacency list.

• The collection of states will be represented by a collection of nodes.
• Each node will have a string name and (eventually) a color.
• Each node object will also have a slot that holds a list of adjacent nodes.

Below you will see a graphic representation of the adjacency list for a small part of the map of the U.S.A.: the west coast.


Again, this can be represented in code as a set of instances of a node class containing a state name (or other identifier) and a list of adjacent nodes. Since the colors of the states will be constantly changing, we will move those colorings into a separate data structure for more efficient manipulation.
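As a minimal sketch of this representation, the Java class below pairs a state name with a list of adjacent nodes. The class name `Node`, the `connect` helper, and the three-state west coast fragment are illustrative assumptions, not the course's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node class: a name plus a list of adjacent nodes.
class Node {
    final String name;
    final List<Node> neighbors = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    // Adjacency is symmetric, so record the link in both directions.
    static void connect(Node a, Node b) {
        a.neighbors.add(b);
        b.neighbors.add(a);
    }
}

class WestCoastDemo {
    // Builds a three-state fragment: Washington - Oregon - California.
    static List<Node> build() {
        Node wa = new Node("Washington");
        Node or = new Node("Oregon");
        Node ca = new Node("California");
        Node.connect(wa, or);
        Node.connect(or, ca);
        return List.of(wa, or, ca);
    }

    public static void main(String[] args) {
        for (Node n : build()) {
            StringBuilder sb = new StringBuilder(n.name + ":");
            for (Node m : n.neighbors) {
                sb.append(' ').append(m.name);
            }
            System.out.println(sb);
        }
    }
}
```

Note that no color is stored in the node; as described above, the coloring lives in a separate structure.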

2.2 A Map Coloring Algorithm

Before the algorithm is described, a slight modification will be made. The new problem statement is the following.

Can a map M be colored with no more than C colors?

If the more general question (what is the minimum number of colors needed?) must be answered, one can simply repeat the algorithm with a smaller and smaller C until the answer is “no”. (There are better ways, however.)
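The repeat-with-smaller-C idea can be sketched as a short loop. Here `canColor` is a hypothetical stand-in for the yes/no coloring algorithm, and the starting value is assumed to be a number of colors that is already known to work.

```java
import java.util.function.IntPredicate;

class MinColors {
    // canColor.test(c) is assumed to answer the question
    // "can the map be colored with no more than c colors?".
    // Precondition (assumed): canColor.test(start) is true.
    static int minimumColors(int start, IntPredicate canColor) {
        int c = start;
        // Keep shrinking C while one fewer color still works.
        while (c > 1 && canColor.test(c - 1)) {
            c--;
        }
        return c;
    }

    public static void main(String[] args) {
        // Stand-in oracle: pretend the map needs at least 4 colors.
        System.out.println(minimumColors(6, c -> c >= 4));
    }
}
```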

The algorithm for coloring interconnected nodes, e.g., a map, can be viewed as a tree search, but the “nodes” of that tree may not be what you expect. Each tree “node” is a possible coloring of the map. When we refer to the term map in this writeup, we are talking about cartography, not the collection type. Assume that the root node represents a map with no states colored yet. A coloring c2 is a child of the coloring c1 if one of the states that was uncolored in c1 has had its color assigned in c2. A complete coloring for this problem is an association of 48 states to their respective colors. For that many states and 5 colors (plus the uncolored state), there could be $6^{48}$, or about $2.2 \times 10^{37}$, different colorings!
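The size of this count can be checked with a few lines of Java. This is only an arithmetic sketch; the class and method names are made up for illustration.

```java
import java.math.BigInteger;

class ColoringCount {
    // Number of ways to give each of `states` states one of `choices` values.
    static BigInteger count(int choices, int states) {
        return BigInteger.valueOf(choices).pow(states);
    }

    public static void main(String[] args) {
        // 5 colors plus "uncolored" gives 6 choices for each of 48 states.
        BigInteger total = count(6, 48);
        System.out.println(total + " (" + total.toString().length() + " digits)");
    }
}
```

A 38-digit result corresponds to a value on the order of $10^{37}$.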

Hopefully we can avoid having to check all these colorings. In fact, we are not even going to build the tree data structure explicitly. Here is a very rough description of the approach that will be followed.

    Color all nodes as being uncolored (the unassigned color).
    Go to the first node.
    Assign a color, and make note of what other colors haven’t been tried yet.
    Go to the next node, and repeat the process.
    ...
    Eventually we will either discover a valid complete coloring or an invalid coloring.
    If invalid, back up to an earlier node where there are still some choices left and make a different choice. Proceed forward as before.

This approach, which must still be expressed in proper algorithmic form, is called backtracking because we are allowed to undo some decisions to go back to a point where a choice was made, then make a different choice and proceed again.

Let’s add one more trick that will help us evaluate far fewer colorings. Even if some of the nodes are not yet colored, we should give up if two neighboring states are already colored the same. This technique is called pruning. It comes from the metaphor of the search space being a tree. Abandoning certain paths appears like cutting branches from the tree.

Here is the pseudo-code for this algorithm. Recall that the coloring is separated out into its own data structure. The map comes along for the ride, simply representing the unchanging connections between the nodes (states).

    define find_coloring(current_coloring, map)
        observe the current_coloring of the map.
        if the coloring is complete (no uncolored states)
            return the current_coloring
        otherwise
            choose an as yet uncolored node.
            generate new colorings based on that node getting
                every possible color.
            for each new coloring c
                if the coloring is valid (no adjacent states have the same color)
                    solution = find_coloring(c, map)
                    if the solution indicates success
                        return the solution
                # Repeating the loop body is the "backtrack"
                # that will try the other colorings.
            return Failure

We assume that, each time find_coloring is called, including the first time, the current_coloring is valid. If we assume find_coloring is first called with all nodes (states) uncolored, we will be fine.
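One way the pseudo-code might translate into Java is sketched below. Several choices here are assumptions made for illustration: the map is a `Map` from state name to neighbor names, an uncolored state is simply absent from the coloring, `null` plays the role of Failure, and the validity test is applied to each candidate color before the child coloring is built.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FindColoring {
    // Returns a complete valid coloring, or null to signal Failure.
    // map: state name -> names of adjacent states (never modified).
    // coloring: the states colored so far; assumed valid on entry.
    static Map<String, String> findColoring(Map<String, String> coloring,
                                            Map<String, List<String>> map,
                                            List<String> colors) {
        // Choose an as yet uncolored node.
        String next = null;
        for (String state : map.keySet()) {
            if (!coloring.containsKey(state)) {
                next = state;
                break;
            }
        }
        if (next == null) {
            return coloring; // complete: no uncolored states remain
        }
        for (String color : colors) {
            // Prune: skip this color if a neighbor already has it.
            boolean valid = true;
            for (String neighbor : map.get(next)) {
                if (color.equals(coloring.get(neighbor))) {
                    valid = false;
                    break;
                }
            }
            if (valid) {
                Map<String, String> child = new HashMap<>(coloring);
                child.put(next, color);
                Map<String, String> solution = findColoring(child, map, colors);
                if (solution != null) {
                    return solution;
                }
            }
            // Falling through to the next color is the "backtrack".
        }
        return null;
    }

    // Three mutually adjacent states: two colors cannot work, three can.
    static Map<String, List<String>> triangle() {
        Map<String, List<String>> map = new HashMap<>();
        map.put("A", List.of("B", "C"));
        map.put("B", List.of("A", "C"));
        map.put("C", List.of("A", "B"));
        return map;
    }

    public static void main(String[] args) {
        System.out.println(findColoring(new HashMap<>(), triangle(),
                List.of("RED", "GREEN")));
        System.out.println(findColoring(new HashMap<>(), triangle(),
                List.of("RED", "GREEN", "BLUE")));
    }
}
```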


This algorithm does backtracking because of the loop. Imagine entering the loop for the first time. You try a valid coloring c and call your function recursively. Maybe it is recursively called several more times. Finally, a point is reached where coloring fails. Eventually, these calls return back to yours, and you try the next valid coloring, which means that the current node gets a color it has not gotten before. Since your activation, or instance, of the function remembers what has and has not been tried, it is realizing the informal backtracking strategy mentioned previously.

2.3 A More General Algorithm

Can you detect a more general search strategy in the algorithm? It could be applied to other problems that, on the surface, appear to be quite different. (Perhaps your laboratory assignment will be one of them!) In the pseudo-code below, the value null is returned when the coloring tested fails.

2.4 Pseudocode

    // solve: Config -> Config or null
    function solve(configuration)                    // (i)
        if isGoal(configuration)                     // (ii)
            return configuration
        else:
            for child in successors(configuration)   // (iii)
                if isValid(child)                    // (iv)
                    solution = solve(child)
                    if solution != null
                        return solution
            return null

The parts of the above algorithm that need to be defined for the specific problem at hand have been marked with Roman numeral comments. Specifically, it is necessary to (i) define the configuration structure, (ii) define what it means for a configuration to be a goal configuration, (iii) define how a list of successor configurations is generated, and (iv) define what it means for a configuration to be valid [1]. Additional arguments to the solve function may or may not be needed, depending on whether additional solve features are desired (e.g. watching the search).

[1] Instead of using isValid in the definition of solve, it is possible to use it only in successors. This means the successors function can do the pruning internally.

2.5 Complexity

The coloring problem for a general set of interconnected nodes is a well-known problem in computer science. Below is a more precise mathematical description of it. We can describe the problem with a tuple of two sets, N, the set of nodes, and C, the set of connections between them.


For a rough analysis of our find_coloring, consider a less efficient version that does not perform pruning and instead checks the validity of a coloring only when a coloring is complete. In the worst case, the algorithm must check the validity of every possible complete coloring. Since there are k ways to color each node and |N| nodes, there are $k^{|N|}$ complete colorings of the map. For each complete coloring, the validation check must examine the pair of adjacent nodes corresponding to each member of the set C. Therefore, the complexity of this non-pruning version of find_coloring is $O(|C| \times k^{|N|})$. This is referred to as exponential time and is among the slowest (because it is among the fastest growing) of the commonly encountered Big-O classifications.

There is no simple characterization of the complexity of the pruning version of find_coloring; the effectiveness of pruning depends upon the specific problem type and the order in which nodes are colored. Nonetheless, the pruning version of find_coloring can be no worse than the worst case of the non-pruning version of find_coloring. In practice, a pruning backtracking implementation is usually much better than $O(|C| \times k^{|N|})$.
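To make the comparison concrete, the sketch below counts how many colorings each version tests for validity on a deliberately hard input: four mutually connected nodes with k = 3 colors, so no valid coloring exists and both versions must exhaust their search. The adjacency-matrix representation, the integer colors, and every name in the code are assumptions chosen to keep the example short. Without pruning, all $3^4 = 81$ complete colorings are checked; with pruning, fewer (partial) colorings are checked.

```java
class PruningCount {
    static int checked; // number of colorings tested for validity

    // True if two connected nodes among the first upTo nodes share a color.
    static boolean conflict(boolean[][] adj, int[] color, int upTo) {
        for (int i = 0; i < upTo; i++) {
            for (int j = i + 1; j < upTo; j++) {
                if (adj[i][j] && color[i] == color[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    // Colors nodes in index order; nodes 0..node-1 are already colored.
    static boolean search(boolean[][] adj, int[] color, int node, int k,
                          boolean prune) {
        int n = adj.length;
        if (node == n) {
            if (prune) {
                return true; // every partial coloring was already checked
            }
            checked++;
            return !conflict(adj, color, n);
        }
        for (int c = 0; c < k; c++) {
            color[node] = c;
            if (prune) {
                checked++;
                if (conflict(adj, color, node + 1)) {
                    continue; // cut this branch off
                }
            }
            if (search(adj, color, node + 1, k, prune)) {
                return true;
            }
        }
        return false;
    }

    static int countChecks(boolean[][] adj, int k, boolean prune) {
        checked = 0;
        search(adj, new int[adj.length], 0, k, prune);
        return checked;
    }

    // Complete graph: every pair of distinct nodes is connected.
    static boolean[][] complete(int n) {
        boolean[][] adj = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                adj[i][j] = i != j;
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        boolean[][] k4 = complete(4);
        System.out.println("without pruning: " + countChecks(k4, 3, false));
        System.out.println("with pruning:    " + countChecks(k4, 3, true));
    }
}
```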

2.6 Java Implementation

2.6.1 Design


2.6.2 Source Code

Refer to the code and Java documentation that is posted under the course website for this week. Our solution is composed of the following source code in the src subdirectory.

• MapColoring.java: The main program that reads in the map data and builds the graph.
• Backtracker.java: The implementation of the general backtracking algorithm.
• Configuration.java: An interface for the required methods that any configuration used by the backtracker must implement.
• ColoringConfig.java: Our specific implementation of a configuration in the map coloring problem.
• State.java: The representation of a single state in the map. This is used by the main program to build the graph, and is used for reference in ColoringConfig.java.

2.6.3 Map Data File

The data for the map comes in several files in the src/data subdirectory.

• NEGraph.txt: Northeast states
• NEngGraph.txt: New England states
• OneGraph.txt: District of Columbia
• USGraph.txt: all 48 states organized from most neighbors to least
• USNo4CGraph.txt: all 48 states, but “no four corners”. Utah is not connected to New Mexico, and Arizona is not connected to Colorado. This makes the graph planar (can be drawn on a plane with no edges intersecting).
• USRevGraph.txt: all 48 states organized from least neighbors to most
• USSlowGraph.txt: Northwest states
• WestGraph.txt: Western states
• lecture.txt: An 8 state scenario you may see in lecture (using numbers instead of state names)

Each comma-separated line in a file contains the state name, followed by the state’s neighbors. For example, New York state:

New York,Connecticut,Massachusetts,New Jersey,Pennsylvania,Vermont
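Reading such lines might look like the following sketch. It assumes that fields are separated by bare commas with no surrounding whitespace, as in the example line; the class and method names are hypothetical and are not taken from MapColoring.java.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GraphLineParser {
    // Splits each "name,neighbor1,neighbor2,..." line into
    // an entry mapping the state name to its neighbor names.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            // fields[0] is the state; the remaining fields are its neighbors.
            graph.put(fields[0],
                    Arrays.asList(fields).subList(1, fields.length));
        }
        return graph;
    }

    public static void main(String[] args) {
        String line =
            "New York,Connecticut,Massachusetts,New Jersey,Pennsylvania,Vermont";
        System.out.println(parse(List.of(line)));
    }
}
```

A line with a name and no neighbors yields an empty neighbor list.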

2.6.4 Graph Representation

The map will be represented as a graph in adjacency list form, whose vertices are the states and whose edges connect neighboring states.

Each state is represented as an instance of the State class. Each state has a name (String), and a collection of neighbors (Collection<State>).

The collection of states is stored in a Map<String, State>, where the key is the name of the state (String), and the value is the state object (State).


2.6.5 Backtracking Algorithm

The backtracking algorithm is contained in the Backtracker class. The key method is solve, which is the recursive implementation. It works with Configuration objects, which will be described next.

The solve method has been enhanced to work in a debug mode that will do verbose output for every call to this method. If enabled, on each invocation it will display the current configuration, followed by its valid successors. When the goal is reached, that is also displayed.

In order to handle the two outcomes of the solve method:

• A goal was found and the goal configuration is returned.
• No solution was found and the caller needs to be notified of this.

the return type of the solve method is Optional<Configuration>. If a solution was found, the goal configuration is wrapped into an Optional object, e.g.:

return Optional.of(config);

If no solution was found, the “failure” case is signaled by returning an “empty” Optional object, e.g.:

return Optional.empty();

The main program calls the solver with the initial configuration (all states uncolored), and does the following with the result:

    if (sol.isPresent()) {
        System.out.println("Solution: " + sol.get());
    } else {
        System.out.println("No solution!");
    }

2.6.6 Configuration Representation

In order for our backtracking algorithm to be able to work with any type of problem, we have defined an interface, Configuration, which all problems must implement. The three things all configurations must be able to do are:

• public Collection<Configuration> getSuccessors(): For a configuration, return a collection of all successor configurations.
• public boolean isValid(): A boolean function which indicates whether the configuration is still valid or not.
• public boolean isGoal(): A boolean function which indicates whether the configuration is a goal or not. This method has a precondition: the configuration must already be valid.
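How the interface and the recursive solve fit together can be sketched as follows. This is not the posted Backtracker.java: the class `SimpleBacktracker` and the toy problem `AlternatingConfig` (build a string of a's and b's of a given length with no two equal adjacent letters) are assumptions made up for illustration, and the debug mode is omitted.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;

interface Configuration {
    Collection<Configuration> getSuccessors();
    boolean isValid();
    boolean isGoal();
}

class SimpleBacktracker {
    // Returns the goal wrapped in an Optional, or an empty Optional.
    // The configuration passed in is assumed to be valid.
    static Optional<Configuration> solve(Configuration config) {
        if (config.isGoal()) {
            return Optional.of(config);
        }
        for (Configuration child : config.getSuccessors()) {
            if (child.isValid()) {
                Optional<Configuration> sol = solve(child);
                if (sol.isPresent()) {
                    return sol;
                }
            }
        }
        return Optional.empty();
    }
}

// Hypothetical toy problem used only to exercise the solver.
class AlternatingConfig implements Configuration {
    final String text;
    final int length;

    AlternatingConfig(String text, int length) {
        this.text = text;
        this.length = length;
    }

    public Collection<Configuration> getSuccessors() {
        // Generate every successor; validity is checked by the solver.
        List<Configuration> out = new ArrayList<>();
        out.add(new AlternatingConfig(text + "a", length));
        out.add(new AlternatingConfig(text + "b", length));
        return out;
    }

    public boolean isValid() {
        int n = text.length();
        // Only the newest character can have introduced a conflict.
        return n < 2 || text.charAt(n - 1) != text.charAt(n - 2);
    }

    public boolean isGoal() {
        return text.length() == length;
    }

    @Override
    public String toString() {
        return text;
    }

    public static void main(String[] args) {
        System.out.println(SimpleBacktracker.solve(new AlternatingConfig("", 3)));
    }
}
```

The solver never mentions strings or colors, which is the point of the interface: any problem that supplies these three methods can reuse it.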


2.6.7 Map Coloring Configuration Representation

Attributes

To represent a single possible coloring of a map (including colored and uncolored states), we use the ColoringConfig class. This class ensures it can be used by the backtracker by implementing the Configuration interface.

First, in order to generate configurations, this class needs access to the graph that was built in the main program. Recall that it is a Map<String, State> of states.

The actual colorings are stored in another map, Map<State, String> coloring, whose key is the state (State), and whose value is the color (String).

To speed things up in the successor generation, we will also store the last state that was colored, State current.

The number of colors that can be used is an input parameter to the main program. It is passed into our constructor here. We store the valid colors as strings in an array:

private String[] palette;

The full range of colors that can be used is stored in a list. This is what the palette pulls from to get the colors that can be used for this coloring:

    private final static List<String> COLORS = new ArrayList<String>(
            Arrays.asList("RED", "GREEN", "BLUE", "YELLOW", "CYAN", "BLACK"));

Constructor

The constructor sets up the colorings by first initializing coloring so that all the states are uncolored (e.g. using null for the String value).

To indicate nothing has been colored yet, it sets the current state to null.

It then populates the palette of available colors from the full range of colors.
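The two set-up steps could be sketched as below. To keep the example self-contained it keys the coloring by state name rather than by State object, and it assumes the palette is simply the first numColors entries of COLORS; neither detail is confirmed by the posted code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PaletteDemo {
    static final List<String> COLORS =
            Arrays.asList("RED", "GREEN", "BLUE", "YELLOW", "CYAN", "BLACK");

    // Takes the first numColors entries of COLORS as the usable palette.
    static String[] makePalette(int numColors) {
        if (numColors < 1 || numColors > COLORS.size()) {
            throw new IllegalArgumentException("unsupported number of colors");
        }
        return COLORS.subList(0, numColors).toArray(new String[0]);
    }

    // Marks every state as uncolored by mapping it to null.
    static Map<String, String> uncolored(Collection<String> stateNames) {
        Map<String, String> coloring = new HashMap<>();
        for (String name : stateNames) {
            coloring.put(name, null);
        }
        return coloring;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(makePalette(4)));
        System.out.println(uncolored(List.of("Ohio", "Indiana")));
    }
}
```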

Generating Successor Configurations

The goal here is to create and return a collection of configurations, Collection<Configuration>. We are not worried about doing validity checking here. Therefore we will generate all possible successors.

First, we have to go through the coloring map and find the first state that has not been colored (i.e. its value is null).

Now we create an initially empty list of configurations. We will generate as many configurations as there are colors, for the current state.

If you recall the problem of “shallow copying” from CS1, the problem still exists here. We can’t make a new ColoringConfig from a current one by doing:

ColoringConfig newConfig = this;


This just makes another reference to an existing object. It is vitally important that we make a “deep copy” of the existing object. To do this, we will use a private constructor of the kind commonly referred to as a “copy constructor”, e.g.:

private ColoringConfig(ColoringConfig other) {...}

The rules for making a deep copy of an existing object are as follows. For each data member:

• If the data member is a primitive, e.g. int, you can use direct assignment. This is because primitives copy their value by default.
• If the data member is from the JCF, e.g. HashMap, those classes already implement a copy constructor, so you just need to call it. (The new collection is independent of the old one, although the elements it holds are shared.)
• If the data member is a native array, you have to copy the elements over manually. Luckily, there is a method, System.arraycopy, that can be used.
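The three rules can be seen side by side in a small, self-contained class. `CopyDemo` and its fields are hypothetical stand-ins rather than the real ColoringConfig; the `main` method contrasts a plain reference assignment with a copy made through the copy constructor.

```java
import java.util.HashMap;
import java.util.Map;

class CopyDemo {
    int count;                    // primitive
    Map<String, String> coloring; // JCF collection
    String[] palette;             // native array

    CopyDemo(int count, Map<String, String> coloring, String[] palette) {
        this.count = count;
        this.coloring = coloring;
        this.palette = palette;
    }

    // Copy constructor following the three rules above.
    CopyDemo(CopyDemo other) {
        this.count = other.count;                      // direct assignment
        this.coloring = new HashMap<>(other.coloring); // JCF copy constructor
        this.palette = new String[other.palette.length];
        System.arraycopy(other.palette, 0, this.palette, 0,
                other.palette.length);                 // element-by-element copy
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("Ohio", "RED");
        CopyDemo original = new CopyDemo(1, m, new String[] {"RED", "GREEN"});

        CopyDemo copy = new CopyDemo(original);
        copy.coloring.put("Ohio", "BLUE");  // does not affect original
        System.out.println(original.coloring.get("Ohio"));

        CopyDemo alias = original;          // just another reference
        alias.coloring.put("Ohio", "GREEN"); // changes original too
        System.out.println(original.coloring.get("Ohio"));
    }
}
```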

Here is what the copy constructor looks like for us:

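    private ColoringConfig(ColoringConfig other) {
        // The graph itself never changes, so sharing the reference is safe.
        this.states = other.states;
        this.current = other.current;
        // HashMap's copy constructor gives this config its own coloring map.
        this.coloring = new HashMap<State, String>(other.coloring);
        // Allocate the array first; arraycopy needs a non-null destination.
        this.palette = new String[other.palette.length];
        System.arraycopy(other.palette, 0, this.palette, 0, other.palette.length);
    }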

Putting it all together, in getSuccessors, we will create a new config using the copy constructor, set the coloring, and then add this successor to the collection of successors to return:

    Collection<Configuration> successors = new LinkedList<Configuration>();
    // Assumption: this.current already refers to the uncolored state found above.
    for (String color : this.palette) {
        ColoringConfig successor = new ColoringConfig(this);
        successor.coloring.put(this.current, color);
        successors.add(successor);
    }

Configuration Validation

The validity checking for this problem is very simple. If no state has been colored yet (current is null), then the configuration is valid.

Otherwise we have to loop through the neighbors of current and compare their colors with its color. If we find a neighbor with the same color, we know the configuration is invalid. If we loop through all the neighbors and none of them match this state’s color, it is valid.

This works because the backtracking algorithm builds up the colorings starting at the first state and moving forward. As long as we check the newly colored state, we don’t have to go back and check the previously colored states and their neighbors’ colors.
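A sketch of this check is given below. For self-containment it works on state names and a neighbor map instead of State objects; those representations, and the method signature, are assumptions for illustration only.

```java
import java.util.List;
import java.util.Map;

class ValidityCheck {
    // current: the state colored most recently, or null if none yet.
    // neighbors: state name -> names of adjacent states.
    // coloring: state name -> color, with null meaning uncolored.
    static boolean isValid(String current,
                           Map<String, List<String>> neighbors,
                           Map<String, String> coloring) {
        if (current == null) {
            return true; // nothing colored yet, so nothing can clash
        }
        String color = coloring.get(current);
        for (String n : neighbors.get(current)) {
            // Only the newly colored state needs to be compared.
            if (color != null && color.equals(coloring.get(n))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> nb =
                Map.of("A", List.of("B"), "B", List.of("A"));
        System.out.println(isValid("B", nb, Map.of("A", "RED", "B", "RED")));
        System.out.println(isValid("B", nb, Map.of("A", "RED", "B", "GREEN")));
    }
}
```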


Configuration Goal Checking

The goal checking is the simplest of all the methods. If there are no states that are uncolored, it is a goal. Recall that the precondition of this method is that the configuration is valid, because isValid gets called first in the backtracker.

So all we do is get all the values from the coloring map, and if none of them are null, it is a goal. Otherwise it isn’t.
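With null standing for “uncolored”, the test reduces to a single library call. The sketch below again uses state names as keys, which is an assumption; `Map.containsValue(null)` is permitted for a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

class GoalCheck {
    // A configuration is a goal when no state maps to null (uncolored).
    // Validity is assumed to have been checked already.
    static boolean isGoal(Map<String, String> coloring) {
        return !coloring.containsValue(null);
    }

    public static void main(String[] args) {
        Map<String, String> coloring = new HashMap<>();
        coloring.put("Ohio", "RED");
        coloring.put("Indiana", null);
        System.out.println(isGoal(coloring)); // Indiana is still uncolored
        coloring.put("Indiana", "GREEN");
        System.out.println(isGoal(coloring));
    }
}
```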

3 Testing

Because of the pruning that is part of the algorithm, there is a significant variation in execution time depending on the order that the nodes are colored. The provided code could take an hour or only a few seconds depending on this order. Preferably, states with many neighbors are colored first. This allows us to quickly find the impossible situations where there are not enough colors for a set of adjacent states. This corresponds to pruning branches from near the top of the search tree. If the ordering is reversed, things really slow down, as we are only able to prune branches from near the bottom of the search tree.
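Producing the preferred order is a one-line sort once the graph is in memory. The sketch assumes the graph is held as a map from state name to neighbor names; the provided data files are described as already being stored in a chosen order, so this helper is purely illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderByDegree {
    // Returns the state names sorted from most neighbors to fewest.
    static List<String> mostNeighborsFirst(Map<String, List<String>> graph) {
        List<String> order = new ArrayList<>(graph.keySet());
        order.sort(Comparator
                .comparingInt((String s) -> graph.get(s).size())
                .reversed());
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("Hub", List.of("A", "B", "C"));
        graph.put("A", List.of("Hub", "B"));
        graph.put("B", List.of("Hub", "A"));
        graph.put("C", List.of("Hub"));
        System.out.println(mostNeighborsFirst(graph));
    }
}
```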

Other things that affect the running time are the following.

1. Representing state and color names as strings instead of integers: comparison of strings requires character-by-character comparison. If the average size of a string is s characters, the parts of the program that compare names can slow down by a factor of up to s.

2. The size of the set of colors: This is a “Goldilocks” situation! If the set is too small, an impossible situation is found very early and the program quickly halts in failure. If too large, it is more likely that an early color choice for each state will just work, so there is less backtracking.

4 Further Reading

The coloring problem is one of a set of very famous problems called NP-complete problems, whose known solutions are intractable (the number of cases to test grows exponentially with the input), but for which there is as yet no proof that they must take that much time.

Maps represent a very special interconnected node problem. The word used is planar because the nodes can be arranged so that none of the connections cross over each other, i.e., two dimensions suffice. If you look at a map of the US, you will see that it is planar [2]. It has been proved that all planar node networks can be colored using a maximum of four colors. The proof itself is interesting. It was the first major theorem to be proved with the help of a computer program, because the number of cases to examine was deemed too large to be done by hand [3]. Just because four colors suffice does not imply that the complexity of the algorithm is diminished; success and speed are two different things! But in this case it is true. The best known algorithm that colors planar node networks with four colors runs in quadratic ($O(n^2)$) time [4].

[2] However, if you accept that Utah, Arizona, New Mexico, and Colorado all touch each other at the Four Corners point, the network of states is no longer planar. Planar node networks result from defining neighboring states as sharing a boundary line, not a single point.

[3] See http://unhmagazine.unh.edu/sp02/mathpioneers.html

[4] Robertson, N., Sanders, D. P., Seymour, P., and Thomas, R. 1996. Efficiently four-coloring planar graphs. DOI = http://doi.acm.org/10.1145/237814.238005
