
1

Learning Agents Laboratory
Computer Science Department

George Mason University

Prof. Gheorghe Tecuci

Exercises

2

Overview

Sample questions on version space learning

Sample questions on decision tree learning

Some general questions and exercises

Your exercises

Sample questions on other learning strategies

3

Version Spaces

The version space for a set of examples given incrementally (for which there is a concept covering the positive examples and not covering the negative examples) will decrease (i.e., will contain strictly fewer concepts) when:

1. Always when a negative example is given

2. Always when a positive example is given

3. Always when a positive example is not covered by any concept from the lower bound

4. Always when a negative example is covered by all the concepts from the upper bound

Mihai Boicu

Select the correct answers and justify your solution.
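As a warm-up for reasoning about these options, consider a small hypothetical illustration (not part of the original exercise): instances are pairs (color, size) with color in {red, blue} and size in {small, large}. After a first positive example (red, small), the bounds are S = {(red, small)} and G = {(?x, ?y)}. A subsequent negative example (blue, small) forces G to specialize to {(red, ?y)}, strictly removing concepts from the version space. Each option above asks whether the corresponding kind of example always forces such a strict update.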

4

Explanation-based learning

Given:
• The axioms of plane Euclidean geometry
• Several problem-solving examples consisting of geometry problems with their axiomatic solutions

Questions:
1. What will Explanation-Based Learning generate from one of the examples?
2. Are the learned theorems useful and generally applicable?
3. How could one learn useful theorems from these examples?


Cristina Boicu

5

Decision-tree learning

Bogdan Stanescu

1) Give an example of a training set on which ID3 does not generate the smallest possible decision tree. Show the result of applying ID3 and also show a smaller tree.

Hint: The information gain of an attribute is 0 if the ratio p_i/(p_i + n_i) is the same for all i; otherwise the information gain is strictly positive.

2) How would you extend the ID3 algorithm to learn from examples belonging to more than two classes? What is the formula for computing the information gain of an attribute?
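For reference, the hint's two-class quantities generalize directly to k classes. A sketch of the standard ID3 definitions (here p_i is the fraction of examples in S belonging to class i, and S_v is the subset of S with value v for attribute A):

Entropy(S) = -\sum_{i=1}^{k} p_i \log_2 p_i

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)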

6

Decision-tree learning

Gabriel Balan

Give a counterexample to the heuristic used by the ID3 algorithm for picking the attributes.

7

Decision-tree learning

Yan Sun

Training examples for the target concept PlayTennis: [table not reproduced]

A decision tree for the concept PlayTennis: [tree not reproduced]

(continues)

8

Answer the following questions true or false, and explain the answer:

1. Is it possible to get ID3 to further elaborate the tree below the rightmost leaf (and make no other changes to the tree), by adding a single new correct training example to the original fourteen examples?

2. Is it possible to get ID3 to learn an incorrect tree (i.e., a tree that is not equivalent to the target concept) by adding new correct training examples to the original fourteen ones?

3. Is it possible to produce some set of correct training examples that will get ID3 to include the attribute Temperature in the learned tree, even though the true target concept is independent of Temperature?

9

Suppose we want to classify whether a given balloon is inflated based on four attributes: color, size, the “act” of the person holding the balloon, and the age of the person holding the balloon. Show the decision tree that ID3 would build to learn this classification. Display the information gain for each candidate attribute at the root of the tree.

Color   Size   Act      Age    Inflated?
Yellow  Small  Stretch  Adult  F
Yellow  Small  Stretch  Child  T
Yellow  Small  Dip      Adult  T
Yellow  Small  Dip      Child  T
Yellow  Small  Dip      Child  F
Yellow  Large  Stretch  Adult  T
Yellow  Large  Stretch  Child  T
Yellow  Large  Dip      Adult  T
Yellow  Large  Dip      Child  F
Yellow  Large  Dip      Child  F
Purple  Small  Stretch  Adult  T
Purple  Small  Stretch  Child  T
Purple  Small  Dip      Adult  T
Purple  Small  Dip      Child  F
Purple  Small  Dip      Child  F
Purple  Large  Stretch  Adult  T
Purple  Large  Stretch  Child  T
Purple  Large  Dip      Adult  T
Purple  Large  Dip      Child  F
Purple  Large  Dip      Child  F

Discussion: In this problem there are situations where the information gain is the same for every attribute, so we cannot decide which attribute to choose. Are there any methods for handling such situations? (See the sketch below.)

Xianjun Hao
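A minimal computational sketch for checking the information gains at the root (the helper names are ours; the rows are exactly those tabulated above, and the standard ID3 definitions are assumed):

    from math import log2
    from collections import Counter

    def entropy(labels):
        # Entropy of a list of class labels; absent classes contribute nothing.
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    # (Color, Size, Act, Age, Inflated?) rows from the table above.
    data = [
        ("Yellow", "Small", "Stretch", "Adult", "F"), ("Yellow", "Small", "Stretch", "Child", "T"),
        ("Yellow", "Small", "Dip", "Adult", "T"), ("Yellow", "Small", "Dip", "Child", "T"),
        ("Yellow", "Small", "Dip", "Child", "F"), ("Yellow", "Large", "Stretch", "Adult", "T"),
        ("Yellow", "Large", "Stretch", "Child", "T"), ("Yellow", "Large", "Dip", "Adult", "T"),
        ("Yellow", "Large", "Dip", "Child", "F"), ("Yellow", "Large", "Dip", "Child", "F"),
        ("Purple", "Small", "Stretch", "Adult", "T"), ("Purple", "Small", "Stretch", "Child", "T"),
        ("Purple", "Small", "Dip", "Adult", "T"), ("Purple", "Small", "Dip", "Child", "F"),
        ("Purple", "Small", "Dip", "Child", "F"), ("Purple", "Large", "Stretch", "Adult", "T"),
        ("Purple", "Large", "Stretch", "Child", "T"), ("Purple", "Large", "Dip", "Adult", "T"),
        ("Purple", "Large", "Dip", "Child", "F"), ("Purple", "Large", "Dip", "Child", "F"),
    ]
    labels = [row[-1] for row in data]

    for i, name in enumerate(["Color", "Size", "Act", "Age"]):
        gain = entropy(labels)
        for v in set(row[i] for row in data):
            subset = [row[-1] for row in data if row[i] == v]
            gain -= len(subset) / len(data) * entropy(subset)
        print(f"Gain({name}) = {gain:.4f}")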

10

Imagine the following attributes related to weather:
a. "wind degree" - windy, calm
b. "sun degree" - sunny, cloudy
c. "rain degree" - raining, not-raining

There are 2^3 = 8 possible "weathers" described by these attributes. Assign + or - to each combination in such a way that, in every decision tree, the depth of each branch equals the number of attributes (3). How many such trees exist?

How many such trees exist for n attributes? Why?

Zbigniew Skolicki
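One labeling with the required property is the parity (XOR) of the three attribute values; the quick check below (a sketch, with names of our own choosing) confirms that no single attribute then has positive information gain, so any consistent tree must test all three attributes on every branch:

    from itertools import product
    from math import log2
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    # Label each of the 2^3 weather combinations with the XOR of its bits.
    examples = [(bits, bits[0] ^ bits[1] ^ bits[2]) for bits in product((0, 1), repeat=3)]
    labels = [y for _, y in examples]

    for attr, name in enumerate(["wind", "sun", "rain"]):
        gain = entropy(labels)
        for v in (0, 1):
            subset = [y for bits, y in examples if bits[attr] == v]
            gain -= len(subset) / len(examples) * entropy(subset)
        print(f"{name}: information gain = {gain}")  # prints 0.0 for all three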

11

Consider the following data:

Height (inches)  Hair Color  Eye Color  Class
61               Brown       Brown      1
63               Brown       Brown      1
69               Brown       Brown      1
74               Brown       Brown      1
67               Brown       Blue       0
63               Blonde      Blue       0
71               Blonde      Brown      0
73               Blonde      Blue       0

Just looking at the table, what concept do you think defines class 1?

Use the ID3 algorithm taught in class to build a decision tree. (Helpful hints: The entropy of a set whose members all have the same value for the attribute in question is 0. The entropy of a set which has exactly equal numbers of each value for the attribute in question is 1.)

Charles Day (continues)
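Both hints are instances of the two-class entropy formula (the standard definition, with 0 \log_2 0 taken to be 0):

E(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-

so E(S) = 0 when p_+ \in \{0, 1\} (all members in one class) and E(S) = 1 when p_+ = p_- = 1/2.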

12

Write out the concept represented by this tree.

Does this rule match your intuitive sense of the concept represented by the data?

Are you happy with the concept learned using the decision tree? Why? Do you think this decision tree would do well in classifying other instances of the concept represented by the data?

What can you say about attributes with a lot of values?

Another method for choosing the attribute on which to split a node uses the gain ratio, defined as:

GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

where the term Split Information is defined as:

SplitInformation(S, A) = - Σ_{i=1..c} (|S_i| / |S|) log2(|S_i| / |S|)

with S_1, ..., S_c the subsets of S produced by partitioning it on the c values of attribute A.
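A small sketch of how the gain ratio penalizes many-valued attributes (helper names and data are illustrative, assuming the definitions above):

    from math import log2
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def gain_ratio(values, labels):
        n = len(labels)
        gain = entropy(labels)
        for v in set(values):
            subset = [y for x, y in zip(values, labels) if x == v]
            gain -= len(subset) / n * entropy(subset)
        # Split information is the same formula applied to the attribute's values.
        return gain / entropy(values)

    labels = ["+", "+", "-", "-"]
    # An ID-like attribute (a distinct value per example) maximizes gain
    # but is heavily penalized by its large split information: 1.0 / 2.0.
    print(gain_ratio(["a", "b", "c", "d"], labels))  # 0.5
    # A binary attribute aligned with the classes: 1.0 / 1.0.
    print(gain_ratio(["x", "x", "y", "y"], labels))  # 1.0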

13

In ID3, when an attribute has continuous values, one approach for handling the attribute is to discretize its values into a set of bins. Sometimes an attribute may have a large but finite set of discrete values that does not lend itself to binning; for example, an attribute like retail store name, where each example may have a different value. How should a decision tree algorithm deal with such a situation?
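For continuous attributes, one standard alternative to fixed bins (a C4.5-style sketch; the numbers below are illustrative, not from the exercise) is to consider as candidate thresholds the midpoints between consecutive sorted values where the class changes:

    def candidate_thresholds(values, labels):
        # Midpoints between consecutive sorted values whose classes differ.
        pairs = sorted(zip(values, labels))
        return [(pairs[i][0] + pairs[i + 1][0]) / 2
                for i in range(len(pairs) - 1)
                if pairs[i][1] != pairs[i + 1][1]]

    # Hypothetical temperatures with +/- labels.
    print(candidate_thresholds([40, 48, 60, 72, 80, 90],
                               ["-", "-", "+", "+", "+", "-"]))  # [54.0, 85.0]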

Decision trees have often been applied in data mining. A marketing company may use consumer data to target a specific group of people earning a certain income or higher. Below is a set of attributes and their possible values. Which attributes should be used to create a decision tree that predicts whether a person's salary is above $50K? Remember that some attributes contain continuous values and some contain a large set of nominal values.

Simon Liu (continues)

14

age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.
class: >50K, <=50K.

15

Overview

Sample questions on version space learning

Sample questions on decision tree learning

Some general questions and exercises

Your exercises

Sample questions on other learning strategies

16

Questions

What is an instance?

What is a concept?

What is a positive example of a concept?

What is a negative example of a concept?

Give an intuitive definition of generalization.

What does it mean for concept A to be more general than concept B?

Indicate a simple way to prove that a concept is not more general than another concept.

Given two concepts C1 and C2, from a generalization point of view, what are all the different possible relations between them?


17

Questions

What is a generalization rule?

What is a specialization rule?

What is a reformulation rule?

Name all the generalization rules you know.

Briefly describe and illustrate with an example the “turning constants into variables” generalization rule.

Define and illustrate the dropping conditions generalization rule.

18

Questions

Indicate various generalizations of the following sentence: "A student who has lived in Fairfax for 3 years."

What could be said about the predictions of a cautious learner?

What could be said about the predictions of an aggressive learner?

How could one synergistically integrate a cautious learner with an aggressive learner to take advantage of their qualities to compensate for each other’s weaknesses?

19

Questions

What is the learning bias?

What are the different types of bias?

20

Exercise

Consider the background knowledge represented by the following generalization hierarchies and theorem:

any-color
    warm-color: red, yellow, orange
    cold-color: black, blue, green

any-shape
    polygon: triangle, rectangle (square is a subconcept of rectangle)
    round: circle, ellipse

Theorem: ∀x ∀y [(ON x y) => (NEAR x y)]

Show that E1 is more general than E2:

E1 = (COLOR x warm-color) & (SHAPE x round) & (COLOR y red) & (SHAPE y polygon) & (NEAR x y)

E2 = (COLOR u yellow) & (SHAPE u circle) & (COLOR v red) & (SHAPE v triangle) & (ON u v) & (ISA u toy) & (ISA v toy)


21

Consider the background knowledge represented by the following generalization hierarchies and theorem:

any-color
    warm-color: red, yellow, orange
    cold-color: black, blue, green

any-shape
    polygon: triangle, rectangle (square is a subconcept of rectangle)
    round: circle, ellipse

Theorem: ∀x ∀y [(ON x y) => (NEAR x y)]

Consider also the following concept:

E = (COLOR u yellow) & (SHAPE u circle) & (COLOR v red) & (SHAPE v triangle) & (ON u v) & (ISA u toy) & (ISA v toy) & (HEIGHT u 5)

Indicate six different generalization rules. For each such rule, determine an expression Eg that is more general than E according to that rule.
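For illustration only, one possible application of a single rule (assuming the "turning constants into variables" rule from the lecture): replacing the constant 5 with a variable gives

Eg = (COLOR u yellow) & (SHAPE u circle) & (COLOR v red) & (SHAPE v triangle) & (ON u v) & (ISA u toy) & (ISA v toy) & (HEIGHT u ?h)

Eg covers E and is therefore more general than E; the other rules can be applied in the same way.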

22

Consider the following two concepts:

C1: ?X IS SCREW
        HEAD HEXAGONAL
        COST 5

C2: ?X IS NUT
        COST 6

Indicate different generalizations of them.

23

Define the following:
• a generalization of two concepts
• a minimally general generalization of two concepts
• the least general generalization of two concepts
• the maximally general specialization of two concepts

24

Consider the following concepts:

G1: ?X IS LOUDSPEAKER-COMPONENT
        MADE-OF ?M
    ?M IS MATERIAL
    ?Z IS ADHESIVE
        GLUES ?M

G2: ?X IS LOUDSPEAKER-COMPONENT
        MADE-OF ?M
    ?M IS MATERIAL
    ?Z IS INFLAMMABLE-OBJECT
        GLUES ?M

and the following generalization hierarchies:

[Generalization hierarchies, recovered from a figure: LOUDSPEAKER-COMPONENT has subconcepts MEMBRANE, CHASSIS-ASSEMBLY, and BOLT; MATERIAL has subconcepts CAOUTCHOUC, PAPER, and METAL; ADHESIVE, INFLAMMABLE-OBJECT, and TOXIC-SUBSTANCE have subconcepts among SCOTCH-TAPE, SUPER-GLUE, MOWICOLL, and CONTACT-ADHESIVE.]

Indicate four specializations of G1 and G2 (including two maximally general specializations).

25

Overview

Sample questions on version space learning

Sample questions on decision tree learning

Some general questions and exercises

Your exercises

Sample questions on other learning strategies

26

Version Space questions

What happens if there are not enough examples for S and G to become identical?

Could we still learn something useful?

How could we classify a new instance?

When could we be sure that the classification is the same as the one made if the concept were completely learned?

Could we be sure that the classification is correct?


27

Version Space questions

Could the examples contain errors?

What kind of errors could be found in an example?

What will be the result of the learning algorithm if there are errors in examples?

What could we do if we know that there is at most one example wrong?


28

Overview

Sample questions on version space learning

Sample questions on decision tree learning

Some general questions and exercises

Your exercises

Sample questions on other learning strategies

29

Questions

What induction hypothesis is made in decision tree learning?

What are some reasons for transforming a decision tree into a set of rules?

How to change the ID3 algorithm to deal with noise in the examples?

What is overfitting and how could it be avoided?

Compare tree pruning with rule post-pruning.

How could one use continuous attributes with decision tree learning?

How to deal with missing attribute values?

30

Questions

Compare the candidate elimination algorithm with the decision tree algorithm, from the point of view of the generalization language, the bias, the search strategy and the use of the examples.

What problems are appropriate for decision tree learning?

What are the main features of decision tree learning?

31

Overview

Sample questions on version space learning

Sample questions on decision tree learning

Some general questions and exercises

Your exercises

Sample questions on other learning strategies

32

Questions

Questions are in the lecture notes corresponding to each learning strategy.