Lab 2

Lab 2. Exploring the Data with Graphs During data mining, it is often useful to explore the data by creating visual summaries. Clementine offers several

Exploring the Data with Graphs

• During data mining, it is often useful to explore the data by creating visual summaries. Clementine offers several types of graphs to choose from, depending on the kind of data that you want to summarize.

• For example, to find out what proportion of the patients responded to each drug, use a Distribution node.

• Place a Distribution node in the workspace and connect it to the Source node (remember to drag with your middle mouse button). Then double-click the Distribution node to open its dialog box and set the display options.

• Select Drug as the target field whose distribution you want to show. Then click Execute in the dialog box.

• The distribution graph helps you see the "shape" of the data. It shows that patients responded to drug Y most often and to drugs B and C least often.
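The Distribution node summarizes a categorical field by counting the records in each category. As an illustration outside Clementine, the Python sketch below computes the counts and percentages such a graph is built from, assuming each record is a dict with a Drug key. The category labels (drugY, drugX, ...) and the records are hypothetical, not taken from the lab's data file.

```python
from collections import Counter

def distribution(records, field):
    """Return (category, count, percent) rows for `field`, most frequent first."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]

# Hypothetical records; in the lab these come from the Source node.
records = [
    {"Drug": "drugY"}, {"Drug": "drugY"}, {"Drug": "drugX"},
    {"Drug": "drugA"}, {"Drug": "drugY"}, {"Drug": "drugC"},
]

for cat, n, pct in distribution(records, "Drug"):
    print(f"{cat:6} {n:3d} {pct:5.1f}%")
```

Sorting by frequency mirrors how you read the graph: the longest bar (here drugY) answers "which drug did patients respond to most often".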

• Now let's look more closely at what factors might influence Drug, the target field. As a researcher, you know that the concentrations of sodium (Na) and potassium (K) in the blood are important factors. So let's create another graph, this time looking at how the Na and K values influence the choice of drug.

• Since these are both numeric values, you can create a scatterplot of sodium versus potassium, using the drug categories as a color overlay.

• Place a Plot node in the workspace and connect it to the Source node. (Remember to drag with your middle mouse button.) Then, double-click the Plot node to open its dialog box.

• Select K as the X field, Na as the Y field, and Drug as the overlay field. Then, click Execute.

• Note: You can also create the plot by clicking the Execute button in the dialog box.
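A scatterplot with a color overlay draws one point per record at (K, Na) and colors it by the record's Drug value. The hypothetical helper below shows only the grouping step that an overlay implies: it collects the points for each category so that a plotting library could draw each group in its own color. It does not reproduce the Plot node, and the records are invented for illustration.

```python
from collections import defaultdict

def group_points(records, x_field, y_field, overlay_field):
    """Map each overlay category to the list of (x, y) points in that category."""
    groups = defaultdict(list)
    for r in records:
        groups[r[overlay_field]].append((r[x_field], r[y_field]))
    return dict(groups)

# Hypothetical records, invented for illustration.
records = [
    {"K": 0.03, "Na": 0.79, "Drug": "drugY"},
    {"K": 0.07, "Na": 0.74, "Drug": "drugC"},
    {"K": 0.05, "Na": 0.79, "Drug": "drugY"},
    {"K": 0.06, "Na": 0.60, "Drug": "drugX"},
]

# K on the x axis, Na on the y axis, Drug as the overlay, as in the Plot node settings.
for drug, points in sorted(group_points(records, "K", "Na", "Drug").items()):
    print(drug, points)
```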

• The plot shows a clear threshold: above it the correct drug is always drug Y, and below it the correct drug is never drug Y. This threshold is a ratio: the ratio of sodium (Na) to potassium (K).
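The claim that a single Na/K ratio separates drug Y from the other drugs can be checked mechanically: compute each record's ratio and test whether every drug Y record lies above every other record. The function below is a hypothetical sketch of that check, not a Clementine feature. It returns a cut point midway between the two groups, or None when no single threshold separates them; the records are invented for illustration.

```python
def separating_ratio(records, target="drugY"):
    """Return a Na/K cut point with every `target` record above it and every
    other record below it, or None if no single cut point does that."""
    pos = [r["Na"] / r["K"] for r in records if r["Drug"] == target]
    neg = [r["Na"] / r["K"] for r in records if r["Drug"] != target]
    if not pos or not neg:
        return None  # one group is empty, so there is nothing to separate
    lo, hi = max(neg), min(pos)
    # Separable only if the highest non-target ratio is below the lowest target ratio.
    return (lo + hi) / 2 if lo < hi else None

# Hypothetical records, invented for illustration.
records = [
    {"K": 0.03, "Na": 0.79, "Drug": "drugY"},
    {"K": 0.07, "Na": 0.74, "Drug": "drugC"},
    {"K": 0.05, "Na": 0.79, "Drug": "drugY"},
    {"K": 0.06, "Na": 0.60, "Drug": "drugX"},
]
print(separating_ratio(records))
```

On the K-versus-Na scatterplot, a fixed ratio t corresponds to the straight line Na = t·K through the origin, which is why the boundary between drug Y and the other drugs looks like a diagonal. The midpoint is an arbitrary choice; any value strictly between the two groups separates them equally well on these records.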

• So far, you have been exploring the data using graphs. Next, we'll move on to data preparation, where we'll perform a common data mining operation: deriving a new field.

• Before moving on, you may want to clean up the workspace. Delete the two graph nodes (Distribution and Plot) and the Table node. To delete a node, right-click it and choose Delete from the context menu, or select multiple nodes with your mouse and press the Delete key.

• Since the ratio of sodium to potassium seems to predict when to use drug Y, you should derive a field that contains the value of this ratio for each record. This field might be useful later when you build a model to predict when to use each of the five drugs.

• To derive a new field, start by inserting a Derive node into the stream.

• Remember, you can automatically connect nodes by first selecting the Source node in the canvas and then double-clicking the Derive node from the palettes.

• Then, double-click the Derive node to open its dialog box and specify a method for creating the new field.

• Name the new field Na_to_K. Since you obtain the new field by dividing the sodium value by the potassium value, enter Na/K as the formula. You can also build the formula by clicking the icon just to the right of the formula field.
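The Derive node evaluates the formula Na/K once for each record and adds the result as a new field. A minimal Python sketch of the same operation follows, assuming records are dicts. The zero-K guard (storing None) is an assumption of this sketch only; the lab does not say how Clementine handles division by zero. The records are hypothetical.

```python
def derive_na_to_k(records):
    """Return copies of the records with a new Na_to_K field equal to Na / K."""
    derived = []
    for r in records:
        new = dict(r)  # copy so the input records stay unchanged
        # Assumption of this sketch: mark the value as missing (None) when K is 0.
        new["Na_to_K"] = r["Na"] / r["K"] if r["K"] else None
        derived.append(new)
    return derived

# Hypothetical records, invented for illustration.
records = [
    {"Na": 0.79, "K": 0.03, "Drug": "drugY"},
    {"Na": 0.74, "K": 0.07, "Drug": "drugC"},
]
for r in derive_na_to_k(records):
    print(r["Drug"], round(r["Na_to_K"], 2))
```

Like a node placed downstream of the Source node, the function leaves the original fields alone and only appends Na_to_K.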

• This opens the Expression Builder, a way to interactively create expressions using built-in lists of functions, operands, and fields and their values.

• Using the Expression Builder is covered in depth later in this guide.

• You can check the distribution of your new field by attaching a Histogram node to the Derive node. In the Histogram node dialog box, specify Na_to_K as the field to be plotted and Drug as the overlay field.

• When you execute the stream, Clementine displays a histogram of Na_to_K with Drug as the overlay. Based on the display, you can conclude that when the Na_to_K value is about 15 or above, drug Y is the drug of choice.
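A histogram with an overlay splits the range of Na_to_K into equal-width bins and, within each bin, counts the records for each Drug value. The sketch below does this with the Python standard library. The bin width of 5 and the sample ratios are hypothetical choices for illustration, not the settings or data Clementine uses.

```python
from collections import Counter, defaultdict

def overlay_histogram(values, categories, width):
    """Bin `values` into equal-width bins aligned to multiples of `width` and
    count the overlay `categories` in each bin. Keys are bin lower edges."""
    bins = defaultdict(Counter)
    for v, c in zip(values, categories):
        edge = (v // width) * width  # lower edge of the bin containing v
        bins[edge][c] += 1
    return dict(sorted(bins.items()))

# Hypothetical Na_to_K values and their Drug labels.
ratios = [26.3, 10.6, 15.8, 10.0, 7.2, 18.4]
drugs = ["drugY", "drugC", "drugY", "drugX", "drugA", "drugY"]

for edge, counts in overlay_histogram(ratios, drugs, 5).items():
    print(f"[{edge:g}, {edge + 5:g}) {dict(counts)}")
```

Reading the output bin by bin is the text equivalent of scanning the colored bars: in this made-up sample, every bin from 15 upward contains only drugY.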

• So far, by exploring and manipulating the data, you have been able to form some hypotheses. The ratio of sodium to potassium in the blood seems to affect the choice of drug. But you cannot fully explain all of the relationships yet.

• This is where modeling will likely provide some answers.
