Data Mining: Commercial Applications

趙民德, Institute of Statistical Science, Academia Sinica

2002/10/28

• DM (data mining): good data analysis

• KDD (knowledge discovery): DM with a commercial objective in mind

• Data mining for maximum value is difficult unless a structured plan is followed.

• The Knowledge Discovery process is a structured plan for getting the most out of data mining.

• Knowledge Discovery and Data Mining: The Expectation of Magic

(Dorian Pyle, PC AI magazine, Sept/Oct 1998)

• Business managers seem to expect magic from applying data mining tools to their data.

The key to the appropriate use of data mining lies in a structured methodology to

• Find problems,

• Define solutions,

• Set expectations, and

• Deliver results

This process is called Knowledge Discovery.

10 guiding principles

• Select clearly defined problems that will yield tangible benefits

• Specify the solution required

• Specify how the solution delivered will be used

• Understand as much as possible about the problem and the data set (the domain)

• Let the problem drive the modeling (i.e., tool selection, data preparation, etc.)

• Stipulate assumptions

• Iteratively refine the model

• Make the model as simple as possible, but no simpler

• Define instability in the model: areas where a small change in input produces a drastically different output.

• Define uncertainty in the model: critical areas and ranges in the data set where the model produces low-confidence predictions or insights.
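
One way to make the instability principle concrete is to perturb each input slightly and watch how far the output moves. The sketch below is only illustrative: `toy_model`, `find_unstable_points`, the step size, and the threshold are all hypothetical and are not taken from the slides.

```python
def toy_model(x):
    """Hypothetical model: a step function that jumps at x = 5."""
    return 1.0 if x >= 5.0 else 0.0


def find_unstable_points(model, inputs, step=0.5, threshold=0.5):
    """Return inputs where a small change in input causes a large change in output.

    step and threshold are illustrative assumptions; sensible values depend
    on the scale of the data and on what counts as a "drastic" change.
    """
    unstable = []
    for x in inputs:
        change = abs(model(x + step) - model(x))
        if change > threshold:
            unstable.append(x)
    return unstable


grid = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
unstable = find_unstable_points(toy_model, grid)
print(unstable)
```

Here the only flagged point is the one just below the jump in the toy model. The same kind of scan could be pointed at a model's confidence score to map its regions of uncertainty.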

Mining the Data (three parts)

• Preparing the data

• Surveying the data

• Modeling the data.

• Briefly, problem exploration involves the discovery of appropriate problems using interviewing and problem elicitation techniques.

• Decision support tools, including pair-wise rankings and ambiguity resolution, help build a problem matrix.

• The problems are ranked by the benefit each will return, based on the factors that matter to the problem owner.
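
The ranking step can be sketched as a simple pairwise-comparison tally. This is a minimal illustration, assuming the problem owner states which of each pair of problems matters more. The problem names and the `rank_by_pairwise_wins` helper are hypothetical, and the slides do not specify the actual scoring method.

```python
from collections import Counter


def rank_by_pairwise_wins(problems, preferences):
    """Rank problems by how many pairwise comparisons each one wins.

    preferences is a list of (winner, loser) pairs from the problem owner.
    Problems with equal wins keep the order in which they were listed.
    """
    wins = Counter({p: 0 for p in problems})
    for winner, _loser in preferences:
        wins[winner] += 1
    return sorted(problems, key=lambda p: -wins[p])


# Hypothetical problems and owner preferences, for illustration only.
problems = ["reduce churn", "cut inventory cost", "speed up testing"]
preferences = [
    ("reduce churn", "cut inventory cost"),
    ("reduce churn", "speed up testing"),
    ("speed up testing", "cut inventory cost"),
]
ranking = rank_by_pairwise_wins(problems, preferences)
print(ranking)
```

A real problem matrix would probably weight several factors (cost, benefit, feasibility) rather than count wins, but the tally shows the basic idea of turning pairwise judgments into an ordering.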

Solution exploration finds the most effective solution for each problem, ranking alternatives if necessary.

Implementation addresses such issues as:

what is to be delivered,

who will use the solution,

how it will be used,

what training is required to use it,

how long it will remain effective,

how to monitor continued effectiveness.

• Data preparation takes at least 60% of the project’s time.

• Implementation specification is key to the project’s success

• Projects that were very successful technically can fail because the results were never implemented in practice.

• Without the will, resources, and commitment to put the solution in place, Knowledge Discovery will yield no return at all!

• “Let me give you my data; tell me what you find.” Familiar words?

• This is the expectation of magic.

The outcome of a data mining project is a model that does one of two things. The model will be

• Explanatory, or

• Predictive.

Explanatory (inferential) models explain the relationships that exist in data. They may

• indicate the driving factors for stock market movements, or

• show failure factors in printed circuit board production.

Regardless of purpose, these models help explain relationships.

• Predictive models may or may not explain relationships. Primarily, they predict output conditions given a set of input conditions.

• In many direct mail solicitation campaigns, the marketing manager did not ask what factors motivated people to respond to the solicitation.

• Instead, the focus of the model was simply to increase response.

• If it worked reliably and robustly, fine. If not, it was of no value.

• Whether explanatory or predictive, the data mining model must provide actionable information. This is critical to the project’s success.

• The purpose of the project is to provide information that will allow better decision-making.

• Therefore, data mining is a tool in the decision support arsenal, a formidably potent tool when properly used.

• Knowledge Discovery, as a process, makes sure the goals of mining data align with the user’s needs. The results will directly and unambiguously bear on the domain of the decision to be made.

• Knowledge Discovery aligns the objectives of the modeler with the problem domain to search for optimal return for the effort invested.

• Instead of “let me give you my data”, Knowledge Discovery leads to “let’s discuss the problem and see what can be done”. No magic here. This is a structured search of alternatives and options.

• Each stage requires a commitment from separate groups of people inside a business or organization. At each stage, the various parties work through the issues, make choices at each point, and come to understand the issues and expectations fully.

Example 1

• A Fortune 500 pharmaceutical and bio-chemical company heard of data mining and wanted to explore what it could do for them.

• Some of the managers had read of the wonderful things that data mining could do by just looking at their data.

• Rather than accepting copious amounts of data, the modelers first explained the benefits of Knowledge Discovery.

• The initial exploration, which included Problem and Solution Exploration, took two weeks.

• When completed, more than 250 problems were clearly defined for areas including personnel, manufacturing, inventory control, and testing.

• Managers in each department worked through defining appropriate problems and defining solutions.

• Their involvement was crucial to finding appropriate problems, the solutions to which would yield real business value.

• Senior managers and bio-chemists were presented with the results, an analysis that defined where the resources were located and which projects were to proceed. This led to the Implementation phase.

• Note the level of involvement of the key actors. Because they worked through the problem, understood realistically what might be done with each problem, and evaluated the issue of implementing the solution, this project was a success.

Example 2

• A major telecommunications company’s marketing department wanted data mining to solve their churn problem.

• When presented with the Knowledge Discovery approach, they dismissed it as irrelevant in this case. Their problem was churn: well defined, well understood.

• Build a model to predict churn, and all would be well.

• The data was dirty and polluted, but with the help of advanced data preparation techniques, a reliable and robust model was constructed that was 83% accurate at predicting which customers would churn.

• The best previous techniques had achieved about 59% accuracy. The model provided a 40% improvement in predictive power.
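
The 40% figure is a relative improvement, not a difference in percentage points. A quick check of the arithmetic, using only the two accuracies quoted for the new model and the previous techniques (the variable names are just for illustration):

```python
previous_accuracy = 0.59  # best previous techniques
model_accuracy = 0.83     # new churn model

absolute_gain = model_accuracy - previous_accuracy
relative_gain = absolute_gain / previous_accuracy

print(f"absolute gain: {absolute_gain * 100:.0f} percentage points")
print(f"relative gain: {relative_gain * 100:.1f}%")
```

The absolute gain is 24 percentage points, and dividing by the 59% baseline gives roughly 40%.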

• Marketing then spent a six-figure sum attempting to avert churn -- to no avail!

• Predicting churn was not the problem. The problem with churn, perhaps, would have been better addressed by building a demographic or sociographic model of the causes of churn and then addressing those causes.

• That, however, did not occur.

• They were persuaded to try again using the Knowledge Discovery process. It turned out that for this company the most valuable feature was “Customer Lifetime Value”. Identifying and focusing on the factors that drive this feature yielded significant benefit.

• Solving the right problem is more important than simply building a good model.

• The Knowledge Discovery process does exactly that.

Three Components of DM

• Data Preparation

• Data Surveying

• The Data Model

• Data Preparation is the most important part of mining.

• Sometimes the data is available in a data warehouse. This is helpful, but not sufficient.

• Data preparation for data mining is a different activity than preparing data for warehousing.

• CRISP-DM (Cross-Industry Standard Process for Data Mining)

• Data mining requires fixing the problems of missing and empty variables, monotonic variables, categorical ordering, and many other problems not dealt with in data warehousing.
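
A few of these checks are easy to sketch. The example below is a minimal illustration in plain Python, assuming each variable is a list of values with `None` marking a missing entry. The column names, sample values, and the `profile_variable` helper are hypothetical. It only detects the problems, since the right fix depends on the data.

```python
def profile_variable(values):
    """Flag common data-preparation problems in one variable (column)."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    # Assumption: "empty" is taken here to mean the variable carries no
    # information, i.e. it is entirely missing or holds one constant value.
    is_empty = len(set(present)) <= 1
    # Assumption: "monotonic" means strictly increasing in record order,
    # as a record ID or timestamp would be.
    is_monotonic = len(present) > 1 and all(
        a < b for a, b in zip(present, present[1:])
    )
    return {"missing": n_missing, "empty": is_empty, "monotonic": is_monotonic}


# Hypothetical extract from a warehouse table, for illustration only.
table = {
    "record_id": [101, 102, 103, 104],
    "region": ["north", None, "south", "north"],
    "fax_number": [None, None, None, None],
}
report = {name: profile_variable(col) for name, col in table.items()}
for name, flags in report.items():
    print(name, flags)
```

In this toy table, `record_id` is flagged as monotonic, `region` has one missing value, and `fax_number` is empty. None of these would necessarily be treated as a defect in a warehouse, yet each can mislead a modeling tool.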

• In one extreme example, modeling warehouse data that had not been prepared for mining produced a model that was 6% effective at predicting the required feature. The data had many problems, but after suitable preparation it yielded a reliable and robust model that was nearly 60% effective.

• Data Surveying involves a look at the shape of the whole data set, by building a map of the territory before expending the time and effort required to create models. The survey addresses the question

“Is the answer in here anyway?”

• The Data Model is the small-scale map of some very particular part of the territory. The nature of the data and the purpose of the model will determine which tools are appropriate.

• Building the model is the piece that is typically thought of as data mining: the application of automated tools to data.

• While important, building the model is just a piece of the whole Knowledge Discovery process.

• Data mining, the practice of applying automated pattern detection software tools to data, is not carried out in isolation from the rest of the world.

• A commercial data mining project will not be successful if it is not driven by business needs. Discovering appropriate business problems, defining solutions to those problems, using appropriate data, and building useful models requires an integrated process.

• The Knowledge Discovery process provides the necessary framework to ensure a successful outcome, if one is possible.

• It is a structured, multi-step process. After completing each stage, results are evaluated to determine the most fruitful next step. This iterative procedure requires the commitment and involvement of many people, which ensures that everyone involved understands the process and carefully evaluates the cost and potential benefits.

• Commitment to proceed requires understanding the value of expected results.

• At all stages appropriate expectations are set, and the process is viewed as part of decision-making and policy guidance.

• The committed involvement and understanding of managers seeking measurable results removes the expectation of magic from data mining.