Temporal Data Mining Claudio Bettini, X.Sean Wang and Sushil Jajodia Presented by Zhuang Liu




Outline

What is Data Mining?
Formal Problem Definition
TAG (Timed Automaton with Granularities)
A Naive Solution
Techniques for Improving Performance
Experimental Results


What is Data Mining

Data Mining: the non-trivial extraction of implicit, previously unknown, and potentially useful information from data.

Common Data Mining Techniques

Association-rule mining

Sequential mining (temporal mining)

Clustering

Classification

Outlier detection


Temporal Data Mining

Finding time-related frequent patterns (frequent sub-sequences)

For example: which pairs of events frequently occur one week apart?

A simple example: a user may be interested in finding all the events that frequently follow within 2 business days of a rise in the IBM stock price.


Definition

Event type E: e.g. a deposit to an account, or a price increase of a specific stock.

Event e: a pair e = (E, t), where E is an event type and t is a positive integer, called the timestamp of e.

Event sequence: a finite set of events.

Each event (E, t) appearing in an event sequence represents the occurrence of event type E at time t.


Granularity

A granularity is a mapping $\mu$ from the set of positive integers to subsets of the time domain such that, for all positive integers $i$ and $j$ with $i < j$:

(1) $\mu(i) \neq \emptyset \wedge \mu(j) \neq \emptyset$ implies that each number in $\mu(i)$ is less than all the numbers in $\mu(j)$, and

(2) $\mu(i) = \emptyset$ implies $\mu(j) = \emptyset$.

Examples: year, month, week, day, business-day, business-week, etc.


TCG

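A temporal constraint with granularity (TCG) $[m,n]\,\mu$ is a binary relation on positive integers. For positive integers $t_1$ and $t_2$, $(t_1, t_2)$ satisfies $[m,n]\,\mu$ iff

(1) $t_1 \le t_2$,

(2) $\lceil t_1 \rceil^{\mu}$ and $\lceil t_2 \rceil^{\mu}$ are both defined, and

(3) $m \le \lceil t_2 \rceil^{\mu} - \lceil t_1 \rceil^{\mu} \le n$,

where $\lceil t \rceil^{\mu}$ denotes the index of the granule of $\mu$ that contains $t$ (undefined if no granule contains $t$).

Examples: [0,0]day, [0,2]hour, [1,1]month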
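The granularity and TCG definitions can be made concrete with a small sketch. It assumes, purely for illustration, that a timestamp counts hours and that hour 1 is the first hour of a Monday; the function names (`day`, `week`, `b_day`, `satisfies`) are hypothetical and not taken from the original work. A granularity is modelled as a function that returns the index of the granule containing a timestamp, or `None` when no granule contains it.

```python
from typing import Callable, Optional

# Assumption: timestamps are positive integers counting hours,
# and hour 1 is the first hour of a Monday.
Granularity = Callable[[int], Optional[int]]


def hour(t: int) -> Optional[int]:
    return t


def day(t: int) -> Optional[int]:
    return (t - 1) // 24 + 1


def week(t: int) -> Optional[int]:
    return (t - 1) // (24 * 7) + 1


def b_day(t: int) -> Optional[int]:
    """Index of the business day containing t, or None on a weekend."""
    d = (t - 1) // 24          # 0-based calendar day
    weekday = d % 7            # 0 = Monday ... 6 = Sunday
    if weekday >= 5:
        return None            # weekends belong to no business day
    return (d // 7) * 5 + weekday + 1


def satisfies(t1: int, t2: int, m: int, n: int, gran: Granularity) -> bool:
    """True iff (t1, t2) satisfies the TCG [m, n] in granularity gran."""
    if t1 > t2:
        return False
    g1, g2 = gran(t1), gran(t2)
    if g1 is None or g2 is None:
        return False
    return m <= g2 - g1 <= n
```

Under these assumptions, a Friday timestamp and the following Monday are one business day apart but three calendar days apart, so `satisfies(100, 170, 1, 1, b_day)` holds while `satisfies(100, 170, 1, 1, day)` does not.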


Event Structure

An event structure (with granularities) is a rooted directed acyclic graph $(W, A, \Gamma)$, where $W$ is a finite set of event variables, $A \subseteq W \times W$, and $\Gamma$ is a mapping that assigns a finite set of TCGs to each arc in $A$.

Complex event type derived from S: each variable is associated with a specific event type.

Complex event matching S: each variable is associated with a distinct event such that the event timestamps satisfy the time constraints.


Example of Event Structure

Assigning the event types IBM-rise, IBM-earnings-report, HP-rise, and IBM-fall to x0, x1, x2, and x3, respectively, gives a complex event type. It describes that the IBM earnings were reported one business day after the IBM stock rose, and that the IBM stock fell in the same or the next week; meanwhile, the HP stock rose within 5 business days after the same rise of the IBM stock and within 8 hours before the same fall of the IBM stock.

Figure 1: An event structure, with arcs

x0 → x1: [1,1]b-day
x1 → x3: [0,1]week
x0 → x2: [0,5]b-day
x2 → x3: [0,8]hours
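As a rough illustration of what it means for a complex event to match this structure, the sketch below encodes the four arcs described in the example and checks one assignment of events to variables. The hour-based timestamps (hour 1 starting a Monday), the dictionary encoding, and the names `FIGURE_1`, `tcg_holds`, and `matches` are assumptions made for this example only.

```python
# Assumption: timestamps are positive integers counting hours; hour 1 starts a Monday.
def hour(t):
    return t


def week(t):
    return (t - 1) // 168 + 1


def b_day(t):
    d = (t - 1) // 24                      # 0-based calendar day
    return None if d % 7 >= 5 else (d // 7) * 5 + d % 7 + 1


# Arcs of the example structure: (source, target) -> list of (m, n, granularity).
FIGURE_1 = {
    ("x0", "x1"): [(1, 1, b_day)],
    ("x1", "x3"): [(0, 1, week)],
    ("x0", "x2"): [(0, 5, b_day)],
    ("x2", "x3"): [(0, 8, hour)],
}


def tcg_holds(t1, t2, m, n, gran):
    g1, g2 = gran(t1), gran(t2)
    return t1 <= t2 and g1 is not None and g2 is not None and m <= g2 - g1 <= n


def matches(structure, events):
    """events maps each variable to a distinct (event_type, timestamp) pair."""
    if len(set(events.values())) != len(events):
        return False                       # variables must bind distinct events
    return all(
        tcg_holds(events[a][1], events[b][1], m, n, gran)
        for (a, b), tcgs in structure.items()
        for (m, n, gran) in tcgs
    )


example = {
    "x0": ("IBM-rise", 10),                # Monday
    "x1": ("IBM-earnings-report", 34),     # Tuesday
    "x2": ("HP-rise", 55),                 # Wednesday
    "x3": ("IBM-fall", 60),                # Wednesday, 5 hours after x2
}
```

With these made-up timestamps `matches(FIGURE_1, example)` holds; moving the HP-rise to hour 40 breaks the 8-hour constraint on the arc from x2 to x3.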


Formal Problem Definition

An event-mining problem is a quadruple $(S, \tau, E_0, \rho)$, where $S$ is an event structure, $\tau$ is the minimum confidence value, $E_0$ is an event type (the reference type), and $\rho$ is a partial mapping which assigns a set of event types to some of the variables (except the root).

Solving an event-mining problem means finding all complex event types that occur frequently in the input sequence and are derived from $S$ by assigning $E_0$ to the root and a specific event type to each of the other variables.

Example: $(S, 0.8, \text{IBM-rise}, \rho)$


TAG

Timed Automaton with Granularities

A basic component used to test whether a candidate complex event type appears frequently in a time sequence.

A timed automaton with granularities is a 6-tuple $(\Sigma, S, S_0, C, T, F)$, where

(1) $\Sigma$ is a finite set of input letters,
(2) $S$ is a finite set of states,
(3) $S_0 \subseteq S$ is a set of start states,
(4) $C$ is a finite set of clocks,
(5) $T \subseteq S \times S \times \Sigma \times 2^C \times \Phi(C)$ is a set of transitions,
(6) $F \subseteq S$ is a set of accepting states.


TAG

$\Phi(C)$ is the set of all the formulas called clock constraints.

A transition $(s, s', e, \lambda, \delta)$ represents a transition from state $s$ to state $s'$ on input symbol $e$. The set $\lambda \subseteq C$ gives the clocks to be reset with this transition, and $\delta$ is a clock constraint over $C$.

A TAG is essentially a standard finite automaton with some modifications:

Each TAG maintains a set of clocks.
Both the input symbol and the clock values determine the next state.
A run is an accepting run if its last state is in the set $F$.

An event sequence is accepted by a TAG if there exists an accepting run.
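The following is a minimal sketch of how such an automaton could be simulated; it is an illustration under simplifying assumptions, not the construction used by the authors. Each clock is tied to one granularity and its value is the difference between the tick of the current event and the tick at which the clock was last reset. The wildcard symbol `None`, the hour-based timestamps, and all names (`Transition`, `accepts`, `TRANSITIONS`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional


def day(t):
    # Assumption: timestamps count hours from 1; a day is 24 consecutive hours.
    return (t - 1) // 24 + 1


@dataclass(frozen=True)
class Transition:
    src: str
    dst: str
    symbol: Optional[str]      # None is a wildcard matching any symbol (assumption)
    resets: FrozenSet[str]     # clocks reset when the transition is taken
    guard: Callable[[Dict[str, Optional[int]]], bool]  # clock constraint


def accepts(events, start, accepting, clocks, transitions):
    """Nondeterministic run over events = [(symbol, timestamp), ...].

    clocks maps a clock name to its granularity. All clocks are taken to
    start at the timestamp of the first event.
    """
    t0 = events[0][1] if events else 1
    configs = {(s, tuple(sorted((c, t0) for c in clocks))) for s in start}
    for symbol, t in events:
        successors = set()
        for state, reset_times in configs:
            reset_at = dict(reset_times)
            values = {}
            for c, gran in clocks.items():
                now, then = gran(t), gran(reset_at[c])
                values[c] = None if now is None or then is None else now - then
            for tr in transitions:
                if tr.src != state or tr.symbol not in (None, symbol):
                    continue
                if not tr.guard(values):
                    continue
                updated = dict(reset_at)
                for c in tr.resets:
                    updated[c] = t
                successors.add((tr.dst, tuple(sorted(updated.items()))))
        configs = successors
    return any(state in accepting for state, _ in configs)


always = lambda values: True
next_day = lambda values: values["x"] == 1

# Hypothetical TAG: an A, followed by a B exactly one day tick later.
TRANSITIONS = [
    Transition("s0", "s1", "A", frozenset({"x"}), always),
    Transition("s1", "s1", None, frozenset(), always),   # skip unrelated events
    Transition("s1", "s2", "B", frozenset(), next_day),
    Transition("s2", "s2", None, frozenset(), always),   # consume the rest
]
```

For instance, `accepts([("A", 10), ("C", 20), ("B", 30)], {"s0"}, {"s2"}, {"x": day}, TRANSITIONS)` holds, because the B falls on the day after the A. Keeping a set of configurations is one simple way to explore all runs at once.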


A Naïve Solution

Consider all the event types that occur in the given event sequence, and consider all the complex types derived from the given event structure, one for each assignment of these event types to the variables. Each of these complex types is called a candidate complex type for the event-mining problem.

For each candidate complex type, start the corresponding TAG at every occurrence of $E_0$. That is, for each occurrence of $E_0$ in the event sequence, use the rest of the event sequence as the input to one copy of the TAG. Comparing the number of TAGs that reach a final state with the number of occurrences of $E_0$ yields all the solutions of the event-mining problem.

The number of candidate types is exponential in the number of variables in the event structure: $n$ event types and $k$ non-root variables give up to $n^k$ candidates. This is too costly.
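The enumerate-and-count idea can be sketched as follows. To stay short, this sketch replaces the TAG with a plain backtracking matcher and assumes a single granularity, with timestamps already expressed as tick indices, so that a TCG [m, n] reduces to `m <= t2 - t1 <= n`. It also treats "frequent" as a confidence of at least the threshold. These simplifications and the names `has_match` and `naive_mine` are assumptions for illustration.

```python
from itertools import product


def has_match(arcs, order, types, seq, root_event):
    """Backtracking search for a complex event rooted at root_event.

    arcs maps (a, b) to (m, n); order lists the variables, root first.
    """
    def extend(i, chosen):
        if i == len(order):
            return True
        var = order[i]
        for ev in seq:
            if ev[0] != types[var] or ev in chosen.values():
                continue
            chosen[var] = ev
            ok = all(m <= chosen[b][1] - chosen[a][1] <= n
                     for (a, b), (m, n) in arcs.items()
                     if a in chosen and b in chosen)
            if ok and extend(i + 1, chosen):
                return True
            del chosen[var]
        return False

    return extend(1, {order[0]: root_event})


def naive_mine(arcs, order, root_type, seq, tau):
    """Try every assignment of event types to the non-root variables."""
    event_types = sorted({e for e, _ in seq})
    roots = [ev for ev in seq if ev[0] == root_type]
    frequent = []
    for combo in product(event_types, repeat=len(order) - 1):
        types = dict(zip(order, (root_type,) + combo))
        hits = sum(has_match(arcs, order, types, seq, r) for r in roots)
        if roots and hits / len(roots) >= tau:
            frequent.append(types)
    return frequent


seq = [("A", 1), ("B", 2), ("A", 5), ("B", 6), ("A", 9), ("C", 10)]
arcs = {("x0", "x1"): (1, 1)}
result = naive_mine(arcs, ["x0", "x1"], "A", seq, 0.6)
```

In this toy sequence two of the three occurrences of A are followed by a B one tick later, so only the assignment of B to x1 passes the 0.6 threshold. The loop over `product` is where the exponential number of candidates shows up.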


Techniques to improve performance

The performance of this algorithm can be improved by:

(1) identifying possible inconsistencies in the given event structure before starting the process,

(2) reducing the length of the sequence,

(3) reducing the number of times an automaton has to be started,

(4) reducing the number of different automata to be started,

(5) then applying the naïve algorithm.


Recognition of Inconsistent Event Structures

An event structure is consistent if there exists a complex event that matches it.

If an event structure is inconsistent, it should be discarded before the mining process starts.

Determining the consistency of an event structure is computationally hard in general.

Approximate polynomial-time algorithms are therefore used to check the consistency of event structures.


Recognition of Inconsistent Event Structures

If one of the constraints implied by the given ones is the “empty” one, i.e. unsatisfiable, the whole event structure is inconsistent.

A TCG [m’, n’] is logically implied by a TCG [m, n] if every pair (x, y) that satisfies [m, n] also satisfies [m’, n’].

For example, a TCG [1,2]b-week can be converted into [3,18]day or [0,1]month, while it cannot be converted into [2,3]week-end or [1,3]week, since the resulting constraints are not implied by [1,2]b-week.
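To show how an implied "empty" constraint can be detected, here is a deliberately simplified sketch that assumes every constraint is expressed in one and the same granularity, so no conversion between granularities is needed. Under that assumption the check reduces to the standard negative-cycle test on a distance graph (Floyd-Warshall); this is a textbook technique substituted for illustration, not the approximate multi-granularity algorithm the slides refer to. The name `is_consistent` is hypothetical.

```python
import math


def is_consistent(variables, arcs):
    """arcs maps (a, b) to (m, n), meaning m <= t_b - t_a <= n in one granularity."""
    dist = {(a, b): (0 if a == b else math.inf)
            for a in variables for b in variables}
    for (a, b), (m, n) in arcs.items():
        dist[a, b] = min(dist[a, b], n)     # t_b - t_a <= n
        dist[b, a] = min(dist[b, a], -m)    # t_a - t_b <= -m
    for k in variables:
        for i in variables:
            for j in variables:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    # A negative cycle means some implied constraint is unsatisfiable.
    return all(dist[v, v] >= 0 for v in variables)


variables = ["x0", "x1", "x2"]
clash = {("x0", "x1"): (1, 1), ("x1", "x2"): (1, 1), ("x0", "x2"): (5, 6)}
fine = {("x0", "x1"): (1, 1), ("x1", "x2"): (1, 1), ("x0", "x2"): (0, 5)}
```

In `clash`, the path through x1 forces x2 to lie exactly 2 ticks after x0, while the direct arc demands at least 5, so the implied constraint is empty and the structure is rejected.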


Reduction of the Event Sequence

We can reduce the event sequence by exploiting the granularities.

For example, if a discovery problem is defined on the sub-structure excluding variable x3, the input event sequence can be reduced by discarding any event that does not occur on a business day.
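A minimal sketch of this filtering step, assuming hour-based timestamps with hour 1 starting a Monday and assuming every variable is constrained in each of the listed granularities (as in the business-day example). The names `b_day` and `reduce_sequence` are hypothetical.

```python
def b_day(t):
    # Assumption: timestamps count hours from 1, and hour 1 starts a Monday.
    d = (t - 1) // 24                      # 0-based calendar day
    return None if d % 7 >= 5 else (d // 7) * 5 + d % 7 + 1


def reduce_sequence(seq, granularities):
    """Keep only events whose timestamp falls inside a granule of every granularity."""
    return [(e, t) for e, t in seq
            if all(g(t) is not None for g in granularities)]


seq = [("IBM-rise", 10), ("HP-rise", 130), ("IBM-fall", 170)]
reduced = reduce_sequence(seq, [b_day])
```

Here the event at hour 130 falls on a Saturday, so it cannot take part in any business-day constraint and is dropped.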


Reduction of the occurrences of the root

The basic idea is to remove those occurrences of reference types which cannot be the root of a complex event matching the given structure.

It is possible that for some occurrences of the reference types in the sequence, a constraint is unsatisfiable.

Consider all the non-empty sets of explicit and implicit constraints on the pair of the root and each non-root node. Check if one of the constraints cannot be satisfied.

For example, if no event occurs in the sequence on the business day following an IBM-rise event, this particular reference event can be discarded. (No automaton is started for it.)


Reduction of the occurrences of the root

Let N be the number of occurrences of the reference event type in the sequence.

Let N’ be the number of occurrences of reference events for which one of the constraints is unsatisfiable. These are reference events that are certainly not the root of a complex event satisfying the given event structure.

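If $N'/N > 1 - \tau$, where $\tau$ is the minimum confidence value, at most $N - N'$ of the $N$ occurrences can be matched, so no candidate can reach confidence $\tau$: there cannot be any frequent complex event type, and the empty set should be returned to the user.

Otherwise, remove these occurrences of the reference type and modify $\tau$ into $\tau' = (\tau \cdot N) / (N - N')$.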
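A small worked sketch of this adjustment, writing the minimum confidence value as `tau` and reading the cut-off as N'/N > 1 - tau (the reading under which the remaining N - N' occurrences cannot reach the threshold). The function name and the sample numbers are made up for illustration, and `tau` is assumed to be strictly positive.

```python
def adjust_confidence(tau, n, n_bad):
    """Return the adjusted threshold tau', or None if no type can be frequent.

    n is the number of reference occurrences, n_bad the number that can
    never be the root of a match. Assumes 0 < tau <= 1 and n > 0.
    """
    if n_bad / n > 1 - tau:
        return None                 # even matching all remaining roots is not enough
    return tau * n / (n - n_bad)


tau_prime = adjust_confidence(0.7, 100, 20)   # 0.7 * 100 / 80
```

With 100 reference occurrences of which 20 are discarded, a threshold of 0.7 becomes 0.875 on the remaining 80: the same 70 matches are still required. With 40 discarded, only 60 matches are possible and the function returns `None`.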


Reduction of the Candidate Type

Based on the property: if a complex event type occurs frequently, then any of its sub-types also occurs frequently.

In other words, if an assignment to two variables is not frequent, no candidate complex event type that includes this assignment can be frequent, so such complex event types can be removed from the set of candidates.

For each subset $W'$ of $W$, the induced approximated sub-structure of $W'$ is $(W', A', \Gamma')$, where $A'$ consists of all pairs $(X, Y) \in W' \times W'$ such that there is a path from $X$ to $Y$ in $S$ and there is at least one constraint on $(X, Y)$.


Reduction of the Candidate Type

Finding the solutions to the induced discovery problems is straightforward and has low time complexity. Indeed, the induced sub-structure gives the distance from the root to the variable (in effect, two distances, namely the minimum distance and the maximum distance).

For each occurrence of $E_0$, this distance translates into a window, i.e., a period of time during which the event for X must appear.

The same idea extends to sub-structures with more than one non-root variable, where these variables form a chain in S.
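The window idea for a single non-root variable can be sketched as below, again assuming one granularity with timestamps given as tick indices and treating "frequent" as a confidence of at least the threshold. The function name and the toy sequence are hypothetical.

```python
from collections import Counter


def frequent_types_in_window(seq, root_type, window, tau):
    """Event types seen within [m, n] ticks after at least a tau fraction of roots."""
    m, n = window
    roots = [t for e, t in seq if e == root_type]
    counts = Counter()
    for t0 in roots:
        # Count each type at most once per root occurrence; skip the root itself.
        seen = {e for e, t in seq
                if m <= t - t0 <= n and (e, t) != (root_type, t0)}
        counts.update(seen)
    return {e for e, c in counts.items() if roots and c / len(roots) >= tau}


seq = [("A", 1), ("B", 2), ("A", 5), ("B", 6), ("A", 9), ("C", 10)]
survivors = frequent_types_in_window(seq, "A", (1, 1), 0.6)
```

Only B appears in the window of at least 60% of the A occurrences, so candidates that assign any other type to this variable can be dropped before any automaton is started.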


Experimental Results

Closing prices of 439 stocks for 517 trading days.

Price changes are partitioned into 7 categories: $(-\infty, -5\%)$, $(-5\%, -3\%)$, $(-3\%, 0)$, $(0, 0)$, $(0, 3\%)$, $(3\%, 5\%)$, $(5\%, \infty)$.

The total number of event types is 2978. The number of events is 181089.

The reference event type X0: a drop of the IBM stock of less than 3%.

The minimum confidence value is 0.7.

There is no assignment to the other variables.

Figure: The event structure used in the experiment (variables X0, X1, X2, X3; constraints [0,2]b-day, [1,2]b-day, [0,0]b-week)
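A sketch of how closing prices might be turned into events of this kind. The slides do not say which category the boundary values (exactly 3% or 5%) belong to, nor how event types are named, so the boundary handling and the `"SYMBOL:category"` naming below are assumptions, as are the function names.

```python
def category(previous_close, close):
    """Map a relative price change to one of the 7 categories."""
    change = (close - previous_close) / previous_close
    if change == 0:
        return "(0, 0)"
    # Assumption: a boundary value is placed in the category closer to zero.
    if change < -0.05:
        return "(-inf, -5%)"
    if change < -0.03:
        return "(-5%, -3%)"
    if change < 0:
        return "(-3%, 0)"
    if change <= 0.03:
        return "(0, 3%)"
    if change <= 0.05:
        return "(3%, 5%)"
    return "(5%, +inf)"


def to_events(symbol, closes):
    """One event per trading day from day 2 on, timestamped by trading-day index."""
    return [(f"{symbol}:{category(p, c)}", d)
            for d, (p, c) in enumerate(zip(closes, closes[1:]), start=2)]


events = to_events("IBM", [100, 98, 98, 104])
```

Under this encoding the reference type of the experiment, a drop of less than 3%, would correspond to the event type `"IBM:(-3%, 0)"`.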


Experimental Results cont.

This experiment focuses on Step 4, namely reduction of the candidate complex event types by using sub-structures.

The results show that, after applying the heuristics, the number of candidate complex event types is reduced significantly.


Experimental Results cont.

The two frequent event combinations discovered in the experiment


References

C. Bettini, X. S. Wang, S. Jajodia, and J.-L. Lin. Discovering Temporal Relationships with Multiple Granularities in Time Sequences. IEEE Transactions on Knowledge and Data Engineering, Vol. 10 (2), 1998.

C. Bettini, X. S. Wang, and S. Jajodia. A General Framework for Time Granularity and its Application to Temporal Reasoning. Annals of Mathematics and Artificial Intelligence, Vol. 22 (1-2), pages 29-58, Baltzer Science Publishers, 1998.

C. Bettini, X. S. Wang, and S. Jajodia. Testing Complex Temporal Relationships Involving Multiple Granularities and its Application to Data Mining. In Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'96), pages 68-78, Montreal, Canada, June 1996.

C. Bettini, X. S. Wang, and S. Jajodia. Mining Temporal Relationships with Multiple Granularities in Time Sequences. Data Engineering Bulletin, 21:32-38, 1998.


Thank you

Questions?