
TIME SERIES & MACHINE LEARNING

PhD Luis Miralles

INDEX:

1.- What are Time Series and their relation with Machine Learning?

2.- Time series supervised learning: Human activity recognition

3.- Time series unsupervised learning: Similarity measures

4.- Time series clustering


1.- What are Time Series and their relation with Machine Learning?

What are time series?

A series of values of a quantity obtained at successive times, often with equal intervals between them.

What are Time Series?


Sampling techniques

Sampling is the process of transforming continuous data into discrete data. There are basically two ways of sampling: one based on time (Riemann) and the other based on the behaviour of the signals (Lebesgue).

Time series sampling techniques

- Riemann sampling: captures samples at fixed points in time.

- Lebesgue sampling: captures samples depending on the variation of the output signal.

Riemann sampling

Also known as periodic sampling, it captures the information from the continuous signal at equidistant time intervals (every second, every minute, ...). It is very simple to implement, which is why it has been used for many years.

Lebesgue sampling

● It is more effective, but it is also more difficult to implement.

● Advantages: it increases the battery life of the sensors, reduces network traffic by decreasing the amount of information transferred, and uses fewer computing resources.
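The two sampling schemes can be contrasted with a minimal sketch (toy sine signal, hypothetical threshold value; not from the slides):

```python
import math

def riemann_sample(signal, n, dt):
    """Time-based (Riemann) sampling: one sample every dt seconds."""
    return [(i * dt, signal(i * dt)) for i in range(n)]

def lebesgue_sample(signal, n, dt, threshold):
    """Event-based (Lebesgue) sampling: scan the signal at resolution dt,
    but keep a sample only when it has moved more than `threshold`
    since the last sample that was kept."""
    kept = [(0.0, signal(0.0))]
    for i in range(1, n):
        t = i * dt
        y = signal(t)
        if abs(y - kept[-1][1]) > threshold:
            kept.append((t, y))
    return kept

sine = math.sin
riemann = riemann_sample(sine, 101, 0.1)          # 101 samples over ~10 s
lebesgue = lebesgue_sample(sine, 101, 0.1, 0.5)   # far fewer samples kept
print(len(riemann), len(lebesgue))
```

The Lebesgue variant transmits only the samples that carry new information, which is exactly where the battery and network savings come from.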

What is the relationship between Machine Learning and Time Series?

What is Machine Learning?

● Arthur Samuel (1959). Machine Learning: field of study that gives computers the ability to learn without being explicitly programmed.

● Semi-automated extraction of knowledge from data.

● The computer extracts knowledge and insight from data using algorithms.

Supervised vs Unsupervised learning

Machine Learning techniques:

- Supervised Learning: Random Forest, Support Vector Machine, Neural Networks, Naïve Bayes, K-nearest neighbors.

- Unsupervised Learning: hierarchical clustering, k-Means clustering.

Supervised learning

Supervised Learning divides into:

- Classification: algorithms that predict a class/label based on the inputs.

- Regression: algorithms that predict a quantity based on the inputs.

Unsupervised learning

Unsupervised Learning: k-Means, hierarchical clustering.

Reinforcement learning


Cold start versus Warm start


Basic Machine Learning Steps


Step I: Extract information.

Step II: Preprocessing: clean data, handle missing values, feature selection.

Step III: Build and optimise the model.

Step IV: Implement (deploy) the model.

Step V: Plot results.

Confusion matrix performance


CM: binary class vs multi-class
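As a minimal illustration (toy activity labels, not from any real dataset), a multi-class confusion matrix can be built simply by counting (true, predicted) pairs:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are the true labels, columns the predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

y_true = ["walk", "walk", "run", "run", "sit"]
y_pred = ["walk", "run",  "run", "run", "sit"]
cm = confusion_matrix(y_true, y_pred, ["walk", "run", "sit"])
print(cm)        # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]

# Accuracy is the trace (correct predictions) over the total.
accuracy = sum(cm[i][i] for i in range(3)) / len(y_true)
print(accuracy)  # 0.8 -- 4 of 5 examples classified correctly
```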

Cross validation technique to optimise models


Most-used Machine Learning metrics

2.- Time series classification and Human activity recognition

Time series Regression

Time series classification

Overview of the HAR process


How to apply a time window to raw data


Methodology for HAR systems


Why is Deep Learning so famous?

IMU: Inertial measurement unit

An inertial measurement unit (IMU) is an electronic device that measures and reports a body's specific force, angular rate, and sometimes the magnetic field surrounding the body, using a combination of accelerometers and gyroscopes, sometimes also magnetometers.

IMUs are typically used to manoeuvre aircraft, including unmanned aerial vehicles (UAVs), among many others, and spacecraft, including satellites and landers.

Feature selection saves time and improves accuracy

Filter vs Wrapper: filter methods select features individually; wrapper methods select features by subsets.

PCA: Principal component analysis



Feature selection in HAR

Feature extraction from the window: time-domain and frequency-domain features
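A minimal sketch of the two previous steps, windowing the raw signal and extracting features from each window (toy accelerometer values; the zero-crossing rate stands in here for a true frequency-domain feature such as an FFT coefficient):

```python
import statistics

def sliding_windows(series, size, step):
    """Split a raw signal into fixed-size, possibly overlapping windows."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]

def window_features(window):
    """Time-domain features plus a crude frequency proxy (zero-crossing rate)."""
    zero_crossings = sum(1 for a, b in zip(window, window[1:]) if a * b < 0)
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "range": max(window) - min(window),
        "zcr": zero_crossings / (len(window) - 1),
    }

accel_z = [0.1, 0.4, -0.2, -0.5, 0.3, 0.6, -0.1, -0.4]  # toy accelerometer axis
windows = sliding_windows(accel_z, size=4, step=2)
features = [window_features(w) for w in windows]
print(len(windows))         # 3 overlapping windows
print(features[0]["mean"])  # ≈ -0.05
```

Each window becomes one feature vector, which is what the classifier in a HAR system actually sees instead of the raw samples.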

Confusion matrix


HAR: Accelerometer, magnetometer and Gyroscope

▪ The accelerometer measures the total acceleration on the vehicle, including the static acceleration from gravity it would experience even when it's not moving.

▪ The magnetometer measures the magnetic field around the robot, including the static magnetic field pointing approximately north caused by the Earth.

▪ The gyroscope measures the instantaneous angular velocity around each axis, basically how fast it's rotating.

Human activity recognition steps


Evaluation


Number of samples per activity


Number of samples per user


Values per axis for the walking activity


3.- Time series similarity measures

Euclidean distance is not the best similarity measure

(Figure: the Euclidean distance between ts1 and ts3 is smaller than the ED between ts2 and ts3.)


Similarity distance metrics I

● Euclidean distance

● Manhattan distance

● Minkowski distance
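All three are special cases of one formula: Minkowski with p=1 gives Manhattan and p=2 gives Euclidean. A minimal sketch with toy series:

```python
def minkowski(x, y, p):
    """Minkowski (Lp) distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

ts1 = [1.0, 2.0, 3.0]
ts2 = [2.0, 4.0, 6.0]
print(minkowski(ts1, ts2, 1))  # Manhattan: 1 + 2 + 3 = 6.0
print(minkowski(ts1, ts2, 2))  # Euclidean: sqrt(14) ≈ 3.742
```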


Similarity distance metrics II

• Correlation distance

 • Cov(X,Y) stands for the covariance of X and Y: the degree to which two different variables are related.

 • Var(X) stands for the variance of X: how much the values of a sample differ from their mean.
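A minimal sketch of the correlation distance, computing Cov(X,Y) and Var(X) directly from their definitions (toy series):

```python
from math import sqrt

def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for series with identical shape,
    2 for series with exactly opposite shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return 1 - cov / sqrt(var_x * var_y)

ts = [1.0, 2.0, 3.0, 4.0]
print(correlation_distance(ts, [2.0, 4.0, 6.0, 8.0]))  # 0.0 (same shape)
print(correlation_distance(ts, [4.0, 3.0, 2.0, 1.0]))  # 2.0 (opposite shape)
```

Note that a scaled copy of a series is at distance 0: the correlation distance compares shape, not magnitude.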


Similarity distance metrics III

• Variance

• Covariance

 • Positive covariance: the two variables vary in the same way.

 • Negative covariance: one variable might increase when the other decreases.

• Covariance is only suitable for heterogeneous pairs.

TS similarity measures ranking

Some papers have focused on comparing dissimilarity measures for time series. A remarkable one is Giusti, R., & Batista, G. E. (2013), whose results are shown in Figure 1. In that paper, all these measures are tested on a large set of datasets.

To rank the best similarity measures, 1-NN (Nearest Neighbour) classification is used. 1-NN is a simple instance-based classifier that depends heavily on the similarity/dissimilarity measure employed. It is also known to be extremely competitive with more robust, complex classification models.

Figure 1: Best dissimilarity measures for time series

K-NN algorithm: training set vs testing set

How can we calculate how good a new TSM (time series similarity measure) is?
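The usual answer, as in Giusti & Batista (2013): plug the candidate measure into a 1-NN classifier and score its accuracy on a labelled test set, so better measures yield better accuracy. A minimal sketch with toy labelled series:

```python
def one_nn_accuracy(train, test, distance):
    """Score a similarity measure by the accuracy of the 1-NN classifier
    that uses it. `train` and `test` are lists of (series, label) pairs."""
    correct = 0
    for series, label in test:
        # Nearest neighbour in the training set under the given measure.
        _, nearest_label = min(train, key=lambda tl: distance(series, tl[0]))
        if nearest_label == label:
            correct += 1
    return correct / len(test)

euclidean = lambda x, y: sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

train = [([0, 0, 0, 0], "flat"), ([0, 1, 2, 3], "rising")]
test = [([0.1, 0.1, 0.0, 0.1], "flat"), ([1, 2, 3, 4], "rising")]
print(one_nn_accuracy(train, test, euclidean))  # 1.0
```

Because 1-NN has no other parameters, any accuracy difference between two runs is attributable to the distance measure itself.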

Time series most used similarity measures

● The most used similarity measures are Euclidean, DTW, Pearson, Spearman, and Cosine. Minkowski, Mahalanobis and Manhattan distance are also very well-known measures.

● Some other interesting methods are LCSS (Longest Common SubSequence) and the edit-distance techniques ERP (Edit Distance with Real Penalty) and EDR (Edit Distance on Real sequences).

● EDR has been shown to be robust in the presence of noise, time shifts, and data scaling (Morse, M. D., & Patel, J. M., 2007).

● SAX is a novel algorithm which, due to its outstanding performance, can be interesting to test (Lin, J. et al., 2007).

Time series similarity measures classification:

1.- Shape-based distances

 1.1.- Lock-step measures: Lp distances, DISSIM, Short Time Series Distance (STS), cross-correlation based, Pearson correlation based, CORT distance.

 1.2.- Elastic measures: Fréchet distance, Dynamic Time Warping (DTW), Keogh_LB for DTW, Edit Distance on Real Sequences (EDR), Edit Distance with Real Penalty (ERP), Longest Common Subsequence (LCSS).

2.- Feature-based distances: (partial) autocorrelation based, Fourier decomposition based, TQuest, wavelet decomposition based, (integrated) periodogram based, SAX representation based, spectral density based.

3.- Structure-based distances

 3.1.- Model-based: Piccolo distance, Maharaj distance, cepstral-based distances.

 3.2.- Compression-based: compression-based distances, complexity invariant distance, permutation distribution based distance.

4.- Prediction-based: non-parametric forecast based.

Dynamic Time Warping

Sakoe, Hiroaki, and Seibi Chiba. "Dynamic programming algorithm optimization for spoken word recognition." IEEE Transactions on Acoustics, Speech, and Signal Processing 26.1 (1978): 43-49. (5000 citations)

Dynamic time warping (DTW) is a well-known technique to find an optimal alignment between two given (time-dependent) sequences under certain restrictions.
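A minimal implementation of the classic DTW recurrence (no Sakoe-Chiba band or other warping-window restriction):

```python
import math

def dtw(x, y):
    """Dynamic Time Warping distance via the classic O(len(x)*len(y))
    dynamic-programming recurrence."""
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

a = [0, 1, 2, 3, 2, 1, 0]
b = [0, 0, 1, 2, 3, 2, 1, 0]  # same shape, slightly stretched
print(dtw(a, b))  # 0.0: DTW warps the stretch away
```

Note that `a` and `b` have different lengths, so a plain Euclidean distance between them is not even defined, while DTW handles the comparison naturally.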

What is DTW versus Euclidean Distance?


Dynamic Time Warping

Advantages

● The DTW distance takes into account (part of) the local temporal correlations.

● No system-identification step is needed.

● Lower bounds on the distance are reasonably efficient.

● This measure allows distances to be calculated between time series of different lengths.

Disadvantages

● There is no clear link between this distance measure and the generating system.

● The DTW distance as such is expensive to calculate.

● This measure does not take the input into account.

Symbolic Aggregate Approximation (SAX)

● SAX is the first symbolic representation for time series that allows dimensionality reduction and indexing with a lower-bounding distance measure, useful in classic data-mining tasks such as clustering, classification, and indexing.

● SAX is as good as well-known representations such as the Discrete Wavelet Transform (DWT) and the Discrete Fourier Transform (DFT), while requiring less storage space.
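A minimal SAX sketch: z-normalise, reduce with Piecewise Aggregate Approximation (PAA), then map each segment mean to a symbol. The breakpoints shown are the standard N(0,1) quartiles for a 4-letter alphabet:

```python
import statistics

def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a short SAX word."""
    mu, sigma = statistics.fmean(series), statistics.pstdev(series)
    z = [(v - mu) / sigma for v in series]          # z-normalise
    seg = len(z) // n_segments
    paa = [statistics.fmean(z[i * seg:(i + 1) * seg])
           for i in range(n_segments)]              # PAA: one mean per segment
    breakpoints = [-0.67, 0.0, 0.67]                # N(0,1) quartiles, |alphabet| = 4
    word = ""
    for m in paa:
        word += alphabet[sum(1 for b in breakpoints if m > b)]
    return word

print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))  # 'abcd': a rising ramp
```

The 8-point series collapses to a 4-character string, which is where the dimensionality reduction and the small storage footprint come from.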


Symbolic Aggregate Approximation (SAX)


4.- Clustering approach

What is clustering?

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).


When do we have to apply clustering?

• A clustering problem can be viewed as unsupervised classification.

• Clustering is appropriate when there is no a priori knowledge about the data.

• It finds the class labels and the number of classes directly from the data (in contrast to classification).

• More informally, it finds natural groupings among objects.

Intraclass and interclass similarity

Organizing data into classes such that there is:

● High intra-class similarity

● Low inter-class similarity

Types of clustering:


Clustering is subjective

What is a natural grouping among these objects? (School employees, Simpson's family, males, females)

But at the same time... we can detect similarity.


Two Types of Clustering

• Partitional algorithms: construct various partitions and then evaluate them by some criterion (we will see an example called BIRCH).

• Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion.

Hierarchical Clustering

• Produces a set of nested clusters organized as a hierarchical tree

• Can be visualized as a dendrogram, which is a tree-like diagram that records the sequences of merges or splits.
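A minimal agglomerative (single-linkage) sketch on toy 1-D points; a real implementation would also record each merge distance, which is exactly the information a dendrogram visualizes:

```python
def single_linkage(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two closest
    clusters (single linkage) until n_clusters remain."""
    clusters = [[p] for p in points]
    dist = lambda a, b: abs(a - b)
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters

print(single_linkage([1.0, 1.1, 5.0, 5.2, 9.9], n_clusters=3))
# [[1.0, 1.1], [5.0, 5.2], [9.9]]
```

Reading the merges bottom-up (closest pairs first) gives the nested structure that the dendrogram draws as a tree.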


Hierarchical Clustering


Hierarchical clustering

• Advantages

 • Good visualization

 • Gives similarity distances between clusters

• Disadvantages

 • Not great performance

Partitional algorithms (K-means):

The objective of K-means is simple: group similar data points together and discover underlying patterns. To achieve this objective, K-means looks for a fixed number (k) of clusters in a dataset.


k-means


Algorithm k-means

1. Decide on a value for k.

2. Initialize the k cluster centers (randomly, if necessary).

3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.

4. Re-estimate the k cluster centers, by assuming the memberships found above are correct.

5. If none of the N objects changed membership in the last iteration, exit. Otherwise, go to step 3.
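The five steps above can be sketched for toy 1-D points (a real implementation would use a library such as scikit-learn):

```python
import random

def k_means(points, k, n_iter=100, seed=0):
    """k-means on 1-D points: pick k centers, assign each point to its
    nearest center, re-estimate centers, repeat until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                     # steps 1-2: initialise
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:                                # step 3: assign
            nearest = min(range(k), key=lambda c: abs(p - centers[c]))
            groups[nearest].append(p)
        new_centers = [sum(g) / len(g) if g else centers[c]
                       for c, g in enumerate(groups)]   # step 4: re-estimate
        if new_centers == centers:                      # step 5: converged?
            break
        centers = new_centers
    return centers, groups

centers, groups = k_means([1.0, 1.2, 0.8, 8.0, 8.2, 7.8], k=2)
print(sorted(centers))  # one center near 1.0, one near 8.0
```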


Comments on the K-Means Method

Pros:

• Relatively efficient

• Often terminates at a local optimum

Cons:

• Not applicable to categorical data

• Need to specify the number of clusters

• Unable to handle noisy data and outliers

Time series k-means

Time series dendrogram
