Eindhoven University of Technology MASTER Extracting ...Extracting Features to Discriminate OSA and non-OSA Master Thesis Wenjie Pei Department of Mathematics and Computer Science

Eindhoven University of Technology

MASTER

Extracting features to discriminate OSA and non-OSA

Pei, W.

Award date:2013

Link to publication

DisclaimerThis document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Studenttheses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the documentas presented in the repository. The required complexity or quality of research of student theses may vary by program, and the requiredminimum study period may vary in duration.

General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright ownersand it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

https://research.tue.nl/en/studentthesis/extracting-features-to-discriminate-osa-and-nonosa(af20b6fe-02c0-4e79-87d9-ab2d231245aa).html

Extracting Features toDiscriminate OSA and

non-OSA

Master Thesis

Wenjie Pei

Department of Mathematics and Computer ScienceDatabases and Hypermedia Research Group

Supervisors:Dr. Toon Calders

Senior Scientist Stefan WinterDrs. Thanh Lam Hoang

Eindhoven, July 2013

Abstract

This project is about analyzing anatomical data obtained by pharyngometry with respect toobstructive sleep apnea (OSA) and the goal is to extract valued feature to discriminate OSA andnon-OSA subjects.

The key contributions of this thesis work are two-fold. First, we extract a rich set of 16 featuresfrom raw time series data and evaluate all them by ten-fold cross validation with different classifiers.We also did an extensive evaluation of the correlation of each feature with class labels based ont-test, Matthews Correlation Coefficient (MCC), Receiver Operating Characteristic (ROC), andArea Under the Curve (AUC).

The experiment results show that several features such as the volume, volume variance, t-testselection feature and stretched length are very predictive. The accuracy of OSA and Non-OSAclassification task can reach up to 77% with these features alone. The t-test conducted on thesefeatures show the significant correlation between the given features and the class labels. We alsofound an interesting result that on average OSA patients have smaller volume (stretch length)than non-OSA people. From medical point of view, this finding is consistent because OSA likelyoccurs when the muscles relax during sleep, causing soft tissue in the back of the throat to collapseand block the upper airway [11].

The second major contribution of this work is a method to search for local part of the timeseries from which the extracted features are most predictive. The main idea is based on anintuitive observation that large part of the time series which usually corresponds to normal partof the throat is not predictive. Therefore, using features extracted from the entire time seriesfor classification is not effective because normal part of the throat contributes a big factor to thevalue of the features. In order to avoid this type of noise information we propose method to searchfor small part of time series which likely corresponds to abnormal part of the throat from whichthe extracted features could be more predictive. We did an exhaustive search for such predictivelocal parts. We found local parts from which the volume and stretch length were extracted andimproved the classification accuracy from 65% to 75%. An interesting finding is that the mostpredictive local parts found by our algorithm are very close to the oropharyngeal junction point(OPJ), which has been already confirmed by some prior work in the literature [2] as an importantpoint for discriminating OSA and non-OSA. They almost belong to the oropharynx region, whichis consistent with the domain knowledge of OSA that the oropharynx region is the most possibleobstructed site in the upper airway.

Extracting Features to Discriminate OSA and non-OSA iii

Acknowledgements

First of all, my heartfelt gratitude goes out to my supervisor at Philips, Senior Scientist, StefanWinter, for his support and encouragement during this project and successful completion of mythesis. I am so honored to take part in the internship in Philips under the supervision of StefanWinter.

Secondly, I would like to thank my supervisor in TU/e, Professor Toon Calders, for his patientguidance and advices. Thirdly, I also want to thank my second supervisor in TU/e, Drs. ThanhLam Hoang, for providing me a lot of useful ideas and helps. Without his guidance, I could notcomplete this project.

Dr. Alexander Serebrenik is gratefully acknowledged for accepting to be part of my defensecommittee.

Moreover, I would like to thank many colleagues in Philips for their useful advices and concretecomments. They always shared their research experiences and gave me many useful suggestions.They made me enjoy doing this project.

Last but certainly not least, friends were also supportive during my stay in the Netherlands.This particularly concerns Jie Yang, Xuefei Chen and JiaChun Cui. I also wish to thank theimmeasurable help, though mostly operated at distance, of some close family members. Thisconcerns my father, my mother, my brother and most particularly my grandfather to whom Idedicate this thesis.

Extracting Features to Discriminate OSA and non-OSA v

Contents

Contents vii

List of Figures ix

List of Tables xi

List of Algorithms xiii

1 Introduction 11.1 Research Question . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Domain Knowledge 32.1 Sleep Apnea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2.1.1 Obstructive Sleep Apnea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.2 Upper Airway Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2.2.1 Anatomical Determinants of Upper Airway Caliber in OSA . . . . . . . . . 42.3 Acoustic Pharyngometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.3.1 Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.3.2 Pros and Cons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Workflow 7

4 Datasets 94.1 Preprocessing of Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4.1.1 Remove Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104.1.2 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

5 Performance Criteria 125.1 Some Basic Performance Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5.1.1 k-fold Cross-Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125.1.2 Classification Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125.1.3 ROC and AUC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135.1.4 MCC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135.1.5 T-test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

5.2 Classifier based on Information Gain and Optimal Split Point . . . . . . . . . . . . 145.2.1 Classifier based on OSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

6 Classification with Raw Data 166.1 1-Nearest Neighbor Algorithm with Euclidean Distance . . . . . . . . . . . . . . . 17

6.1.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176.1.2 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Extracting Features to Discriminate OSA and non-OSA vii

CONTENTS

6.1.3 Analysis of Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186.1.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

6.2 1-Nearest Neighbor Algorithm with DTW(Dynamic Time Warping) Distance . . . 196.2.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196.2.2 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206.2.3 Analysis of Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206.2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

6.3 Classifications with Popular Classification Algorithms . . . . . . . . . . . . . . . . 216.3.1 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216.3.2 Analysis of Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216.3.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

7 Features 237.1 Local Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

7.1.1 OPJ Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257.1.2 OPJ Position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277.1.3 OPJ-aligned Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287.1.4 Maximum Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297.1.5 OPJ Value/Maximum Value . . . . . . . . . . . . . . . . . . . . . . . . 31

7.2 Global Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337.2.1 Shapelets based Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 347.2.2 Using SAX to transform the time series into low-dimensional symbolic datasets 357.2.3 Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367.2.4 Volume Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407.2.5 OPJ-aligned Volume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427.2.6 Stretched Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437.2.7 Reference-based feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467.2.8 Auto-Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497.2.9 DFT-based and DWT-based features . . . . . . . . . . . . . . . . . . . . . . 527.2.10 DTW Distance between Different Dimensions . . . . . . . . . . . . . . . . . 547.2.11 T-test Selection feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

7.3 Feature Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577.3.1 Correlation-based Feature Selection (CFS) . . . . . . . . . . . . . . . . . . . 577.3.2 Information Gain Evaluation Ranking . . . . . . . . . . . . . . . . . . . . . 577.3.3 Feature Combination Performance . . . . . . . . . . . . . . . . . . . . . . . 58

7.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

8 Conclusions 60

Bibliography 62

viii Extracting Features to Discriminate OSA and non-OSA

List of Figures

2.1 A: Midsagittal MR image in a normal subject demonstrating the upper airwayregions [4]: (a) nasopharynx; (b)retropalatal; (c) retroglossal; (d) hypopharynx. B:Important sagittal upper airway structures demonstrated on MR imaging. . . . . . 4

2.2 midsagittal magnetic resonance image (MRI) in a normal subject (left) and in apatient with severe OSA (right) [1]. Note that the upper airway is smaller, in boththe retropalatal and retroglossal region. . . . . . . . . . . . . . . . . . . . . . . . . 5

3.1 The workflow of thole project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

4.1 Cross-sectional area over distance from teeth. . . . . . . . . . . . . . . . . . . . . . 94.2 A normal time series. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104.3 This time series does not follow the normal pattern. . . . . . . . . . . . . . . . . . 10

5.1 Confusion matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135.2 Roc space (quoted from Wikipedia) . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

6.1 Time alignment of two time series. . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

7.1 OPJ point in the time series. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257.2 ROC curve of results with four dimensions. . . . . . . . . . . . . . . . . . . . . . . 387.3 Best interval for Volume with four dimensions, which are indicated by blue lines. . 387.4 The selected interval of Volume Variance, which is indicated by blue lines. . . . . . 407.5 ROC curve of results with four dimensions. . . . . . . . . . . . . . . . . . . . . . . 467.6 Best interval for Stretched Length with four dimensions, which are indicated by blue

lines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Extracting Features to Discriminate OSA and non-OSA ix

List of Tables

4.1 Datasets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

6.1 Classification results of 1-NN algorithm with Euclidean Distance with 10-fold crossvalidation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

6.2 Accuracy with 1-NN algorithm with DTW distance with cross validation. . . . . . 20

6.3 Classification results based on several different classifiers with 10-fold cross valida-tion. The number with blue color indicates the biggest accuracy in the results ofall classifiers for one dimension, the tables in the later section are indicated in thesimilar way. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

6.4 h-value of T-test for all points in time series with α = 0.05, totally 45 points in eachtime series. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

7.1 Classification results of feature OPJ Value based on several different classifiers with10-folds cross validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

7.2 Statistical results of feature OPJ Value. . . . . . . . . . . . . . . . . . . . . . . . . 26

7.3 Classification results of feature OPJ Position based on several different classifierswith 10-folds cross validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

7.4 Statistical results of feature OPJ Position. . . . . . . . . . . . . . . . . . . . . . . . 28

7.5 Classification results of OPJ-aligned Feature based on several different classifierswith 10-fold cross validation, k is set 3. . . . . . . . . . . . . . . . . . . . . . . . . 29

7.6 h-value of T-test of OPJ-aligned feature. . . . . . . . . . . . . . . . . . . . . . . . . 29

7.7 Classification results of feature Maximum Value based on several different classifierswith 10-folds cross validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

7.8 Statistical results of feature Maximum Value. . . . . . . . . . . . . . . . . . . . . . 30

7.9 Classification results of feature OPJ Value/Maximum Value based on several dif-ferent classifiers with 10-fold cross validation. . . . . . . . . . . . . . . . . . . . . 31

7.10 Statistical results of feature OPJ/Maximum. . . . . . . . . . . . . . . . . . . . . . . 32

7.11 Accuracy with feature Shapelets with 3-folds cross validation. . . . . . . . . . . . . 34

7.12 Accuracy with different popular classification algorithms on the symbolic sequencetransformed by SAX. In this example, the 45 points are transformed into 9 dimen-sions and the alphabet size is 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

7.13 Accuracy with different popular classification algorithms, taking the histogram vec-tor of the symbolic sequence as the feature. . . . . . . . . . . . . . . . . . . . . . . 35

7.14 Accuracy for feature Volume. The total length of interval is 45. . . . . . . . . . . . 37

7.15 Accuracy for feature Volume without interval selection, i.e., volume of whole timeseries data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

7.16 Classification results of feature Volume based on Decision Tree model. . . . . . . . 37

7.17 Statistical results of feature Volume. . . . . . . . . . . . . . . . . . . . . . . . . . . 37

7.18 Evaluation result of feature Volume Variance. . . . . . . . . . . . . . . . . . . . . 40

7.19 Accuracy with feature OPJ-aligned Volume. . . . . . . . . . . . . . . . . . . . . . 42

7.20 Accuracy with feature Stretched Length. The total length of interval is 45. . . . . . 44

Extracting Features to Discriminate OSA and non-OSA xi

LIST OF TABLES

7.21 Accuracy for feature Stretched Length without interval selection, i.e., stretchedlength of whole time series data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

7.22 classification results of feature Stretched Length based on Decision Tree model. . . 457.23 Statistical results of feature Stretched Length. . . . . . . . . . . . . . . . . . . . . . 457.24 Classification results with RefModel1 based on several different classifiers with

10-folds cross validation. Note that in the test set, there are 20 non-OSA subjectsand 63 OSA subjects since 20 non-OSA subjects are used to calculate the referencemodel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

7.25 Statistical results of reference-based feature based on RefModel1. . . . . . . . . . 497.26 Classification results with RefModel2 based on several different classifiers with

10-folds cross validation. Note that in the test set, there are 30 non-OSA subjectsand 63 OSA subjects since 10 non-OSA subjects are used to calculate the referencemodel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

7.27 Statistical results of reference-based feature based on RefModel2. . . . . . . . . . 497.28 classification results of feature autocorrelation with time step = 1 based on several

different classifiers with 10-fold cross validation. . . . . . . . . . . . . . . . . . . . . 507.29 classification results of feature autocorrelation with time step ranging from 1 to 30

based on several different classifiers with 10-folds cross validation. . . . . . . . . . . 517.30 h-value of T-test of feature autocorrelation with time step ranging from 1 to 30. . . 517.31 Accuracy of top-8 dimensions DFT-based features with several different classifiers. 537.32 Accuracy of top-8 dimensions DWT-based features with several different classifiers. 537.33 p-value of T-test of top-8 dimensions DFT-based features. . . . . . . . . . . . . . . 537.34 p-value of T-test of top-8 dimensions DWT-based features. . . . . . . . . . . . . . 537.35 Classification results of DTW distance between different dimensions based on sev-

eral different classifiers with 10-folds cross validation. Note that Su B ↔ Si B isthe abbreviation of Supine Breath vs Sitting Breath while Su H ↔ Si H meansSupine Hold vs Sitting Hold. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

7.36 T-test results of DTW distance between different dimensions. . . . . . . . . . . . . 557.37 h-value of T-test of feature T-test Selection with α = 0.01, totally 45 points in each

time series. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567.38 Classification results of feature T-test Selection based on several different classifiers

with 10-folds cross validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577.39 Selected features by CFS among all features with 10-fold cross validation. The

probability shown after the feature name indicates the probability of being selectedin 10-fold cross validation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

7.40 Top 4 features ranked by Information Gain Evaluation . . . . . . . . . . . . . . . . 587.41 Accuracy with Decision Tree and Random Forest classifiers for combined features. 59

xii Extracting Features to Discriminate OSA and non-OSA

List of Algorithms

1 Feature-Evaluation-Algorithm(F,Dtrain, Dtest) . . . . . . . . . . . . . . . . . . . . 152 1-NN Euclidean Distance (Train set, Train label, unknown object) . . . . . . . . 173 DTW Distance calculation algorithm (s : array[1..n], t : array[1..m], w : int) . . . 194 Volume-Extraction-Algorithm(D, K) . . . . . . . . . . . . . . . . . . . . . . . . . . 365 Stretched-Length-Extraction-Algorithm(D, K) . . . . . . . . . . . . . . . . . . . . 436 CalculateStretchedLength(D, start, end) . . . . . . . . . . . . . . . . . . . . . . . . 447 DFT DWT based Feature Extraction(D, l, m) . . . . . . . . . . . . . . . . . . . . 52

Extracting Features to Discriminate OSA and non-OSA xiii

Chapter 1

Introduction

This document is the master thesis concerning the internship project of extracting features todiscriminate OSA and non-OSA subjects, which is finished at Philips Research in Eindhoven,supervised by Scientist Stefan Winter in Philips and Prof. Toon Calders in TU/e.

The rest of the document is organized as follows. The research question, motivation and therelated work are presented in this section. In section 2, the domain knowledge of obstructive sleepapnea (OSA) are given. Section 3 presents the workflow of the whole project. Then the datasetsdescription including the preprocessing is presented in section 4. The performance criteria usedin this thesis is given in section 5. We conduct the classifications directly on the raw datasetswithout any feature extraction in section 6, then a rich set of 16 features, including local featuresand global features, are proposed in section 7. Finally, section 8 indicates the conclusion of thisthesis.

1.1 Research Question

The topic is about analyzing anatomical data obtained by pharyngometry with respect to obstruc-tive sleep apnea (OSA). We have recorded data from over 100 subjects (OSA and non-OSA) thatcan be analyzed. The central question is whether we can define any features that discriminateOSA and non-OSA subjects. We would look into the underlying anatomy and work on extractingdiscriminative features. The deliverable would be algorithms that extract these features from thegiven data and a description of their discriminative power along with an idea of their physiologicalmeaning.

The focus is on proposing (and investigating) discriminative (OSA/non-OSA) features. Whilethis of course requires classification based on the proposed features to assess their discriminativepower, the challenge is more in defining good features rather than implementing an optimizedclassifier.

1.2 Motivation

Obstructive sleep apnea (OSA) is the most common type of sleep apnea and is caused by obstruc-tion of the upper airway. It is characterized by repetitive pauses in breathing during sleep, despitethe effort to breathe, and is usually associated with a reduction in blood oxygen saturation. Thesepauses in breathing, called ”apneas” (literally, ”without breath”), typically last 20 to 40 seconds.

The individual with OSA is rarely aware of having difficulty breathing, even upon awakening.It is recognized as a problem by others witnessing the individual during episodes or is suspectedbecause of its effects on the body (sequelae).

Diagnosis of OSA is often based on a combination of patient history and tests (lab- or home-based). Given the recorded test data of patients, how to define good features to discriminate theOSA and non-OSA subjects is a challenging and meaningful task, which could greatly help the

Extracting Features to Discriminate OSA and non-OSA 1

CHAPTER 1. INTRODUCTION

process of the diagnosis of OSA. This project is expected to propose good algorithms that extractdiscriminative (OSA/non-OSA) features from the given data.

1.3 Related Work

Several research work has been done with respect to validity of Acoustic Pharyngometry, the struc-ture characteristics of upper airway of OSA patients and features extraction about discriminatingOSA and non-OSA subjects.

[5] develops a standard operating protocol for Acoustic Pharyngometry that ensures a compre-hensive repeatability of upper airway measurements. This paper gives the detailed specificationsto the following aspects: equipment, subject selection criteria and operation procedure of AcousticPharyngometry. In addition, as another contribution of this paper, a detailed survey about thedevelopment of acoustic pharyngometry is made in this paper and some important issues aboutAcoustic Pharyngometry are discussed.

[7]assesses the repeatability of pharyngeal cross-sectional area measurements obtained fromnormal and snoring individuals. The evaluation results show that repeatability of Acoustic Pharyn-gometry results can be achieved following the standard operating protocol. This conclusion addsto the reliability of Acoustic Pharyngometry in assessing the pharyngeal airway in patients withsnoring and OSA.

The genetic basis of upper airway size as determined is assessed in [12] using Acoustic Pharyn-gometry. It concludes that the minimum cross-sectional area exhibits a heritability of 0.34 inwhite subjects and 0.39 in African-Americans, which suggests that 30− 40% of the total variancein this measure is explained by shared familial factors. This conclusion indicates that the mini-mum cross-sectional area of the oropharynx is a highly heritable trait, which suggests the presenceof an underlying genetic basis and demonstrates the potential utility of Acoustic Pharyngometryin dissecting the genetic basis of OSA.

[2] investigates the predictability and usefulness of Acoustic Pharyngometry in diagnosis ofOSA. It concludes that the oropharyngeal junction point (OPJ) of the supine position is the mostpredictive parameter to discriminate OSA and non-OSA. However, as shown in Section 7.1.1, theOPJ point is not a valued feature for our datasets, the accuracy is only about 60%.

[6] uses the snorers with and without OSA to conduct the experiments to assess the acousticpharyngometry patterns. It found 2 patterns for non-OSA snorers and 3 patterns for OSA snorers.

[8] concludes that MCA(Minimal Cross-sectional Airway Area) would differentiate betweenmild and moderate/severe OSA (apnoea-hypopnoea index [AHI], < 15 and ≥ 15 events/hour).MCA was shown to predict the presence of moderate to severe OSA, independent of age, sex, andneck circumference.

2 Extracting Features to Discriminate OSA and non-OSA

Chapter 2

Domain Knowledge

2.1 Sleep Apnea

Sleep apnea [14] is a sleep disorder characterized by abnormal pauses in breathing or instances ofabnormally low breathing during sleep. Each pause in breathing, called an apnea, can last fromat least ten seconds to minutes, and may occur 5 to 30 times or more an hour. Similarly, eachabnormally low breathing event is called a hypopnea.

There are three forms of sleep apnea: central (CSA), obstructive (OSA), and complex ormixed sleep apnea (i.e. a combination of central and obstructive) constituting 0.4%, 84%and15%of cases respectively [13]. In CSA, breathing is interrupted by a lack of respiratory effort; in OSA,breathing is interrupted by a physical block to airflow despite respiratory effort, and snoring iscommon.

2.1.1 Obstructive Sleep Apnea

Obstructive sleep apnea (OSA) [11] is a sleep-related breathing disorder that involves a decreaseor complete halt in airflow despite an ongoing effort to breathe. It occurs when the muscles relaxduring sleep, causing soft tissue in the back of the throat to collapse and block the upper airway.This leads to partial reductions (hypopneas) and complete pauses (apneas) in breathing that lastat least 10 seconds during sleep. Most pauses last between 10 and 30 seconds, but some maypersist for one minute or longer. This can lead to abrupt reductions in blood oxygen saturation,with oxygen levels falling as much as 40 percent or more in severe cases.

The brain responds to the lack of oxygen by alerting the body, causing a brief arousal fromsleep that restores normal breathing. This pattern can occur hundreds of times in one night. Theresult is a fragmented quality of sleep that often produces an excessive level of daytime sleepiness.Most people with OSA snore loudly and frequently, with periods of silence when airflow is reducedor blocked. They then make choking, snorting or gasping sounds when their airway reopens.

A common measurement of sleep apnea is the apnea-hypopnea index (AHI). This is an averagethat represents the combined number of apneas and hypopneas that occur per hour of sleep.

Types

• Mild OSA: AHI of 5-15Involuntary sleepiness during activities that require little attention, such as watching TV orreading.

• Moderate OSA: AHI of 15-30Involuntary sleepiness during activities that require some attention, such as meetings orpresentations.


CHAPTER 2. DOMAIN KNOWLEDGE

• Severe OSA: AHI of more than 30Involuntary sleepiness during activities that require more active attention, such as talkingor driving. However, in our case, we consider the subjects with AHI < 15 to be normal andthe subjects with AHI ≥ 15 to be OSA patients.

2.2 Upper Airway Structure

Using sagittal imaging [4],the upper airway has been subdivided into three regions, as shown inFigure 2.1:

1. naso-pharynx (region between the nasal turbinates and hard palate);

2. oropharynx, which can be subdivided into the retropalatal (defined from the level of the hardpalate to the caudal margin of the soft palate; also called the velopharynx) and retroglossal(defined from the caudal margin of the soft palate to the base of the epiglottis) regions. In themajority of patients with sleep apnea, airway closure during sleep occurs in the retropalataland retroglossal regions.

3. hypopharynx (region from the base of the tongue to the larynx).

Figure 2.1: A: Midsagittal MR image in a normal subject demonstrating the upper airway regions[4]: (a) nasopharynx; (b)retropalatal; (c) retroglossal; (d) hypopharynx. B: Important sagittalupper airway structures demonstrated on MR imaging.

2.2.1 Anatomical Determinants of Upper Airway Caliber in OSA

[1] Studies using nasal pharyngoscopy, computer tomography and magnetic resonance imaging,or pharyngeal pressure monitoring have shown that one or more sites within the oral pharayngealregion are usually where closure occurs in most subjects with OSA, and this region is also smaller in



OSA patients versus controls even during wakefulness (see Figure2.2) . Although the retropalatalregion of the oropharynx is the most common site of collapse, airway narrowing is a dynamicprocess, varying markedly among and within subjects and often includes the retroglossal andhypopharyngeal areas.

Figure 2.2: midsagittal magnetic resonance image (MRI) in a normal subject (left) and in a patientwith severe OSA (right) [1]. Note that the upper airway is smaller, in both the retropalatal andretroglossal region.

The recent use of quantitative imaging techniques has allowed advances that reveal importantdifferences in both craniofacial and upper airway soft tissue structures in the OSA patient. Thereduced size of cranial bony structures in the OSA patient include a reduced mandibular bodylength, inferior positioned hyoid bone, and retro position of the maxilla, all of which compromisethe pharyngeal airspace. Airway length, from the top of the hard palate to the base of the epiglot-tis, is also increased in OSA patients, perhaps reflecting the increased proportion of collapsibleairway exposed to collapsing pressures. As expected, these craniofacial dimensions are primarilyinherited, as the relatives of OSA patients demonstrated retroposed and short mandibles and in-feriorly placed hyoid bones, longer soft palates, wider uvulas, and higher narrower hard palatesthan matched controls.

Enlargement of soft tissue structures both within and surrounding the airway contributes sig-nificantly to pharyngeal airway narrowing in most cases of OSA. An enlarged soft palate andtongue would encroach on airway diameter in the anterior-posterior plane, while the thickenedpharyngeal walls would encroach in the lateral plane. Volumetric time overlapped magnetic reso-nance imaging (MRI) or computer tomography (CT) images strongly implicate the thickness of thelateral pharyngeal walls as a major site of airway compromise, as the airway is narrowed primarilyin the lateral dimension in the majority of OSA patients. Furthermore, treatment with CPAP,weight loss, or mandibular advancement all show increases in the lateral pharyngeal dimensions.

2.3 Acoustic Pharyngometry

Several researches have shown that the pharyngeal size and the dynamic behavior of the upperairway are important factors in the production of OSA. Hence assessment of the upper airway forpossible sites of obstruction is crucial. The measurements of the upper airway are difficult becauseit is a geometrically complex structure subject to considerable variability. Acoustic pharyngometryis a useful tool for localizing the possible sites of upper airway obstruction in cases of OSA.

2.3.1 Rationale

Reflections of acoustic pulse disturbances introduced at the mouth can be used to infer the cross-sectional airway area of the oral cavity and pharyngeal spaces down to the level of the larynx. In



this technique, phase and amplitude information of the reflected sound wave can be transformedinto an area-distance relationship.

2.3.2 Pros and Cons

Compared to other methods for objectively evaluating the upper airway, Acoustic pharyngometryhas many advantages.

• Advantage:

– Portability: easy to operate, free tidal breathing during measurement.

– Real time display of airway area.

– Lack of radiation involvement.

– Noninvasive.

– Ability to assess the entire airway simultaneously.

• Disadvantage:

– Cannot provide information about specific tissue structures that impinge upon theairway.

– Reduce accuracy compared to CT.


Chapter 3

Workflow

Figure 3.1 shows the workflow of the whole project. Given the datasets, we first conduct thepreprocessing operations on it, including the removing noise and normalization in Chapter 4.Then we try several classical time series classification algorithms directly on the raw datasetswithout any feature extraction to check the performance, seen in Chapter 6 , such as 1 NearestNeighbor Algorithm with Euclidean distance and Dynamic Time Warping distance, Support VectorMachine algorithm, Logistic Regression algorithm and so on. As we will see in the later section,direct classifications on the raw datasets fail to discriminate OSA and non-OSA effectively, thefailed reason is that large part of the time series which usually corresponds to normal part of thethroat is not predictive. Therefore, using the entire time series as the features for classification isnot effective because normal part of the throat contributes a big factor to the value of the features.Hence, we have to extract valued features from the raw datasets, which is also the primary goalof this project.

In this project, we propose about 16 features, which can be classified into two parts accordingto the feature characteristics: (1) local features in Section 7.1 , which focus on some local pointsand local properties of the time series, such as OPJ point,maximum point, OPJ Value / MaximumValue and so on, and (2) global features in Section 7.2, which investigate the whole time seriesand try to extract some valued features from the global view, such as volume, stretched length,DFT -based features and so on. For each feature, we first present the algorithm specification, thenconduct the all-sided experimental evaluation on the performance of the feature, finally we givethe detailed explanation from the view of domain knowledge about OSA. In addition, we combineall the features (only single-value feature) we proposed together and conduct a comprehensiveanalysis on them in Section 7.3, including the correlation-based feature selection, information gainranking, performance evaluation on combined features.


CHAPTER 3. WORKFLOW

Figure 3.1: The workflow of thole project.


Chapter 4

Datasets

The data to be analyzed in this project is the recorded data from over 100 subjects(OSA andnon-OSA). The data consists of graphs (cross-sectional area over distance from teeth) as shown inFigure 4.1. It can be considered as 1D imaging data of the upper airway. The values of the graphsare available in simple ASCII format. The features need to be extracted from these graphs.

Figure 4.1: Cross-sectional area over distance from teeth.

In the datasets, since the data for each subject is measured at uniform distance interval to theteeth, hence it can be considered as the time series, thus, the problem is actually a time seriesfeature extraction problem. In this way, we can apply many popular time series feature extractiontechniques to our problem.

As shown in Table 4.1, there are totally 103 subjects in the datasets which consists of 63OSA subjects and 40 non-OSA subjects. For each subject, there are four time series measured infour different postures (called four dimensions in this report): supine breath, supine hold, sittingbreath, sitting hold. Each time series contains 45 steps of points.

total number OSA non-OSA measurements time series steps

103 63 404 dimensions(supine breath, supine hold,

45sitting breath, sitting hold)

Table 4.1: Datasets.


CHAPTER 4. DATASETS

4.1 Preprocessing of Datasets

Two preprocess operations are conducted over the datasets before further analyzing them. Wefirst remove the noise contained in the datasets, then apply the normalization to make all the timeseries in a uniform scale.

4.1.1 Remove Noise

We develop a visualization tool for the datasets and check whether each of the time series followsthe normal pattern as shown in Figure 4.2. A normal time series always goes through a peakwhich is corresponding to the oral cavity, then a local minimal point named OPJ (oropharyngealjunction) point which is corresponding to the first minimal point in the oropharynx reagion. If atime series is totally different from this normal pattern, we will remove it manually. E.g., the timeseries shown in Figure 4.3 is removed since it does not follow the normal pattern.

Figure 4.2: A normal time series.

Figure 4.3: This time series does not follow the normal pattern.

4.1.2 Normalization

As we known, though the basic structure of the upper airway is almost same between differentpersons, there are still some subtle differences, for example, some more obese and taller personsare prone to possess bigger size of upper airway than the thinner and lower persons. In order toavoid such impact, we apply the normalization technique to all the time series in the datasets.

In our project, y-scale normalization is applied to the datasets, which is indicated in Equa-tion 4.1. For each point in the time series, y value is divided by the maximum value max valuein the whole time series .


CHAPTER 4. DATASETS

In this way, the y-axis value is scaled into the range [0, 1] while keeping the shape of thetime series. The normalization operation makes two series indistinguishable, provided they areproportional to one another, i.e., ai = λbi for all i.

y′ =y

max value(4.1)

Application of Normalization

We apply the normalization to some of our features, but not all. Since not all features need thenormalization. For instance, the feature OPJ Value adopts the original value but not the normal-ized value. The normalized OPJ Value is equivalent to the feature OPJ value / maximum value.With respect to feature Volume, we tried both the normalized volume and non-normalized vol-ume and find that non-normalized volume performs much better than normalized volume, whichindicates that the actual value could better represent the difference between OSA and non-OSAthan the normalized value w.r.t. volume.

One possible explanation is that if the OSA affect the whole upper airway, then the cross-sectional area will be decreased in whole part of the upper airway, then the y-scale normalizationwould make this decrease invisible.

In some cases, the normalization is necessary, for instance, the features concerned with DTWdistance calculation need the normalization to make the two time series align more accurately.


Chapter 5

Performance Criteria

In this section, we introduce briefly several performance criteria that we use in this project.

5.1 Some Basic Performance Criteria

5.1.1 k-fold Cross-Validation

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by divid-ing data into two segments: one used to learn or train a model and the other used to validate themodel. In typical cross-validation, the training and validation sets must cross-over in successiverounds such that each data point has a chance of being validated against. The basic form ofcross-validation is k-fold cross-validation.

In k-fold cross-validation the data is first partitioned into k equally (or nearly equally) sizedsegments or folds. Subsequently k iterations of training and validation are performed such thatwithin each iteration a different fold of the data is held-out for validation while the remainingk − 1 folds are used for learning. The validation results are averaged over the k rounds.

The biggest advantage of cross validation is to avoid the overfitting of predictive model tothe datasets. In data mining and machine learning 10-fold cross-validation (k= 10) is the mostcommon. In our experiments, we also apply 10-fold cross validation to evaluate the discriminativeperformance of feature.

5.1.2 Classification Terminology

Suppose we have an experiment with the datasets which contains P positive instances and Nnegative instances. The four outcomes can be formulated in a 2 confusion matrix, as shown inFigure 5.1.

• sensitivity = true positive rate(TPR) = recall = hit rate

TPR =TP

TP + FN(5.1)

• specificity = true negative rate

specificity =TN

TN + FP(5.2)

• precision = positive predictive value

precision =TP

TP + FP(5.3)


CHAPTER 5. PERFORMANCE CRITERIA

• accuracy

accuracy =TP + TN

TP + FN + FP + TN(5.4)

Figure 5.1: Confusion matrix.

5.1.3 ROC and AUC

A receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illus-trates the performance of a binary classifier system as its discrimination threshold is varied. It iscreated by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs.the fraction of false positives out of the negatives (FPR = false positive rate), at various thresholdsettings. TPR is also known as sensitivity (also called recall in some fields), and FPR is one minusthe specificity or true negative rate.

The best possible prediction method would yield a point in the upper left corner or coordinate(0,1) of the ROC space, representing 100% sensitivity (no false negatives) and 100% specificity(no false positives). The (0,1) point is also called a perfect classification. A completely randomguess would give a point along a diagonal line (the so-called line of no-discrimination) from theleft bottom to the top right corners.

As shown in Figure 5.2, the algorithm represented by point A is better than B and C whilethe algorithm represented by B is equal to random guessing.

AUC means the area under the curve, it ranges in [0, 1]. Higher AUC indicates a betterperformance of the classifier.

5.1.4 MCC

The Matthews Correlation Coefficient (MCC) is used to evaluate the performance of binary clas-sifications. It can be calculated from confusion matrix in Figure 5.1, as shown in Equation 5.5.

MCC ranges from -1 to +1, +1 represents a perfect prediction, 0 is equal to random predictionand -1 means total disagreement between prediction and observation.

MCC =TP × TN − FP × FN√

(TP + FP )(TP + FN)(TN + FP )(TN + FN)(5.5)

5.1.5 T-test

Student’s t-test is used to determine whether two sets of data are significantly different from eachother. the premise of the t-test is that these two sets of data must follow normal distribution.



Figure 5.2: Roc space (quoted from Wikipedia).

In our experiments, since the size of OSA (63) is not equal to non-OSA (40), and the variancesare always not equal, hence we should apply the t-test as shown in Equation 5.6, where x and yis the mean value of the sets, sx and sy is the standard deviations, n and m are the size of sets.

h-value returned by T-test will be 1 if there is significant difference between two sets, whichmeans a rejection of the null hypothesis that there are no significant difference between two sets.

t =x− y√s2xn +

s2ym

(5.6)

5.2 Classifier based on Information Gain and Optimal SplitPoint

Our algorithms need some metric to evaluate how well a feature can divide the entire combineddatasets into two original classes, i.e., the discriminative power. In this project, for some features,we use Information Gain to find the Optimal Split Point(OSP). Then we can apply Decision Treemodel to evaluate the performance for this feature based on the Optimal Split Point. The relateddefinitions are defined as follows.

Definition 1 (Entropy). A time series dataset D consists of two classes, A and B. Given thatthe proportion of objects in class A is p(A) and the proportion of objects in class B is p(B), theentropy of D is: I(D) = −p(A) log(p(A))− p(B) log(p(B)).

Given the value of a feature for all instances, each splitting point divides the whole datasetD into two subsets, D1 and D2. Therefore, the information remaining in the entire dataset aftersplitting is defined by the weighted average entropy of each subset. Suppose the fraction of objectsin D1 and D2 are f(D1) and f(D2) respectively. The total entropy of D after splitting is

I(D) = f(D1)I(D1) + f(D2)I(D2) (5.7)

Then the information gain for any splitting strategy is defined as:

Definition 2 (Information Gain). Suppose a certain split point sp divides D into two subsets D1

and D2, the entropy before and after splitting is I(D) and I(D), then the information gain for



Algorithm 1 Feature-Evaluation-Algorithm(F,Dtrain, Dtest)

1: Input: Feature value F for each subject in dataset D, the training set Dtrain and the test setDtest

2: Output: Classification accuracy A with this feature3: Set1 ← ∅4: Set2 ← ∅5: OSP ←− CalculateOptimalSplitPoint(Dtrain) // calculate the OSP in the training dataset6: for each subject s in Dtest do7: if F (s) < OSP then8: Set1 ← Set1 ∪ s9: else

10: Set2 ← Set2 ∪ s11: end if12: end for13: A← CalculateAccuracy(Set1, Set2, Dtest) // check the similarity with the real value of label

in the test set Dtest

14: Return A

this splitting point isGain(sp) = I(D)− I(D) (5.8)

Given the value of each subject in the dataset for a given feature, to evaluate the discriminativepower of the feature, we sort the subjects according to the its feature value and find an optimalsplit point between two neighboring feature value based on Information Gain.

Definition 3 (Optimal Split Point). Our time series dataset D consists of two classes: OSA andnon-OSA. For a given feature F , we choose some threshold Vth and split D into D1 and D2, suchthat for every time series subject T1,i in D1, FeatureV alue(T1,i) < Vth and for every time seriessubject T2,i in D2, FeatureV alue(T2,i) > Vth. An Optimal Split Point is a threshold that

Gain(F, VOSP (D,F )) ≥ Gain(F, V ′th) (5.9)

for any other threshold V ′th.

5.2.1 Classifier based on OSP

For some single-value feature we propose, we use OSP and information gain to evaluate its discrim-inative power. Specifically, given a feature F , we first calculate the corresponding OSP OptimalSplit Point, then calculate the accuracy based on the Decision Tree model with OSP, as shownin Algorithm 1.In practice, we use k-folds cross-validation technique to calculate the accuracy toavoid the overfitting for the datasets.


Chapter 6

Classification with Raw Data

After preprocessing operations on the datasets, we first try to apply some classical time seriesclassification algorithms directly on the raw data, i.e., we use all the 45 points in the time seriesas the input features of the classifier, to check the prediction performance.

As one of the most popular time series classification algorithm, 1-Nearest Neighbor (1-NN)Algorithm with Euclidean distance are effective in most time series problem, hence we first applythis algorithm to our datasets. However, it performs poor on our data for the reasons explainedlater. Then in order to remove the shifting between different time series and align them together,we replace the Euclidean distance with Dynamic Time Warping distance in 1-NN algorithm.Finally, several popular classification models are applied to our datasets, taking the entire timeseries as input features. However, they all performs poor for the reason we explained in detail inthe following subsection.


CHAPTER 6. CLASSIFICATION WITH RAW DATA

6.1 1-Nearest Neighbor Algorithm with Euclidean Distance

1-Nearest Neighbor (1-NN) algorithm is a one of most classical time series classification algorithm,recent empirical evidence has strongly suggested that the simple nearest neighbor algorithm isvery difficult to beat for most time series problems. The biggest advantage of this algorithm is itssimplicity of implementing. Hence, I first apply the 1-NN algorithm with Euclidean distance toour data to check the performance.

6.1.1 Algorithm

Algorithm 2 shows the 1-NN algorithm process with Euclidean Distance. It works very simply:predict the label of the unknown object by the label of the object with the nearest Euclideandistance in the training set.

Algorithm 2 1-NN Euclidean Distance (Train set, Train label, unknown object)

1: Input: Train set, Train label, unknown object //Train set is the training set andTrain label indicates the label for each object in Train set, the goal is to predict the la-bel of the unknown object.

2: Output: the label predicted class of the unknown object3: //initialization4: best so far ← inf ;5: for i = 1→ size(Train set) do6: distance← sqrt(sum(Train set(i)−unknown object)2) // calculate the Euclidean distance

7: if distance < best so far then8: predicted class← Train label(i)9: best so far ← distance

10: end if11: end for12: Return predicted class

6.1.2 Experimental Evaluation

In order to avoid the overfitting, we apply the 1-NN algorithm with Euclidean Distance to ourdatasets with 10-fold cross validation. Table 6.1 shows the experimental result. We can find thatthe accuracy is between 55% to 63%, which is not good enough. The AUC is around 0.5 andMCC is around 0.000, which indicates it is almost equivalent to random guess.

Supine Breath Supine Hold Sitting Breath Sitting HoldAccuracy 56.3% 55.3% 57.3% 62.1%

AUC 0.513 0.524 0.535 0.611MCC 0.110 0.034 0.076 0.214

Precision 0.578 0.541 0.561 0.627Recall 0.563 0.553 0.573 0.621

TP FP 37 19 43 26 44 25 42 18FN TN 26 21 20 14 19 15 21 22

Table 6.1: Classification results of 1-NN algorithm with Euclidean Distance with 10-fold crossvalidation.



6.1.3 Analysis of Failure

There are two possible factors that lead to failure with 1-NN algorithm with Euclidean distance:

• It is observed that there is some shifting between one time series and another, i.e., the timeseries are not aligned well between each other.

• Calculating distance with entire time series is not effective since large part of the time serieswhich is corresponding to normal part of the throat are not predictive but this part contributea big factor to the calculation.

6.1.4 Conclusion

From the evaluation results based on our datasets, we can conclude that 1-NN algorithm withEuclidean Distance is not feasible to our datasets. All kinds of quantitive indicators show that itis no better than random guess.



6.2 1-Nearest Neighbor Algorithm with DTW(Dynamic TimeWarping) Distance

Dynamic time warping (DTW) is a well-known technique to find an optimal alignment betweentwo given (time-dependent) sequences under certain restrictions (Figure 6.1). Intuitively, thesequences are warped in a nonlinear fashion to match each other. In fields such as data miningand information retrieval, DTW has been successfully applied to automatically cope with timedeformations and different speeds associated with time-dependent data and measure similaritybetween two sequences which may vary in time or speed.

Figure 6.1: Time alignment of two time series.

Since in our datasets, there may be some shifting along x-axis between two time series, henceDTW algorithm can be used to remove the offset and align them along x-axis.

6.2.1 Algorithm

We use DTW to calculate the distance between two time series instead of Euclidean distance inline 6 in Algorithm 2.

Algorithm 3 shows the DTW distance calculation algorithm using dynamic programmingmethod. In order to make the alignment more precisely, y-scale normalization is applied to thedata first.

Algorithm 3 DTW Distance calculation algorithm (s : array[1..n], t : array[1..m], w : int)

1: Input: s : array[1..n], t : array[1..m], w : int //s and t are two time series with length of mand n, w is the window parameter indicating the constraint that two aligned points can onlybe in a same window.

2: Output: the DTW distance between s and t3: //initialization4: distance← array[0..n, 0..m]5: w ← max(w, abs(n−m))6: for i = 0→ n do7: for j = 0→ m do8: distance[i, j]← infinity9: end for

10: end for11: distance[0, 0]← 012: for i = 1→ n do13: for j = max(1, i− w)→ min(m, i+ w) do14: cost← d(s[i], t[j])15: distance← cost+minimum(distance[i− 1, j], distance[i, j − 1], distance[i− 1, j − 1])16: end for17: end for18: Return distance[n,m]




In order to avoid the overfitting, we apply the 1-NN algorithm with DTW distance to our datasetswith cross validation. Table 6.2 shows the experimental accuracy. We can find that the accuracyis around 60% , which is almost equivalent to random guess.

Decision Tree accuracySupine breath 58.0%Supine hold 59.0%

Sitting breath 53.0%Sitting hold 58.0%

Table 6.2: Accuracy with 1-NN algorithm with DTW distance with cross validation.


One possible reason for the failure with 1-NN algorithm based on DTW distance is same as thereason for the failure with 1-NN with Euclidean distance: calculating distance with entire timeseries is not effective since large part of the time series which is corresponding to normal part ofthe throat are not predictive but this part contribute a big factor to the calculation.

In addition, the failure of 1-NN with DTW distance illustrates that the DTW distance is nota discriminative feature for OSA.

6.2.4 Conclusion

From the evaluation results based on our datasets, we can conclude that 1-NN algorithm withDTW distance is not feasible to our datasets. .



6.3 Classifications with Popular Classification Algorithms

In this section, we apply several popular classification algorithms to our datasets to check whetherthe datasets can be linearly separated well. We respectively tried the following algorithms.

• LibLinear: a library for large linear classification.

• SVM (Support Vector Machine)

• Logistic Regression algorithm.

• Decision-Tree algorithm.

The results in this part could be considered as the baseline to compare with the results of proposedfeatures later.


Table 6.3 shows the classification results based on several different classifiers with 10-fold crossvalidation, which takes all the points of time series as features. we can find that the result is notgood, it is no better than random guess.

Supine Breath Supine Hold Sitting Breath Sitting Hold

Linear

Accuracy 61.2% 53.4% 57.3% 47.6%AUC 0.573 0.478 0.537 0.462MCC 0.154 -0.050 0.076 -0.075

TP FP 47 24 46 31 44 25 33 24FN TN 16 16 17 9 19 15 30 16

SVM

Accuracy 63.1% 65.0% 52.4% 61.2%AUC 0.557 0.577 0.442 0.537MCC 0.153 0.208 -0.160 0.098

TP FP 56 31 57 30 51 37 55 32FN TN 7 9 6 10 12 3 8 8

Logistic

Accuracy 63.1% 50.5% 53.4% 47.6%AUC 0.631 0.460 0.463 0.443MCC 0.254 -0.019 0.010 -0.075

TP FP 40 15 35 23 40 25 33 24FN TN 23 25 28 17 23 15 30 16

Decision Tree

Accuracy 63.1% 62.1% 56.3% 64.1%AUC 0.566 0.493 0.467 0.536MCC 0.197 0.129 0.014 0.177

TP FP 48 23 55 31 48 30 60 34FN TN 15 17 8 9 15 10 3 6

Table 6.3: Classification results based on several different classifiers with 10-fold cross validation.The number with blue color indicates the biggest accuracy in the results of all classifiers for onedimension, the tables in the later section are indicated in the similar way.


The main reason for the failure in this section is that using entire time series as input features forthe classifier is not effective since large part of the time series which is corresponding to normalpart of the throat are not predictive but this part contribute a big factor to the calculation.



Table 6.4 shows the t-test result for all points in the time series. We can see that only smallpart of the time series, whose returned h-value is 1, has significant difference between OSA andnon-OSA, which illustrates that most part of the time series are not discriminative from the viewof t-test.

h = 1 h=0number of points number of points

with h=1 with h=0Supine breath {21− 24, 29− 37} {others points} 16 29Supine hold {23− 35} {other points} 13 32

Sitting breath {11− 14, 22− 35} {other points} 18 27Sitting hold {23− 29} {other points} 7 38

Table 6.4: h-value of T-test for all points in time series with α = 0.05, totally 45 points in eachtime series.

6.3.3 Conclusion

From the evaluation result, we can find that taking all points of time series as input features arenot a good idea. One possible reason for the failure is that based on an intuitive observation largepart of the time series which usually corresponds to normal part of the throat is not predictive.Therefore, using the entire time series as features for classification is not effective because normalpart of the throat contributes a big factor to the value of the features, we should select morediscriminative points as features, which is described later.


Chapter 7

Features

From the experiments result in the last chapter, we can find that direct classification with severalpopular classifiers on the raw data performs bad. Hence we have to extract some valued featuresto discriminate OSA and non-OSA, which is also the objective of this project.

In this chapter, we propose several features, attempting to find good features to discriminateOSA subjects and non-OSA subjects.

We try to extract features from two aspects. On one hand, we focus on some local pointsand local properties of the time series, such as OPJ point,maximum point, OPJ Value / Maxi-mum Value and so on, we call these features as local features. On the other hand, we investigatethe whole time series and try to extract some valued features from the global view, such as volume,stretched length, DFT -based features and so on, we call these features as global features.

We totally propose 16 features, including 5 local features and 11 global features. For eachfeature, we give the detailed algorithm specification, experimental evaluations based on severalclassifiers and performance criteria and the explanation for the performance based on domainknowledge. Finally, we combine the single-value features together and evaluate their performance.


CHAPTER 7. FEATURES

7.1 Local Features

In this section, several local features are proposed. We first present the OPJ -related features,such as OPJ Value, OPJ Position and OPJ-aligned feature, then try the feature Maximum Value,finally propose the feature OPJ Value / Maximum Value.

For each feature, We first present the physiological meaning, then the detailed evaluationsincluding classifications Accuracy, AUC, MCC, T-test and so on are conducted for each feature.Finally, we draw the conclusion based on the experimental evaluation.


CHAPTER 7. FEATURES

7.1.1 OPJ Value

As shown in Figure 7.1,the oropharyngeal junction point (OPJ) is the first local minimal point afterthe peak. It is corresponding to the point with minimum cross-sectional area in the oropharynx.According to the domain knowledge, the oropharynx part is the most possible region that isobstructed in OSA patients and OPJ is the minimal point in this region, hence this point is aprobably a valued feature to discriminate OSA and non-OSA.

[2] concludes that the oropharyngeal junction area (OPJ) of the supine position is the mostpredictive parameter to discriminate OSA and non-OSA. Its experiments show that the positivepredictive value is 98% (50 of 54) and the negative predictive value is 79% (15 of 19). We extractthe OPJ points for all the time series and evaluate the performance.

Figure 7.1: OPJ point in the time series.

Experimental Evaluation

Table 7.1 shows the classification results of feature OPJ Value based on several different classifierswith 10-fold cross validation. We can see that in many cases, most of the subjects are classifiedas OSA subjects, which means the recall is close to 100% but the specificity is near 0. Table 7.1also shows the AUC and MCC for each classifier for each dimension. Most of the AUC in theresult is below 0.600, and the MCC is around 0.000, which indicate that the predictions with theseclassifiers are equivalent to the random prediction.

Table 7.2 shows the statistical results of feature OPJ Value. We can see all the T-test return0 which means there is no significant difference between OSA subjects and non-OSA subjects forfeature OPJ Value. Nevertheless, the p-value of T-test are relatively small, the mean value ofnon-OSA subjects are all larger than the OSA subjects, which is consistent with the theoreticalconclusion. Since the OPJ value of non-OSA subjects should be smaller than that of OSA subjectstheoretically due to the obstruction in the OPJ point of the upper airway in OSA subjects.

Conclusion

From the evaluation results based on our datasets, we can conclude that OPJ Value is not avalued feature, which is inconsistent with the conclusion of [2]. The T-test result shows thatthere is no significant difference between OPJ Value of OSA and non-OSA subjects. However,the experimental result that mean OPJ Value of non-OSA subjects are all larger than that of theOSA subjects is consistent with the domain knowledge w.r.t. OSA.


CHAPTER 7. FEATURES


Linear

Accuracy 60.2% 60.2% 60.2% 59.2%AUC 0.510 0.510 0.515 0.493MCC 0.036 0.036 0.047 -0.028

TP FP 58 36 58 36 57 35 59 38FN TN 5 4 5 4 6 5 4 2

SVM

Accuracy 60.2% 58.3% 61.2% 62.1%AUC 0.492 0.485 0.500 0.549MCC -0.079 0.000 0.000 0.129

TP FP 62 40 58 38 63 40 55 31FN TN 1 0 5 2 0 0 8 9

Logistic

Accuracy 59.2% 60.2% 59.2% 63.1%AUC 0.588 0.572 0.584 0.559MCC 0.021 0.036 0.021 0.142

TP FP 56 35 58 36 56 35 59 34FN TN 7 5 5 4 7 5 4 6

Decision Tree

Accuracy 61.2% 61.2% 61.2% 55.3%AUC 0.483 0.483 0.483 0.415MCC 0.000 0.000 0.000 -0.127

TP FP 63 40 63 40 63 40 55 38FN TN 0 0 0 0 0 0 8 2

1-NN

Accuracy 54.4% 45.6% 58.3% 45.6%AUC 0.548 0.470 0.493 0.399MCC 0.017 0.117 0.130 -0.144

TP FP 42 21 30 23 42 22 35 28FN TN 26 14 33 17 21 18 28 12

Table 7.1: Classification results of feature OPJ Value based on several different classifiers with10-folds cross validation.

non-OSA mean value OSA mean value T-test h value T-test p valueSupine breath 1.6182 1.4648 0 0.0671Supine hold 1.3288 1.1503 0 0.1234

Sitting breath 2.0521 1.8261 0 0.0780Sitting hold 1.5896 1.3773 0 0.1334

Table 7.2: Statistical results of feature OPJ Value.

Since the size of our datasets is relatively small and noise inevitably exists in the datasets, thereis still possibility that OPJ Value is a good feature which cannot be verified by our datasets.


CHAPTER 7. FEATURES

7.1.2 OPJ Position

In addition to the OPJ Value, we also investigate the OPJ Position to check whether it is a valuedfeature. Here OPJ Position means the x-axis value (the distance to the teeth) of the OPJ point.


Table 7.3 shows the classification results of feature OPJ Position based on several different clas-sifiers with 10-fold cross validation. Similar with the evaluation results of feature OPJ Value, inmost cases, most of the subjects are classified as OSA subjects, which means the recall is near100% but the specificity is close to 0. Table 7.3 also shows the AUC and MCC for each classifierfor each dimension. Most of the AUC in the result is below 0.600, and the MCC is around 0.000,which indicate that the predictions with these classifiers are no better than the random guess.

Table 7.4 shows the statistical results of feature OPJ Position. We can see all the T-test return0 which means there is no significant difference between OSA subjects and non-OSA subjects forfeature OPJ Position.


Linear

Accuracy 48.5% 52.4% 54.4% 51.5%AUC 0.484 0.506 0.499 0.503MCC -0.032 0.012 -0.002 0.005

TP FP 31 21 37 23 44 28 35 22FN TN 32 19 26 17 19 12 28 18

SVM

Accuracy 60.2% 58.3% 45.6% 49.5%AUC 0.529 0.517 0.378 0.423MCC 0.075 0.042 -0.314 -0.190

TP FP 54 32 51 31 46 39 47 36FN TN 9 8 12 9 17 1 16 4

Logistic

Accuracy 61.2% 58.3% 60.2% 57.3%AUC 0.428 0.516 0.504 0.580MCC 0.000 -0.087 -0.079 -0.082

TP FP 63 40 59 39 62 40 57 38FN TN 0 0 4 1 1 0 6 2

Decision Tree

Accuracy 61.2% 56.3% 58.3% 58.3%AUC 0.483 0.457 0.438 0.466MCC 0.000 -0.136 -0.138 -0.057

TP FP 63 40 57 39 61 40 58 38FN TN 0 0 6 1 2 0 5 2

1-NN

Accuracy 61.2% 62.1% 43.7% 47.6%AUC 0.509 0.509 0.479 0.453MCC 0.119 0.159 -0.341 -0.221

TP FP 52 29 50 26 44 39 45 36FN TN 11 11 13 14 19 1 18 4

Table 7.3: Classification results of feature OPJ Position based on several different classifiers with10-folds cross validation.

Conclusion

From the evaluation results based on our datasets, we can conclude that OPJ Position is nota valued feature. The T-test result shows that there is no significant difference between OPJPosition of OSA and non-OSA subjects.


CHAPTER 7. FEATURES



Table 7.4: Statistical results of feature OPJ Position.

7.1.3 OPJ-aligned Feature

Algorithm

The feature OPJ Value fails to discriminate the OSA and non-OSA. We want to investigatewhether the multiple points around OPJ point could be a good feature. It is equivalent to alignthe time series first by OPJ point, then compare the feature consists of several points whosemidpoint is OPJ point. The algorithm consists of two steps:

• Align the time series and extract features with k steps. // k is a parameter which can bespecified in advance.For each time series, extract points from Popj − k to Popj + k. // Popj is the OPJ position(index) in the time series.

• Apply several classification algorithms to evaluate the performance.


Table 7.5 shows the experimental evaluation result for OPJ-aligned feature with several classifierswith 10-fold cross validation. Here, k is set to be 3.

From the result, we can see the performance is not good and there is no obvious improvementthan the result of OPJ Value shown in Table 7.1.

Table 7.6 shows the T-test result of OPJ-aligned feature. We can see that almost returnedh− value are 0, which indicates that there is no significant difference between OSA and non-OSAw.r.t. OPJ-aligned feature.

Conclusion

From the evaluation result, we can conclude that OPJ-aligned feature is not a valued feature,which is also verified by T-test result.


CHAPTER 7. FEATURES


Linear

Accuracy 63.1% 60.2% 61.2% 64.1%AUC 0.539 0.510 0.546 0.560MCC 0.141 0.036 0.112 0.177

TP FP 60 35 58 36 53 30 58 32FN TN 3 5 5 4 6 10 5 8

SVM

Accuracy 60.2% 58.3% 61.2% 68.9%AUC 0.492 0.485 0.500 0.627MCC -0.079 -0.057 0.000 0.314

TP FP 62 40 58 38 63 40 57 26FN TN 1 0 5 2 0 0 6 14

Logistic

Accuracy 58.3% 54.4% 59.2% 60.2%AUC 0.555 0.527 0.567 0.616MCC 0.069 -0.033 0.071 0.099

TP FP 48 28 47 31 51 30 51 29FN TN 15 12 16 9 12 10 12 11

Decision Tree

Accuracy 61.2% 56.3% 54.4% 67.0%AUC 0.483 0.411 0.400 0.530MCC 0.000 -0.180 -0.147 0.269

TP FP 63 40 58 40 54 38 60 31FN TN 0 0 5 0 9 2 3 9

1-NN

Accuracy 47.6% 50.5% 59.2% 55.3%AUC 0.424 0.490 0.535 0.469MCC -0.075 -0.019 0.111 0.025

TP FP 33 24 35 23 46 25 44 27FN TN 30 16 28 17 17 15 19 13

Table 7.5: Classification results of OPJ-aligned Feature based on several different classifiers with10-fold cross validation, k is set 3.

Popj − 3 Popj − 2 Popj − 1 Popj Popj + 1 Popj + 2 Popj + 3Supine breath 0 0 0 0 0 0 0Supine hold 0 0 0 0 0 0 0

Sitting breath 1 1 0 0 0 0 1Sitting hold 1 0 0 0 0 0 0

Table 7.6: h-value of T-test of OPJ-aligned feature.

7.1.4 Maximum Value

In this part, we investigate the maximum value (peak) in the time series. The maximum pointin the time series is corresponding to oral cavity in the upper airway. According to the domainknowledge w.r.t. OSA, OSA has little affect impact on the oral cavity, hence this point should nota good feature theoretically. We conduct experiments to verify this conjecture.


Table 7.7 shows the classification results of feature Maximum Value based on several differentclassifiers with 10-fold cross validation. We can see that the accuracy in all cases are below 65%.Table 7.7 also shows the AUC and MCC for each classifier for each dimension. Most of the AUCin the result is below 0.600, and the MCC is around 0.000, which indicate that the predictionswith these classifiers are similar as the random prediction.

Table 7.8 shows the statistical results of feature Maximum Value. We can see all the T-test return 0 which means there is no significant difference between OSA subjects and non-OSA


CHAPTER 7. FEATURES

subjects for feature Maximum Value. Nevertheless, the mean value of non-OSA subjects are alllarger than the OSA subjects, which indicates that there is still some obstruction generated byOSA in the oral cavity, however, this obstruction is too weak to make Maximum Value to be agood feature.


Linear

Accuracy 61.2% 60.2% 60.2% 61.2%AUC 0.500 0.492 0.501 0.50MCC 0.000 -0.079 0.005 0.000

TP FP 63 40 62 40 60 38 63 40FN TN 0 0 1 0 3 2 0 0

SVM

Accuracy 58.3% 57.3% 59.2% 56.3%AUC 0.476 0.482 0.484 0.460MCC -0.138 -0.059 -0.112 -0.180

TP FP 60 40 56 37 61 40 58 40FN TN 3 0 7 3 2 0 5 0

Logistic

Accuracy 61.2% 58.3% 58.3% 60.2%AUC 0.401 0.468 0.526 0.398MCC 0.000 -0.138 -0.035 -0.079

TP FP 63 40 60 40 57 37 62 40FN TN 0 0 3 0 6 3 1 0

Decision Tree

Accuracy 61.2% 61.2% 61.2% 61.2%AUC 0.483 0.483 0.483 0.483MCC 0.000 0.000 0.000 0.000

TP FP 63 40 63 40 63 40 63 40FN TN 0 0 0 0 0 0 0 0

1-NN

Accuracy 41.7% 55.3% 53.4% 46.6%AUC 0.402 0.553 0.485 0.446MCC -0.195 0.034 0.019 -0.119

TP FP 30 27 43 26 39 24 35 27FN TN 33 13 20 14 24 16 28 13

Table 7.7: Classification results of feature Maximum Value based on several different classifierswith 10-folds cross validation.



Table 7.8: Statistical results of feature Maximum Value.

Conclusion

From the evaluation results based on our datasets, we can conclude that Maximum Point is nota valued feature. The T-test result shows that there is no significant difference between OSAand non-OSA subjects for feature Maximum Value. This conclusion is consistent with our initialconjecture.


CHAPTER 7. FEATURES

7.1.5 OPJ Value/Maximum Value

According to the domain knowledge, the oropharynx in the upper airway is the most possiblyobstructed part by OSA, while the oral cavity is not prone to be affected. In order to avoid theinfluence by the difference of body size, we propose the feature OPJ Value / Maximum Value,which is equivalent to the y-scale normalized OPJ Value.


Table 7.9 shows the classification results of feature OPJ Value / Maximum Value based on severaldifferent classifiers with 10-fold cross validation. We can see that in most cases, all the subjects areclassified as OSA subjects, which means the recall is nearly 100% but the specificity is close to 0.Table 7.9 also shows the AUC and MCC for each classifier for each dimension. Most of the AUCin the result is below 0.600, and the MCC is around 0.000, which indicate that the predictionswith these classifiers are equivalent to the random prediction.

Table 7.10 shows the statistical results of feature OPJ Value / Maximum Value. We can seeall the T-test return 0 which means there is no significant difference between OSA subjects andnon-OSA subjects for feature OPJ Value / Maximum Value. The mean value of non-OSA subjectsare all larger than the OSA subjects, which is consistent with the theoretical conclusion that theOPJ value of non-OSA subjects should be smaller than that of OSA subjects theoretically due tothe obstruction in the OPJ part of the upper airway.


Linear

Accuracy 60.2% 60.2% 61.2% 60.2%AUC 0.492 0.497 0.500 0.492MCC -0.079 -0.020 0.000 -0.079

TP FP 62 40 61 39 63 40 62 40FN TN 1 0 2 1 0 0 1 0

SVM

Accuracy 61.2% 61.2% 61.2% 61.2%AUC 0.500 0.500 0.500 0.5000MCC 0.000 0.000 0.000 0.000

TP FP 63 40 62 40 63 40 63 40FN TN 0 0 1 0 0 0 0 0

Logistic

Accuracy 61.2% 59.2% 61.2% 56.3%AUC 0.557 0.527 0.397 0.577MCC 0.057 -0.057 0.000 -0.180

TP FP 60 37 60 39 63 40 58 40FN TN 3 3 3 1 0 0 5 0

Decision Tree

Accuracy 61.2% 61.2% 61.2% 66.0%AUC 0.483 0.483 0.483 0.619MCC 0.000 0.000 0.000 0.269

TP FP 63 40 63 40 63 40 48 20FN TN 0 0 0 0 0 0 15 20

1-NN

Accuracy 44.7% 45.6% 52.4% 64.1%AUC 0.423 0.427 0.514 0.597MCC -0.140 -0.115 0.021 0.234

TP FP 32 26 32 25 36 22 46 20FN TN 31 14 31 15 27 18 17 20

Table 7.9: Classification results of feature OPJ Value/Maximum Value based on several differentclassifiers with 10-fold cross validation.


CHAPTER 7. FEATURES



Table 7.10: Statistical results of feature OPJ/Maximum.

Conclusion

From the evaluation results based on our datasets, we can conclude that OPJ Value / Maxi-mum Value is not a valued feature. The T-test result shows that there is no significant differencebetween OSA and non-OSA subjects for feature OPJ Value / Maximum Value.

Comparison with OPJ ValueFrom the T-test result, we can find that p-value of feature OPJ Value / Maximum Value is

much higher than the p-value of feature OPJ Value, which is shown in Table 7.2. This resultillustrates that the y-scale normalization decreases the performance of the feature OPJ Value,which is inconsistent with what we have expected.


CHAPTER 7. FEATURES

7.2 Global Features

In this section, several global features are proposed. We first present the physiological meaningand algorithm specification for each feature, then the detailed evaluations including Accuracy,T-Test, AUC and so on are conducted for each feature.


CHAPTER 7. FEATURES

7.2.1 Shapelets based Algorithms

Algorithm

Time series shapelets[16] are small, local subsequences in a time series that are highly predictiveof a class and are thus very useful features for building classifiers. [16] first computes Shapeletsand its corresponding split point which can be considered as a classification feature and adopts theInformation Gain and OSP(optimal Split Point) based classifier to classify the time series objects.

For a given subsequence as the candidate for the desired shapelets, [16] first calcuates theeuclidean distance between each time series in the test set to the candidate, then uses InformationGain to calculate the Optimal Split Point for the distances, and finally calculate the accuracy bythis candidate. [16] proposes two pruning methods to improve the efficiency of the algorithm.

In some cases, shapelets performs much better than the popular nearest neighbor algorithmand is proved to be a good time series feature. We apply this technique to our datasets and checkthe performance.


Table 7.11 shows the evaluation result with the source code provided by [16]. In addition, weapply the voting algorithm to ensemble the result of four dimensions.

From Table 7.11, we can see that the accuracy is between 40% to 56%, which is no differentfrom the random guess. Using voting ensemble method, the accuracy could be increased to 58%,which is still low.

Decision Tree accuracy Voting EnsembleSupine breath 50%

58%Supine hold 55.7%


Table 7.11: Accuracy with feature Shapelets with 3-folds cross validation.

Conclusion

From the evaluation results, we can conclude that though the time series feature shapelets is avalued feature in some cases illustrated in [16], it performs poor on our datasets.

Failure AnalysisOne possible reason is that shapelets stresses the shape difference between the tested time series

and the shapelets candidate, which is not effective in our datasets. There is no big shape differencebetween one time series with another one in our datasets. The difference between non-OSA andOSA lies in the cross-sectional area decrease resulted by the obstruction of upper airway, whichhas little correlation to the shapelets.


CHAPTER 7. FEATURES

7.2.2 Using SAX to transform the time series into low-dimensional sym-bolic datasets

SAX [9] is a symbolic representation for time series that allows for dimensionality reduction.The representation allows researchers to avail of the wealth of data structures and algorithmsin bioinformatics or text mining, and also provides solutions to many challenges associated withcurrent data mining tasks.

We applied SAX to our datasets to transform the datasets into low-dimension datasets, thenuse the linear classifiers to separate the datasets.


Table 7.12 shows accuracy with different popular classification algorithms on the symbolic sequencetransformed by SAX. In this example, the 45 points are transformed into 9 dimensions and thealphabet size is 5. From the result, we could find that the classifiers performs not well on thesymbolic sequence transformed by SAX technique.

Supine breath Supine hold Sitting breath Sitting holdLinear 55.3% 57.3% 65.0% 66.0%SVM 64.1% 58.3% 66.0% 60.2%

Logistic 55.3% 54.4% 67% 65%Decision tree 58.3% 65.0% 66.0% 61.2%

Table 7.12: Accuracy with different popular classification algorithms on the symbolic sequencetransformed by SAX. In this example, the 45 points are transformed into 9 dimensions and thealphabet size is 5.

In addition to the choice that using the low-dimensional symbolic sequence to be the inputfeatures, the other choice is to calculate the histogram vector of the symbolic sequence as theinput feature, e.g., if the symbolic sequence for one time series is abeacdeba, then the histogramvector for it is 32112 (3 a, 2 b...), which is the appearing number in the sequence for each symbol.Here assume the dimension of symbolic sequence is 9 and the alphabet size is 5.

Table 7.13 shows the accuracy with different linear classification algorithms, taking the his-togram vector of the symbolic sequence as the feature. the result is not good. However, this resultcould be a baseline to be compared with.

Supine breath Supine hold Sitting breath Sitting holdLinear 60.2% 56.3% 61.2% 59.2%SVM 63.1% 52.4% 60.2% 61.2%

Logistic 60.2% 57.3% 60.2% 58.3%Decision tree 61.2% 57.3% 66.0% 55.3%

Table 7.13: Accuracy with different popular classification algorithms, taking the histogram vectorof the symbolic sequence as the feature.

Conclusion

From the evaluation result, we can find that SAX algorithm performs not well on our datasets.However, it provides us a good idea of transforming the time series into symbolic sequence, thenwe could further apply the popular algorithms about symbolic sequence, which can be future workof this project.


CHAPTER 7. FEATURES

7.2.3 Volume

From domain knowledge, it is known that OSA subjects’ upper airway is obstructed in some placeand hence the volume in some place are more narrow than the non-OSA subjects. Since thedatasets represent the cross-sectional area of upper airway over distance from teeth. Hence thevolume of upper airway could be calculated by the integration of the cross-sectional area alongthe distance to teeth, which is actually the integration of time series along x-axis.

We evaluate the volume of all the possible interval in the whole time series and find the mostdiscriminative interval.

Algorithm Specification

In order to avoid the overfitting to the dataset, we apply the cross-validation technique to extractthe most discriminative interval of volume. As shown in Algorithm 4, we first calculate theaccuracy for the volume in each possible interval, then sum up the accuracy of the same intervalin K different training set and test set. Finally, the interval with the biggest total accuracy isconsidered as the most discriminative interval.

Algorithm 4 Volume-Extraction-Algorithm(D, K)

1: Input: dataset D, K //K-folds cross-validation2: Output: the most discriminative interval [start, end]3: //initialization4: for i = 1→ LengthOfT imeSeries do5: for j = i→ LengthOfT imeSeries do6: total accuracy[i][j]← 07: end for8: end for9: //Get the total accuracy from K-folds cross-validation

10: for a = 1← K do11: for i = 1→ LengthOfT imeSeries do12: for j = i→ LengthOfT imeSeries do13: volume← CalculateV olume(i, j)14: accuracy ← Feature-Evaluation-Algorithm(volume,Dtrain[a], Dtest[a]) // use Algo-

rithm 115: total accuracy[i][j]← total accuracy + accuracy16: end for17: end for18: end for19: //Get the interval with the biggest total accuracy20: best accuracy ← 021: start← 1; end← 122: for i = 1→ LengthOfT imeSeries do23: for j = i→ LengthOfT imeSeries do24: if total accuracy[i][j] > best accuracy then25: best accuracy ← total accuracy[i][j]26: start← i; end← j27: end if28: end for29: end for30: Return [start, end]


CHAPTER 7. FEATURES


Table 7.14 shows the best interval we extract from Algorithm 4 for four dimensions. Except thedimension supine breath, the other three dimensions are all near the OPJ points, which is alsoindicated in the Figure 7.3. Table 7.14 also shows the accuracy of four dimensions using DecisionTree classifier. which are all above 70%. In addition, we also use Voting Algorithm to ensemblethe result of four dimensions, whose accuracy can reach 77%. By contrast, we also evaluate theperformance of volume of whole time series, as shown in Table 7.15, from which, it is obvious thatthe interval selection improves the performance a lot.

Table 7.16 shows the detailed information of classification result using Decision Tree classifier.Table 7.17 shows the statistical results of four dimensions for the feature Volume, including

the mean value and the T-test. We can find that all T-test h-value are 1 which indicate thatVolume for non-OSA and OSA has significant difference from each other. This result shows thatthe feature is relatively a good feature from the view of statistics compared to other features wehave got.

Figure 7.2 shows the ROC curves of the feature Volume for four dimensions with Decision TreeClassifier. Also the AUC (area under ROC line) is shown on the up-left corner of each figure.

Best interval Decision Tree accuracy Voting EnsembleSupine breath 33− 44 71.8%

77%Supine hold 25− 34 74.8%

Sitting breath 15− 32 71.8%Sitting hold 16− 32 74.8%

Table 7.14: Accuracy for feature Volume. The total length of interval is 45.



Table 7.15: Accuracy for feature Volume without interval selection, i.e., volume of whole timeseries data.

TP FP FN TN Weighted Precision Weighted Recall AUCSupine breath 58 24 5 16 0.729 0.718 0.597Supine hold 57 20 6 20 0.752 0.748 0.638

Sitting breath 51 17 12 23 0.714 0.718 0.607Sitting hold 59 22 4 18 0.763 0.748 0.626

Table 7.16: Classification results of feature Volume based on Decision Tree model.



Table 7.17: Statistical results of feature Volume.


CHAPTER 7. FEATURES

(a) ROC curve of result with supine breath. (b) ROC curve of result with supine hold.

(c) ROC curve of result with sitting breath. (d) ROC curve of result with sitting hold.

Figure 7.2: ROC curve of results with four dimensions.

(a) Best interval for Volume with supine breath. (b) Best interval for Volume with supine hold.

(c) Best interval for Volume with sitting breath. (d) Best interval for Volume with sitting hold.

Figure 7.3: Best interval for Volume with four dimensions, which are indicated by blue lines.

Explanation based on Domain Knowledge

From domain knowledge, we know that one or more sites within the oral pharayngeal region areusually where closure occurs in most subjects with OSA. Although the retropalatal region of theoropharynx is the most common site of collapse (see Figure 2.2), airway narrowing is a dynamicprocess, varying markedly among and within subjects and often includes the retroglossal andhypopharyngeal areas. Our experimental result is consistent with this medical finding. FromFigure 7.3, we find that all the most discriminative volume interval of four dimensions are inthe region from reptropalatal region and retroglossal region to the hypopharyngeal area. Exceptfor the selected volume interval of sitting breath, all the other three volume interval are near theOPJ point, and mainly in the oropharynx region(including reptropalatal region and retroglossalregion).

As shown from some local features, due to the some uncertain factors such as noise or phys-iological difference between different subjects, it is hard to well discriminate OSA and non-OSAthrough one single point in the upper airway. However, since the upper airway of OSA subjectsis always obstructed in some place compared to non-OSA subjects, the volume, which is the


CHAPTER 7. FEATURES

integration of some continuous points in a interval could perform better than a single point todiscriminative OSA subjects and non-OSA subjects.

Conclusion

As shown from the experimental results, the feature Volume is actually a good feature comparedto other features we find, especially the local features. T-test result indicates that Volume of OSAsubjects and non-OSA subjects is significantly different from each other.

The highest accuracy for a single dimension could reach 75%, using Voting Ensemble algorithmof four dimensions, the accuracy could reach 77%.


CHAPTER 7. FEATURES

7.2.4 Volume Variance

There are 4 dimensions for each subject in the datasets. As we know, the volume between sittingand supine is different because of gravity impact. In this section, we want to investigate whetherthe volume variance between sitting and supine is a good feature. The physiological meaning ofthis feature is the volume variance in the upper airway for a subject at different postures.

Specifically, there are two pairs to compare the volume variance:

• Comparison between Sitting Breath and Supine Breath.

• Comparison between Sitting Hold and Supine Hold.

Similar to the algorithm of feature Volume, we test all the possible interval and select the mostdiscriminative one.


Table 7.18 shows the evaluation result of feature Volume Variance. We can find that the result isrelatively good for Sitting Breath vs Supine Breath. Hence, the volume variance between SittingBreath and Supine Breath could be a good feature. We don’t consider the volume variance betweenSitting Hold vs Supine Hold since its performance is poor.

Though the h-value of T-test result for Sitting Breath vs Supine Breath is 0, the p-value is0.1106, which is not big. the mean value of non-OSA is larger than the OSA, which indicates thatthe volume variance between different postures of non-OSA subjects is bigger than the volumevariance of OSA subjects.

Figure 7.4 shows the selected interval of the feature Volume Variance. We can see that theinterval is from the oral cavity to the Oropharynx region.

Sitting breath vs Supine breath Sitting hold vs Supine holdAccuracy with Decision Tree 71.8% 61.2%

selected interval 1− 29 21-27AUC 0.719 0.483MCC 0.400 0.000

TP FP 50 16 58 36FN TN 13 24 5 4

non-OSA mean 23.9 3.06OSA mean 18.2 1.93

h-value of T-test 0 0p-value of T-test 0.1106 0.26

Table 7.18: Evaluation result of feature Volume Variance.

Figure 7.4: The selected interval of Volume Variance, which is indicated by blue lines.


CHAPTER 7. FEATURES

Conclusion

From the evaluation result, we can conclude that the Volume Variance between Sitting Breathand Supine Breath is a good feature. The mean value of non-OSA is larger than the OSA, whichindicates that the volume variance between different postures of non-OSA subjects are bigger thanthe volume variance of OSA.

Explanation from Domain KnowledgeOne possible explanation of the result that the volume variance between Sitting Breath and

Supine Breath of non-OSA subjects are bigger than the volume variance of OSA is: according tothe domain knowledge, the muscle relax is one key factor that leads to OSA, hence, the flexibilityof the muscle in the upper airway of OSA subjects is lower than that of non-OSA subjects. Inaddition, the enlargement of soft tissue in OSA subjects also contributes to the observation. Thus,the volume difference between sitting and supine w.r.t. OSA subjects is smaller than non-OSA.


CHAPTER 7. FEATURES

7.2.5 OPJ-aligned Volume

Since there is more or less physiological difference between different persons, there is some shiftingbetween the time series from different subjects. We conjecture that if we first align the time series,then apply the feature Volume, maybe we could get better result than non-aligned Volume. In thissection, we first align the time series with OPJ point, then check volume in any possible intervaland try to get the most discriminative one, just same as the process in feature Volume, we callthis feature OPJ-aligned Volume.


Table 7.19 shows the accuracy with Decision tree classifier. Compared to the result of Volumeshown in Table 7.14, the accuracy is unexpectedly decreased a little.



Table 7.19: Accuracy with feature OPJ-aligned Volume.

Conclusion

From the evaluation result, we can see that the accuracy of OPJ-aligned Volume is worse than thefeature Volume, which is inconsistent with what we expected.

ExplanationOne possible reason for the failure is that OPJ is not a valued feature with respect to our

datasets, hence there is no improvement to align time series with OPJ.


CHAPTER 7. FEATURES

7.2.6 Stretched Length

The stretched length of a time series, which is a common-used time series feature, can be consideredas the length after being stretched to be a straight line. This measurement could be used to measurethe degree of fluctuation of a time series.

Like the feature Volume, we evaluate the Stretched Length of all the possible interval in thewhole time series and find the most discriminative interval.

Algorithm Specification

The process of algorithm of finding the most discriminative interval for feature Stretched Lengthis similar to the process for the feature Volume.

In order to avoid the overfitting to the dataset, we apply the cross-validation technique toextract the most discriminative interval for Stretched Length. As shown in Algorithm 5, we firstcalculate the accuracy for the stretched length in each possible interval, then sum up the accuracyof the same interval in K different training set and test set. Finally, the interval with the biggesttotal accuracy is considered as the most discriminative interval for feature Stretched Length.

Algorithm 5 Stretched-Length-Extraction-Algorithm(D, K)

1: Input: dataset D, K //K-folds cross-validation2: Output: the most discriminative interval [start, end]3: //initialization4: for i = 1→ LengthOfT imeSeries do5: for j = i→ LengthOfT imeSeries do6: total accuracy[i][j]← 07: end for8: end for9: //Get the total accuracy from K-folds cross-validation

10: for a = 1← K do11: for i = 1→ LengthOfT imeSeries do12: for j = i→ LengthOfT imeSeries do13: stretched length← CalculateStretchedLength(D, i, j)14: accuracy ← Feature-Evaluation-Algorithm(stretched length,Dtrain[a], Dtest[a])15: total accuracy[i][j]← total accuracy + accuracy16: end for17: end for18: end for19: //Get the interval with the biggest total accuracy20: best accuracy ← 021: start← 1; end← 122: for i = 1→ LengthOfT imeSeries do23: for j = i→ LengthOfT imeSeries do24: if total accuracy[i][j] > best accuracy then25: best accuracy ← total accuracy[i][j]26: start← i; end← j27: end if28: end for29: end for30: Return [start, end]


CHAPTER 7. FEATURES

Algorithm 6 CalculateStretchedLength(D, start, end)

1: Input: dataset D, start, end //start and end indicate the considered interval.2: Output: the stretched length in interval [start, end] for each time series in D3: //initialization4: for i = 1→ size(D) do5: T ← Di

6: stretched length(i)← 07: for k = start→ end− 1 do8: length ←

√(Tk+1 − Tk)2 + (Xstep)2 // Tk indicates the value of kth point in the time

series T , Xstep is the length of (XK+1−Xk), we could specify this value in the experiment.

9: stretched length(i)+ = length10: end for11: end for12: Return stretched length


Table 7.20 shows the best interval we extract from Algorithm 5 for four dimensions. We can findthat all the four dimensions are all belong to the oropharynx region, which is also indicated inthe Figure 7.6. This is consistent with the medical finding about OSA symptom. Table 7.20 alsoshows the accuracy of four dimensions using Decision Tree classifier, which are all near 70%. Bycontrast, we also evaluate the performance of stretched length of whole time series, as shown inTable 7.21, from which, it is obvious that the interval selection improves the performance a lot.

Table 7.22 shows the detailed information of classification result using Decision Tree classifier.

Table 7.23 shows the statistical results of four dimensions for the feature Stretched Length,including the mean value and the T-test. We can find that all T-test h-value are 1 which indicatethat Stretched Length for non-OSA and OSA has significant difference from each other. This resultshows that the feature is relatively a good feature from the view of statistics compared to otherfeatures we have got.

Figure 7.5 shows the ROC curves of the feature Stretched Length for four dimensions withDecision Tree Classifier. Also the AUC (area under ROC line) is shown on the up-left corner ofeach figure.

Best interval Decision Tree accuracySupine breath 19− 31 71.8%Supine hold 20− 32 70.9%

Sitting breath 18− 30 69.9%Sitting hold 19− 30 69.9%

Table 7.20: Accuracy with feature Stretched Length. The total length of interval is 45.



Table 7.21: Accuracy for feature Stretched Length without interval selection, i.e., stretched lengthof whole time series data.


CHAPTER 7. FEATURES

TP FP FN TN Weighted Precision Weighted Recall AUCSupine breath 59 25 4 15 0.736 0.718 0.581Supine hold 54 21 9 19 0.704 0.709 0.657

Sitting breath 62 30 1 10 0.765 0.699 0.576Sitting hold 60 30 3 10 0.706 0.680 0.534

Table 7.22: classification results of feature Stretched Length based on Decision Tree model.



Table 7.23: Statistical results of feature Stretched Length.

Explanation based on Domain Knowledge

From domain knowledge, we know that one or more sites within the oropharynx region, includingretropalatal region and retroglossal region, are usually where closure occurs in most subjects withOSA. Our experimental result is consistent with this medical finding. From Figure 7.6, we findthat all the most discriminative Stretched Length interval of four dimensions are around the regionfrom reptropalatal region to retroglossal region. All the four selected intervals are near the OPJpoint, and mainly in the oropharynx region.

Stretched Length can be used to measure the degree of fluctuation of a time series. FromTable 7.23, we could find that the mean value of Stretch Length for non-OSA subjects is alwayslarger than OSA subjects, which indicates that the area-distance curves of non-OSA subjects havemore degrees of fluctuation than OSA subjects. This result illustrates that non-OSA subjects’upper airway has more fluctuation than OSA subjects. However, there is no such medical finding.It remains a viewpoint which need to be further verified from the medical view. One possibleexplanation is that since the volume of oropharynx region of non-OSA subjects is always largerthan the OSA subjects as confirmed in Section 7.2.3, the upper airway of non-OSA has more spacefor fluctuation than OSA subjects.

Conclusion

As shown from the experimental results, the feature Stretched Length is relatively a good featurecompared to other features we find, especially the local features. T-test result indicates thatStretch Length of OSA subjects and non-OSA subjects is significantly different from each other.

However, the conclusion obtained by the experimental result that non-OSA subjects’ upperairway has more fluctuation than OSA subjects, is need to further verified from the medical view.


CHAPTER 7. FEATURES

(a) ROC curve of result with supine breath. (b) ROC curve of result with supine hold.

(c) ROC curve of result with sitting breath. (d) ROC curve of result with sitting hold.

Figure 7.5: ROC curve of results with four dimensions.

(a) Best interval with supine breath. (b) Best interval with supine hold.

(c) Best interval with sitting breath. (d) Best interval with sitting hold.

Figure 7.6: Best interval for Stretched Length with four dimensions, which are indicated by bluelines.

7.2.7 Reference-based feature

In this section, instead of extracting features from raw data directly, we attempt to find a refer-enced template of the time series for the non-OSA subjects, then for each time series, considerthe deviation from the referenced template as the feature. Theoretically the deviation from thetemplate for the non-OSA subjects should be always smaller than the deviation for the OSAsubjects.

In practice, we used DTW (Dynamic Time warping) distance as the deviation to the referencemodel, due to its advantages in measuring similaries beween time series.

Algorithm

The general workflow of this algorithm is as follows:

1. Find the reference model (time series) as the template of non-OSA subjects.

2. Calculate the DTW (Dynamic Time warping) distance for each time series to be assessed


CHAPTER 7. FEATURES

to the reference model as the deviation.

3. Use the DTW distance as the input feature, apply several popular classifiers to evaluate theperformance of the feature with cross validation.

Since there is no existing template for non-OSA time series, we have to obtained it by someother approximated way. Two methods are proposed to calculate the reference model.

• Randomly select half of the non-OSA time series and calculate the mean value as the referencemodel(labeled as RefModel1).

• Use the feature Volume to extract top-10 non-OSA time series and then calculate the meanvalue of them as the reference model(labeled as RefModel2).Since the feature Volume performs relatively well on the datasets, we assume that selectedtime series by Volume could better represent the non-OSA characteristics.


Table 7.24 shows the classification results of reference-based feature obtained with RefModel1based on several different classifiers with 10-fold cross validation. Note that in the test set, thereare 20 non-OSA subjects and 63 OSA subjects since 20 non-OSA subjects are used to calculatethe reference model.

We can see that most of the accuracy is 75.9%, which is a relatively good result. However,inthese cases, all the subjects are classified as OSA subjects, which means the recall is 100% but thespecificity is 0. Table 7.24 also shows the AUC and MCC for each classifier for each dimension.Most of the AUC in the result is just 0.500, and the MCC is 0.000, which indicate that thepredictions with these classifiers are equivalent to the random prediction.

Table 7.25 shows the statistical results of reference-based feature based on RefModel1. Wecan see all the T-test return 0 which means there is no significant difference between OSA subjectsand non-OSA subjects for reference-based feature. The mean value of non-OSA subjects sometimesare larger than the OSA subjects, which is inconsistent with the theoretical conclusion. Since thedistance from non-OSA subjects to the reference model should be smaller than the distance fromOSA subjects to the reference model theoretically.

We also have the similar evaluation on the reference-based feature obtained by RefModel2,as shown in Table 7.26 and Table 7.27. Similar to the result based on RefModel1, almost all thesubjects are classified to OSA class. The difference is that the mean value of non-OSA subjectsare all smaller than the OSA subjects, which is consistent with the theoretical result.

Analysis of the Evaluation Result

Evaluation result shows that reference-based features, either calculated by RefModel1 or Ref-Model2, perform bad and fail to classify the OSA and non-OSA subjects. There are three possiblereasons for the failure.

• The reference model may deviate a lot from the actual template model and cannot wellrepresent the characteristics of the non-OSA subjects. The reason lies in the fact that thesize of the dataset are too small and hence the size of the candidates used for the calculationof reference model is too small.

• The size of the datasets are too small from the view of statistics and classification. Theclassifiers cannot be trained well and hence the prediction result is poor.


CHAPTER 7. FEATURES

Supine breath Supine hold Sitting breath Sitting hold

Linear

Accuracy 75.9% 75.9% 75.9% 75.9%AUC 0.500 0.500 0.500 0.50MCC 0.000 0.000 0.000 0.000

TP FP 63 20 63 20 63 20 63 20FN TN 0 0 0 0 0 0 0 0

SVM

Accuracy 75.9% 75.9% 75.9% 75.9%AUC 0.500 0.500 0.500 0.50MCC 0.000 0.000 0.000 0.000

TP FP 63 20 63 20 63 20 63 20FN TN 0 0 0 0 0 0 0 0

Logistic

Accuracy 75.9% 74.7% 74.7% 75.9%AUC 0.500 0.541 0.564 0.50MCC 0.000 -0.062 -0.062 0.000

TP FP 63 20 62 20 62 20 63 20FN TN 0 0 1 0 1 0 0 0

Decision Tree

Accuracy 75.9% 75.9% 75.9% 75.9%AUC 0.500 0.500 0.500 0.50MCC 0.000 0.000 0.000 0.000

TP FP 63 20 63 20 63 20 63 20FN TN 0 0 0 0 0 0 0 0

1-NN

Accuracy 47.0% 71.1% 59.0% 60.2%AUC 0.353 0.577 0.477 0.433MCC -0.283 0.182 -0.049 -0.146

TP FP 37 18 52 13 44 15 48 18FN TN 26 2 11 7 19 5 15 2

Table 7.24: Classification results with RefModel1 based on several different classifiers with 10-folds cross validation. Note that in the test set, there are 20 non-OSA subjects and 63 OSAsubjects since 20 non-OSA subjects are used to calculate the reference model.

• The quality of the dataset may be poor due to the measurements of the data and the noisein the datasets.

Comparison of RefModel1 and RefModel2From the Table 7.25 and Table 7.27, we can see that w.r.t. RefModel1, the mean value of

non-OSA is sometimes larger than OSA which is inconsistent with the theoretical result. On thecontrast, the experiment result with reference feature obtained by RefModel2 is consistent withthe theoretical result, i.e., the mean value of non-OSA subjects is always smaller than OSA, whichcould illustrate that the reference model obtained by Volume is more closer the true template ofnon-OSA and indirectly demonstrate the effectiveness of feature Volume.

Conclusion

The evaluation result indicates that the reference-based feature fails to classify the OSA and non-OSA subjects, for the possible reasons explained in the analysis section. However, reference-basedfeature is actually a good idea and may inspire us potentially to extract valued features for thesimilar classification problem.


CHAPTER 7. FEATURES



Table 7.25: Statistical results of reference-based feature based on RefModel1.


Linear

Accuracy 67.7% 67.7% 67.7% 67.7%AUC 0.500 0.500 0.500 0.50MCC 0.000 0.000 0.000 0.000

TP FP 63 30 63 30 63 30 63 30FN TN 0 0 0 0 0 0 0 0

SVM

Accuracy 67.7% 67.7% 67.7% 67.7%AUC 0.500 0.500 0.500 0.50MCC 0.000 0.000 0.000 0.000

TP FP 63 30 63 30 63 30 63 30FN TN 0 0 0 0 0 0 0 0

Logistic

Accuracy 67.7% 67.7% 67.7% 67.7%AUC 0.406 0.350 0.469 0.234MCC 0.000 0.000 0.000 0.000

TP FP 63 30 63 30 63 30 63 30FN TN 0 0 0 0 0 0 0 0

Decision Tree

Accuracy 67.7% 67.7% 67.7% 67.7%AUC 0.500 0.483 0.483 0.483MCC 0.000 0.000 0.000 0.000

TP FP 63 30 63 30 63 30 63 30FN TN 0 0 0 0 0 0 0 0

1-NN

Accuracy 60.2% 54.8% 61.3% 52.69%AUC 0.585 0.447 0.583 0.452MCC 0.065 0.002 0.130 -0.064

TP FP 46 20 40 19 44 17 40 21FN TN 17 10 23 11 19 13 23 9

Table 7.26: Classification results with RefModel2 based on several different classifiers with 10-folds cross validation. Note that in the test set, there are 30 non-OSA subjects and 63 OSAsubjects since 10 non-OSA subjects are used to calculate the reference model.



Table 7.27: Statistical results of reference-based feature based on RefModel2.

7.2.8 Auto-Correlation

Autocorrelation refers to the correlation of a time series with its own past and future values. Itis actually a mathematical representation of the degree of similarity between a given time seriesand a lagged version of itself over successive time intervals.

The result of Autocorrelation ranges from -1 to 1. An autocorrelation of +1 represents perfect


CHAPTER 7. FEATURES

positive correlation (i.e. an increase seen in one time series will lead to a proportionate increasein the same time series lagged by some time units), while a value of -1 represents perfect negativecorrelation (i.e. an increase seen in one time series results in a proportionate decrease in the sametime series lagged by some time units).

Algorithm

The autocorrelation is computed between one time series and the same series lagged by one ormore time units. For the first-order autocorrelation, the lag is one time unit. The first-orderautocorrelation coefficient is the simple correlation coefficient of the first N − 1 observations,xt, t = 1, 2, ..., N −1 and the N −1 observations,xt, t = 2, 3, ..., N . The correlation between xt andxt+1 is given by

r1 =

∑N−1t=1 (xt − ¯x(1))(xt+1 − ¯x(2))

[∑N−1

t=1 (xt − ¯x(1))2]1/2[∑N

t=2 (xt − ¯x(2))2]1/2(7.1)

Where ¯x(1) is the mean of the first N − 1 observations and ¯x(2) is the mean of the last N − 1observations. As the correlation coefficient given by 7.1 measures correlations between successiveobservations, it is called the autocorrelation coefficient or serial correlation coefficient.

Equation 7.1 can be generalized to give the correlation between observations separated by ktime steps:

rk =

∑N−kt=1 (xt − ¯x(1))(xt+k − ¯x(2))

[∑N−k

t=1 (xt − ¯x(1))2]1/2[∑N

t=k+1 (xt − ¯x(2))2]1/2(7.2)

The quantity rk is called the autocorrelation coefficient at lag k.



LinearAccuracy 61.17% 60.17% 61.17% 61.17%

AUC 0.500 0.500 0.500 0.500MCC 0.000 0.000 0.000 0.000

SVMAccuracy 61.17% 61.17% 61.17% 61.17%

AUC 0.500 0.500 0.500 0.500MCC 0.000 0.000 0.000 0.000

LogisticAccuracy 61.17% 62.14% 63.11% 64.08%

AUC 0.500 0.598 0.555 0.594MCC 0.000 0.129 0.149 0.186

Decision TreeAccuracy 60.19% 61.17% 49.51% 54.37%

AUC 0.492 0.502 0.464 0.462MCC -0.020 0.032 -0.053 -0.123

1-Nearest NeighborAccuracy 49.51% 54.37% 60.19% 49.51%

AUC 0.457 0.532 0.468 0.509MCC -0.083 0.119 -0.079 -0.035

Table 7.28: classification results of feature autocorrelation with time step = 1 based on severaldifferent classifiers with 10-fold cross validation.

In the experiments, we first try the first-order autocorrelation coefficient of each time series asthe feature. Then the feature is evaluated on several popular classifiers, as shown in Table 7.28. foreach classifier, Accuracy, AUC and MCC are calculated. We can find that the evaluation resultsare poor, which indicates that the first-order autocorrelation coefficient is not a valued feature.

Then we calculate the autocorrelation coefficients from first-order to 30-orders (k ranges in[1, 30] in Equation 7.2). In this way, the feature for each time series is a vector of autocorrelationcoefficients. Different lengths of vector are tried as features: k ∈ [1, 5], k ∈ [1, 10], k ∈ [1, 30].


CHAPTER 7. FEATURES


LinearAccuracy 61.17% 60.19% 66.02% 60.19%

AUC 0.564 0.551 0.626 0.542MCC 0.140 0.114 0.264 0.099

SVMAccuracy 61.17% 61.17% 61.17% 61.17%

AUC 0.500 0.500 0.500 0.500MCC 0.000 0.000 0.000 0.000

LogisticAccuracy 65.04% 47.57% 50.49% 60.19%

AUC 0.619 0.484 0.521 0.585MCC 0.251 -0.114 -0.038 0.158

Decision TreeAccuracy 53.40% 67.96% 52.43% 54.37%

AUC 0.389 0.644 0.554 0.462MCC -0.194 0.289 -0.035 -0.123

1-Nearest NeighborAccuracy 66.02% 56.31% 56.31% 60.19%

AUC 0.608 0.532 0.579 0.603MCC 0.264 0.119 0.085 0.190

Table 7.29: classification results of feature autocorrelation with time step ranging from 1 to 30based on several different classifiers with 10-folds cross validation.

h = 1 h=0Supine breath {1, 2, 3} {4− 30}Supine hold {1− 10} {11− 30}

Sitting breath {29, 30} {1− 28}Sitting hold {1− 10, 29, 30} {11− 28}

Table 7.30: h-value of T-test of feature autocorrelation with time step ranging from 1 to 30.

However, just like the case k = 1, the results are all not good. Table 7.29 shows the evaluationresult with the case k ∈ [1, 30].

Then we have the T-test on the autocorrelation features, which is shown in Table 7.30. Wecan find that the h − value of the first some columns of autocorrelation coefficients are 1, whichmeans the corresponding column of coefficients has significant difference between non-OSA andOSA subjects. However, the performance of features with k ∈ [1, 5] and k ∈ [1, 10] is poor.

Conclusion

From the evaluation results, we can see that all the criteria of assessments including accuracy,MMC and AUC performs poor on the autocorrelation of our datasets. Hence, autocorrelation-based feature is not a valued feature.


CHAPTER 7. FEATURES

7.2.9 DFT-based and DWT-based features

Discrete Wavelet Transformation (DWT) and the discrete Fourier Transformation (DFT) are twopopular feature extraction techniques for time series. The signal is projected into the frequencydomain (DFT) or a tiling of the time-frequency plane (DWT). In this section, we adopt thealgorithm proposed by [10] to apply DWT and DFT to our datasets, in order to extract goodfeatures from domain space or time-frequency space.

Algorithm

A frequently-used and also efficient way of applying DFT and DWT is just keeping only the firstfew coefficients as features, in this way the time series is represented by a rough sketch, becausethese coefficients correspond to the low frequencies of the signal. But using the first coefficient isnot the best way to compress time series data. Another choice is to use the largest coefficientsinstead, which could preserve the optimal amount of energy present in the original signal [15].However, the disadvantage of this method dealing with several time series lies in the high storagedemands and bookkeeping overhead for the distance computations.

[10] proposes a new method to select the same subset of the coefficients for all the time seriesto achieve the same dimensionality reduction as the traditional method but keeping the distancecomputation simple. Compared to the optimal method of keeping the largest coefficients per timeseries individually, the extracted features are more interpretable. The coefficients are selected byoptimizing the energy preservation so as to make the new extracted features could maximallyclosed to the original data.

As shown in Algorithm 7, we first calculate the coefficients matrix C using DFT or DWT.Then for each index j ∈ M ,we calculate Tj , the sum of square of ci,j , i = 1 → l, which is theimportance measurement for j. Then top-k index set J can be extracted according to Tj , j ∈M .Finally, we return the selected columns of C corresponding to the index in J as the features.

After obtaining the DFT based or DWT based features, we then apply all kinds of classifiersto check the performance of the features.

Algorithm 7 DFT DWT based Feature Extraction(D, l, m)

1: Input: dataset D, l, m // l is the number of time series in D, m is the length of time seriesin D

2: Output: the selected subset C ′ with size k of coefficients set C obtained by DFT or DWT3: Let C = (ci,j) i = 1, ..., l, j = 1, ...,m be the l × m matrix of the coefficients obtained by

DFT(D) or DWT(D)4: for j = 1→ m do5: Tj ←

∑li=1 c

2i,j // Tj is the importance measurement.

6: end for7: Extract top-k index set J from set M = {1, 2, ...,m} satisfying Tj > Tj′ , ∀j ∈ J, j′ ∈M \ J8: Return C ′ = (ci,j) i = 1, ..., l, j ∈ J


Table 7.31 and Table 7.32 respectively show the accuracy of top-8 dimensions DFT-based featuresand DWT-based features with several different classifiers. We can find that the accuracy is lowwith all classifiers.

Then we have the t-test on each column of the features, as shown in Table 7.33 and Table 7.34.From the result, we can see that the p-values of t-test results are almost larger than 0.05, exceptthe first column, which indicate that there is no significant difference between the features ofnon-OSA and OSA. Since the p-values of the first column of features are smaller than 0.05, weevaluate the accuracy with several classifiers on this single column, however, the result is also notgood, about 60%.


CHAPTER 7. FEATURES

Decision Tree Logistic SVM Linear 1-Nearest-NeighborSupine breath 59.22% 57.28% 54.37% 61.17% 54.37%Supine hold 59.22% 60.19% 66.02% 64.08% 66.99%

Sitting breath 56.31% 56.31% 57.28% 57.28% 53.40%Sitting hold 65.05% 58.25% 55.34% 58.25% 58.25%

Table 7.31: Accuracy of top-8 dimensions DFT-based features with several different classifiers.

Decision Tree Logistic SVM Linear 1-Nearest-NeighborSupine breath 57.28% 61.17% 59.22% 58.25% 62.14%Supine hold 58.25% 58.25% 66.02% 57.28% 66.99%

Sitting breath 52.43% 54.37% 56.31% 53.40% 50.49%Sitting hold 62.14% 55.34% 56.31% 60.19% 53.40%

Table 7.32: Accuracy of top-8 dimensions DWT-based features with several different classifiers.

1 2 3 4 5 6 7 8Supine breath 0.0346 0.7481 0.4304 0.3463 0.9431 0.5836 0.8438 0.988Supine hold 0.0287 0.3978 0.4012 0.4958 0.5404 0.6708 0.673 0.7286

Sitting breath 0.0034 0.4867 0.6653 0.403 0.5316 0.1591 0.7666 0.5281Sitting hold 0.1045 0.5244 0.2663 0.4533 0.5226 0.7065 0.8076 0.8417

Table 7.33: p-value of T-test of top-8 dimensions DFT-based features.

1 2 3 4 5 6 7 8Supine breath 0.0346 0.925 0.4475 0.9707 0.8062 0.0827 0.8915 0.6517Supine hold 0.0287 0.3219 0.6319 0.9764 0.1779 0.7964 0.3756 0.5297

Sitting breath 0.0034 0.3874 0.6335 0.4497 0.4989 0.7234 0.6884 0.1824Sitting hold 0.1045 0.9568 0.2525 0.6937 0.7774 0.966 0.5361 0.1543

Table 7.34: p-value of T-test of top-8 dimensions DWT-based features.

Conclusion

From the experimental results, we conclude that the DFT-based features and DWT-based featureshas poor discriminative power and cannot be valued features with respect to OSA. That is, thedata of OSA and non-OSA has no obvious discriminative characteristics in the frequency domainand time-frequency plane.


CHAPTER 7. FEATURES

7.2.10 DTW Distance between Different Dimensions

For each subject in the datasets, there are four dimensions of measurements, i.e., four time series,which are measured in four different postures. In this section, we intend to check whether theglobal difference between different dimensions after aligning them with each other could be valuedfeature to discriminate the OSA and non-OSA subjects. Since Dynamic Time Warping techniquecould be used to align two time series, thus we use the DTW distance between different dimensionsto be the evaluated feature.

Algorithm

There are two types of comparisons between different dimensions.

• Comparison between dimensions Supine and Sitting.In this case, there are two pairs of comparisons: [Supine Breath vs Sitting Breath] and[Supine Hold vs Sitting Hold ].

• Comparison between dimensions Breath and Hold.In this case, there are also two pairs of comparisons: [Supine Breath vs Supine Hold ] and[Sitting Breath vs Sitting Hold ].

Hence, there are 4 features to be evaluated in this section.


Table 7.35 shows the classification results of the feature DTW distance between different dimen-sions based on several different classifiers with 10-folds cross validation. We can see that most ofthe accuracy is 61.2%, in these cases all the subjects are classified as OSA subjects, which meansthe recall is 100% but the specificity is 0. Table 7.24 also shows the AUC and MCC for eachclassifier for each dimension. Most of the AUC in the result is just 0.500, and the MCC is 0.000,which indicate that the predictions with these classifiers are equivalent to the random prediction.

Table 7.36 shows the T-test results of DTW distance between different dimensions. We canfind all the h-values returned by T-test are 0 and p-value are high, which illustrates that there isno significant difference between OSA and non-OSA subjects with respect to these features.

Conclusion

Experimental results shows that the features that DTW distance between different dimensionsare not discriminative features, which indicates that there is little relationship between the DTWdistance of different dimensions and OSA disease.


Su B ↔ Si B Su H ↔ Si H Su B ↔ Su H Si B ↔ Si H

Linear

Accuracy 61.2% 61.2% 61.2% 61.2%AUC 0.500 0.500 0.500 0.500MCC 0.000 0.000 0.000 0.000

TP FP 63 40 63 40 63 40 63 40FN TN 0 0 0 0 0 0 0 0

SVM

Accuracy 61.2% 61.2% 61.2% 61.2%AUC 0.500 0.500 0.500 0.50MCC 0.000 0.000 0.000 0.000

TP FP 63 40 63 40 63 40 63 40FN TN 0 0 0 0 0 0 0 0

Logistic

Accuracy 61.2% 58.3% 59.2% 61.2%AUC 0.440 0.453 0.543 0.446MCC 0.000 -0.138 -0.057 0.000

TP FP 63 40 60 40 60 39 63 40FN TN 0 0 3 0 3 1 0 0

Decision Tree

Accuracy 61.2% 61.2% 61.2% 61.2%AUC 0.483 0.483 0.483 0.483MCC 0.000 0.000 0.000 0.000

TP FP 63 40 63 40 63 40 63 40FN TN 0 0 0 0 0 0 0 0

1-NN

Accuracy 51.5% 44.7% 57.3% 47.6%AUC 0.441 0.407 0.546 0.457MCC -0.051 0.193 0.117 -0.066

TP FP 41 28 37 31 39 20 32 23FN TN 22 12 26 9 24 20 31 17

Table 7.35: Classification results of DTW distance between different dimensions based on severaldifferent classifiers with 10-folds cross validation. Note that Su B ↔ Si B is the abbreviation ofSupine Breath vs Sitting Breath while Su H ↔ Si H means Supine Hold vs Sitting Hold.

T-test h value T-test p valueSupine Breath vs Sitting Breath 0 0.3392

Supine Hold vs Sitting Hold 0 0.9877Supine Breath vs Supine Hold 0 0.5747Sitting Breath vs Sitting Hold 0 0.8639

Table 7.36: T-test results of DTW distance between different dimensions.

CHAPTER 7. FEATURES

7.2.11 T-test Selection feature

In 6.3, we input all the points in a time series as the features to the classifiers, and the results arebad, no better than random guess. In this section, we first take the T-test on all the points, andselect the points that are significant different from non-OSA and OSA as the features. We callthis feature as T-test Selection feature.

Algorithm

There are two steps to obtain T-test Selection feature:

1. Take T-test on all points of time series.

2. Select the points with returned h-value == 1.


Table 7.37 shows the T-test result of all the points in time series. we set the value of α to be 0.01 toget more discriminative points. We can find that the selected points are all belong to oropharynxregion. The mean value of non-OSA are all larger than the mean value of OSA subjects, which isconsistent with the domain knowledge of OSA.

Table 7.38 shows the classification results of feature T-test Selection based on several differentclassifiers with 10-folds cross validation. we indicate the highest accuracy with blue color. We canfind that the result is much better than the result in Table 6.3, which takes all points in the timeseries as features.

h = 1 h=0 non-OSA mean OSA meanSupine breath {30− 35} {others points} 2.576 2.119Supine hold {32, 33} {other points} 2.081 1.576

Sitting breath {31− 34} {other points} 2.979 2.461Sitting hold {25, 26} {other points} 2.230 1.628

Table 7.37: h-value of T-test of feature T-test Selection with α = 0.01, totally 45 points in eachtime series.

Explanation

since we first select the points based on the T-test result, hence the selected points are morediscriminative than other points, and it can be considered as a refining process which removingnoise points in the time series. In addition, we can find that the selected points are all belong tothe oropharynx region, which is consistent with the domain knowledge.

Conclusion

From the evaluation result, we can find that T-test Selection is relatively a good feature. The T-test selects the most discriminative points as the features, which makes the accuracy much betterthan taking all points as features.


CHAPTER 7. FEATURES


Linear

Accuracy 60.2% 66.0% 61.2% 66.0%AUC 0.542 0.599 0.550 0.590MCC 0.099 0.240 0.119 0.236

TP FP 51 29 55 27 52 29 57 29FN TN 12 11 8 13 11 11 6 11

SVM

Accuracy 56.3% 68.9% 65.0% 66.0%AUC 0.474 0.637 0.582 0.581MCC -0.082 0.315 0.210 0.237

TP FP 55 37 55 24 56 29 59 31FN TN 8 3 8 16 7 11 4 9

Logistic

Accuracy 68.9% 65.0% 68.0% 64.0%AUC 0.668 0.602 0.674 0.597MCC 0.322 0.213 0.292 0.186

TP FP 52 21 55 28 59 24 55 29FN TN 11 19 8 12 9 16 8 11

Decision Tree

Accuracy 62.1% 71.8% 60.2% 62.1%AUC 0.521 0.576 0.522 0.499MCC 0.111 0.036 0.000 0.117

TP FP 58 5 55 21 58 36 57 33FN TN 34 6 8 19 5 4 6 7

Table 7.38: Classification results of feature T-test Selection based on several different classifierswith 10-folds cross validation.

7.3 Feature Combination

In this section, we combine the features (only single-value feature) we proposed before togetherand conduct a comprehensive analysis on them. Specifically, I first use the Correlation-basedFeature Selection (CFS) technique to select valuable features from all features we proposed, thisevaluation method can be used to check whether the valued features we considered before canbe selected by CFS. Then we use Information Gain Evaluation Ranking technique to rank thefeatures we get. Finally, we combine the valued features (4 features: Volume, Stretched Length,Volume Variance, T-test Selection feature) and make an evaluation on the combined feature.

7.3.1 Correlation-based Feature Selection (CFS)

[3] uses the wrapper method to come up with a better heuristic for finding the best feature subset.It evaluate the worth of a subset of features by considering the individual predictive ability of eachfeature along with the degree of redundancy between them.

Table 7.39 shows the selected features by CFS among all features with 10-fold cross validation.We can find that the feature Volume and Stretched Length can be always selected. The T-testSelection feature can be selected in dimension Supine breath and Sitting breath.

From the result, we can conclude that Volume and Stretched Length perform well and stable.

7.3.2 Information Gain Evaluation Ranking

We evaluate the worth of an feature by measuring the information gain with respect to the classand then rank all features.

Table 7.40 shows the top-4 features ranked by information gain. We can find that the top-4features are always Volume, Stretched Length, Volume Variance, T-test Selection feature, which isconsistent with our conclusion before.


CHAPTER 7. FEATURES

Selected features by CFSSupine breath Stretched Length (100%); Volume (30%); T-test Selection (40%)Supine hold Volume(100%);Stretched Length(20%)

Sitting breath Stretched Length (100%); Volume (40%); T-test Selection (40%)Sitting hold Volume (100%); Stretched Length(40%)

Table 7.39: Selected features by CFS among all features with 10-fold cross validation. The prob-ability shown after the feature name indicates the probability of being selected in 10-fold crossvalidation.

Top 4 features

Supine breath

1. Stretched Length2. Volume3. T-test Selection4. Volume Variance

Supine hold

1. Volume2. T-test Selection3. Stretched Length4. Volume Variance

Sitting breath

1. Stretched Length2. Volume3. T-test Selection4. Volume Variance

Sitting hold

1. Volume2. Stretched Length3. T-test Selection4. Volume Variance

Table 7.40: Top 4 features ranked by Information Gain Evaluation .

7.3.3 Feature Combination Performance

In this part, we combine the valued features (4 features: Volume, Stretched Length, VolumeVariance, T-test Selection feature) and make an evaluation on the combined feature with DecisionTree classifier.

Table 7.41 shows the evaluation result. we can see the accuracy does not improve comparedthe single feature Volume shown in Table 7.14 except the dimension Sitting hold. Hence, theperformance of combined feature is equivalent to Volume, which is the best one in four candidatefeatures.


CHAPTER 7. FEATURES


Random Forest

Accuracy 69.9% 68.0% 63.1% 76.7%AUC 0.714 0.659 0.645 0.738MCC 0.339 0.307 0.185 0.498

TP FP 55 23 50 20 50 25 56 17FN TN 8 17 13 20 13 15 7 23

Decision Tree

Accuracy 67.0% 72.8% 69.9% 75.7%AUC 0.621 0.608 0.576 0.605MCC 0.274 0.409 0.369 0.482

TP FP 52 23 55 20 62 30 59 21FN TN 11 17 8 20 1 10 3 19

Table 7.41: Accuracy with Decision Tree and Random Forest classifiers for combined features.

7.4 Conclusion

In this section, we propose a rich set of 16 features, which can be separated into two parts accordingto the feature characteristics: local features and global features.

In the part w.r.t.local features, we investigate the OPJ Value, OPJ Position, OPJ-alignedfeature, Maximum Value and OPJ Value/Maximum Value. Unfortunately, the performance ofthese features are not satisfying, predictions with them are no better than the random guess.

In the part about global features, we propose more than 10 features, some of them are basedon the existing technique, such as shapelets, SAX-based feature, DFT-based feature and so on;some of them are proposed by the domain knowledge or physiological meaning, such as Volume,Stretched Length, Reference-based feature and so on.

We conduct all-sided evaluation for each feature. Based on the evaluation result, we concludethat some of them perform better than others and can be considered as valued features. theyare Volume, Volume Variance, Stretched Length and T-test Selection feature. The accuracy ofthem can reach above 70%, which are much better than other features and the classification resultdirectly on the raw data. The highest accuracy we get is feature Volume, whose accuracy is 77%.

We combine the features (only single-value feature) we proposed together and conduct a com-prehensive analysis on them. The feature Volume and Stretched Length can be always selected byCFS technique in four dimensions while T-test Selection can be selected in some dimensions. thetop-4 features ranked by Information Gain technique are always the 4 features which we consid-ered valued. However, the performance of combined feature is equivalent to the performance ofVolume, but with no improvement.


Chapter 8

Conclusions

This project is about analyzing anatomical data obtained by pharyngometry with respect toobstructive sleep apnea (OSA) and the goal is to extract valued feature to discriminate OSA andnon-OSA subjects.

We have recorded data from 103 subjects that can be analyzed. The data for each subject iscross-sectional area in the upper airway over distance from teeth, hence it can be considered asthe time series datasets. Before attempting to extract features, we first conduct the preprocessingoperations to the datasets, including removing noise and normalization.

After preprocessing operations on the datasets, we first apply several popular time series clas-sification algorithms directly to the raw datasets, such as 1-NN algorithm with Euclidean distance,1-NN algorithm with DTW distance, and some classical Linear classification algorithms. In thesecases, the whole time series (45 steps of points) are considered as the input features for the clas-sifiers. Unfortunately, the performance of these algorithms are not good on the whole time series.

Due to poor performance of using whole time series as features, we need to extract discrimina-tive features, which is the goal of this project. In this project, we propose several features, whichcan be separated into two parts: local features and global features.

In the part w.r.t.local features, we investigate the OPJ Value, OPJ Position, Maximum Valueand OPJ Value/Maximum Value. Unfortunately, the performance of these features are not satis-fying, they are no better than the random guessing.

In the part about global features, we propose more than 10 features, some of them are basedon the existing technique, such as shapelets, SAX-based feature, DFT-based feature and so on;some of them are proposed by the domain knowledge or physiological meaning, such as Volume,Stretched Length, Reference-based feature and so on.

Based on several popular classifiers with 10-fold cross validation, we check the performance foreach feature, and through several performance criteria, such as T-test, MCC, ROC, AUC and soon, we conduct all-sided evaluation for each feature, and try to give a reasonable explanation forthe success or failure from the medical view and domain knowledge.

Based on the evaluation result, we conclude that some of them perform better than others andcan be considered as valued features. They are Volume, Volume Variance, Stretched Length andT-test Selection feature. The accuracy of them can reach above 70%, which are much better thanother features and the classification result directly on the raw data. The highest accuracy we getis feature Volume, whose accuracy is 77%.

We combine the features (only single-value feature) we proposed together and conduct ancomprehensive analysis on them. The feature Volume and Stretched Length can be always selectedby CFS technique in four dimensions while T-test Selection can be selected in some dimensions.The top-4 features ranked by Information Gain technique are always the 4 features which weconsidered valued. However, the performance of combined feature is equivalent to the performanceof Volume, with no improvement.

The good performance of Volume and Volume Variance can be explained from the domainknowledge easily. The most discriminative volume interval are almost near the OPJ point and


CHAPTER 8. CONCLUSIONS

are all belong to the oropharynx region, which is obstructed most possibly. For Volume Variance,since the muscle relax is one key factor that leads to OSA, hence, the flexibility of the musclein the upper airway of OSA subjects is lower than that of non-OSA subjects. Thus, the volumedifference between different postures w.r.t. OSA subjects is smaller than non-OSA. For T-testSelection feature, since we first select the points based on the T-test result, hence the selectedpoints are more discriminative than other points, and it can be considered as a refining processwhich removing noise points in the time series. In addition, we can find that the selected pointsare all belong to the oropharynx region, which is consistent with the domain knowledge. Based onthe evaluation result for Stretched Length feature, we conjecture that non-OSA subjects’ upperairway has more fluctuation than OSA subjects, is need to further verified from the medical view.

The highest accuracy we have got is 77%, which is actually not a very satisfying result. There isa key bottleneck: the size of the datasets are too small to have a sufficient training on the classifier.In addition, big size of datasets enable us to discover more discriminative features contained inthe datasets. If we can solve this bottleneck, I believe we could find more discriminative features.Anyway, though most of the features we propose in the project perform poor on our datasets, theycan be applied to other similar time series feature extraction problem.

In conclusion, there are two key contributions in this thesis work:

• A rich set of 16 features are extracted from raw time series data and evaluated by 10-foldcross validation based different classifiers with several different performance criteria. Someof proposed features perform well and have good discriminative power, such as Volume,Volume Variance, Stretched Length and T-test Selection feature.

• We propose a method to search for local part of the time series from which the extractedfeatures are most predictive.

We found local parts from which the volume and stretch length were extracted and improvedthe classification accuracy from 65% to 75%. An interesting finding is that the most predic-tive local parts found by our algorithm are very close to the oropharyngeal junction point(OPJ) and belong to the oropharynx region, which is consistent with the domain knowledgeof OSA.


Bibliography

[1] Jerome A. Dempsey, Sigrid C. Veasey, Barbara J. Morgan, , and Christopher P. O’Donnell.Pathophysiology of Sleep Apnea. ix, 4, 5

[2] Jung DG, Cho HY, Grunstein RR, and Yee B. Predictive value of Kushida index and acousticpharyngometry for the evaluation of upper airway in subjects with or without obstructivesleep apnea. Journal of Korean Medical Science, 19(5):662–7, 2004. iii, 2, 25

[3] Mark A. Hall and Lloyd A. Smith. Feature selection for machine learning: Comparing acorrelation-based filter approach to the wrapper, 1999. 57

[4] Richard J and MD. Schwab. Upper Airway Imaging. 1998. ix, 4

[5] Ibrahim Kamal. Normal standard curve for acoustic pharyngometry. Otolaryngology Headand Neck Surgery, 124(3):323–30, 2001. 2

[6] Ibrahim Kamal. Acoustic pharyngometry patterns of snoring and obstructive sleep apneapatients. Otolaryngol Head Neck Surg, 130(1):58–66, 2004. 2

[7] Ibrahim Kamal. Test-retest validity of acoustic pharyngometry measurements. OtolaryngolHead Neck Surg, 130(2):223–8, 2004. 2

[8] Cheryl Lathrop. Acoustic Pharyngometry Measurement of Minimal Cross-sectional AirwayArea Is Significant Predictor of Obstructive Sleep Apnoea. SLEEP. 2

[9] Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. Experiencing SAX: a Novel Sym-bolic Representation of Time Series. Data Mining and Knowledge Discovery, 15(2):107–144,2007. 35

[10] Fabian Morchen. Time series feature extraction for data mining using dwt and dft. Technicalreport, 2003. 52

[11] American Academy of Sleep Medicine. Obstrutive Sleep Apnea. iii, 3

[12] Patel SR, Frame JM, Larkin EK, and Redline S. Heritability of upper airway dimensionsderived using acoustic pharyngometry. Eur Respir J., 32(5):1304–8, 2008. 2

[13] Morgenthaler TI, Kagramanov V, and Decker PA. Hanak V. Complex sleep apnea syndrome:is it a unique clinical syndrome. Sleep, 29(9):1203–9. 3

[14] wikipedia. wiki. 3

[15] Yi-Leh Wu, Divyakant Agrawal, and Amr El Abbadi. A comparison of DFT and DWT basedsimilarity search in time-series databases. In CIKM, pages 488–495, 2000. 52

[16] Lexiang Ye and Eamonn Keogh. Time Series Shapelets: A New Primitive for Data Mining.SIGKDD, 2009. 34


Documents

Eindhoven University of Technology MASTER Extracting ...Extracting Features to Discriminate OSA and non-OSA Master Thesis Wenjie Pei Department of Mathematics and Computer Science