Classification methods for time series
thesis posted on 2017-02-06, 05:11 authored by Liu, Shen
This thesis focuses on methods for classifying time series, including clustering and discrimination techniques. It examines a number of existing approaches to time series classification and proposes some new methodologies. An introduction to time series classification, the motivations, and an overview of the thesis are given in Chapter 1.

In Chapter 2, two variable selection procedures are proposed in the context of feature-based time series clustering. First, five commonly used time series features are discussed and evaluated using two non-hierarchical methods (the k-means and k-medoids algorithms) and four hierarchical methods (single linkage, complete linkage, average linkage and Ward’s technique). Then two variable selection procedures are proposed, namely forward selection and backward elimination. Both procedures aim to select those features that maximize the quality of the partitioning structure, as measured by a specified criterion.

In Chapter 3, a new hypothesis test for classifying stationary time series is proposed, based on bias-adjusted estimators of a fitted autoregressive model. Many of the existing classification methods in the literature may perform poorly on relatively short time series, and the aim of this proposal is to provide a reliable approach to classifying time series when the sample size is not sufficiently large. Simulation results show that when the time series are short, the size and power of the proposed test are reasonably good, and thus the test is reliable in discriminating between short time series from different data generating processes.
In addition, the application results demonstrate that the proposed test achieves reasonably good performance in classifying relatively short series.

In Chapter 4, an approach to time series classification is proposed based on the polarization of the full forecast densities of the observed series. First, bootstrap forecast replicates incorporating bias-correction and stationarity-correction are obtained for each time series, and a non-parametric kernel density estimation technique is applied to these replicates to approximate the forecast densities. Then the discrepancy between the forecast densities of each pair of series is estimated by a polarization measure, which captures the overlap between two distributions. Following the asymptotic distribution theory of the polarization measure, a hypothesis test is constructed to determine whether two forecast densities are significantly different.

This approach to classifying univariate time series is extended in Chapter 5, where an innovative methodology for classifying multivariate time series is developed. Some existing algorithms that classify multivariate time series in either a supervised or an unsupervised manner are also discussed. Conclusions of this thesis are drawn in Chapter 6.

The main recommendations of this thesis are as follows:
• In order to detect a more distinct clustering structure, it is worth considering variable selection in cluster analyses to eliminate the non-informative variables, especially in the context of feature-based time series clustering. As a result, the two variable selection procedures proposed in Chapter 2 are recommended.
• Most of the existing approaches to time series classification cannot produce desirable results when dealing with small samples. If the time series are not sufficiently long, it is necessary to consider methodologies that achieve reasonably good small-sample properties. In such cases, the hypothesis test proposed in Chapter 3 is recommended, as it tends to achieve favourable performance even when the sample size is small.
• If the classification analysis concerns a specified future time period, one should seek methodologies that relate directly to the forecasts. In such cases, the polarization-based classification methods proposed in Chapters 4 and 5 are feasible, as the time series are classified according to the discrepancy between their forecast densities. It is worth noting that these methods also appear to produce reasonably good results when dealing with small samples, as bias-correction is taken into account.
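The idea of comparing two forecast densities by their overlap can be illustrated with a minimal sketch. It is not the thesis's polarization measure or its hypothesis test: it assumes the plain overlap coefficient (the integral of the pointwise minimum of two densities) as a stand-in, uses a Gaussian kernel with a normal-reference bandwidth, and replaces the bias-corrected bootstrap forecast replicates with simulated normal draws. All function and variable names are hypothetical.

```python
import math
import random


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth (an assumed default, not taken from the thesis).

    Assumes the sample has at least two distinct values.
    """
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def kde(sample, h):
    """Return a function x -> Gaussian kernel density estimate with bandwidth h."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in sample)

    return density


def overlap(sample_a, sample_b, grid_size=400):
    """Approximate the integral of min(f_a, f_b) with the trapezoidal rule.

    Values near 1 indicate nearly identical densities; values near 0
    indicate densities with almost no common mass.
    """
    h_a = rule_of_thumb_bandwidth(sample_a)
    h_b = rule_of_thumb_bandwidth(sample_b)
    f_a, f_b = kde(sample_a, h_a), kde(sample_b, h_b)

    # Pad the grid so the kernel tails at both ends are covered.
    pad = 4.0 * max(h_a, h_b)
    lo = min(min(sample_a), min(sample_b)) - pad
    hi = max(max(sample_a), max(sample_b)) + pad
    step = (hi - lo) / grid_size

    total = 0.0
    for i in range(grid_size + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, grid_size) else 1.0  # trapezoid end points
        total += weight * min(f_a(x), f_b(x))
    return total * step


# Simulated stand-ins for forecast replicates of three series.
random.seed(0)
reps_a = [random.gauss(0.0, 1.0) for _ in range(200)]
reps_b = [random.gauss(1.0, 1.0) for _ in range(200)]
reps_c = [random.gauss(10.0, 1.0) for _ in range(200)]

print("a vs a:", overlap(reps_a, reps_a))
print("a vs b:", overlap(reps_a, reps_b))
print("a vs c:", overlap(reps_a, reps_c))
```

In this sketch, a small overlap between two sets of replicates suggests that the corresponding series would be placed in different groups. Deciding whether a given overlap is significantly below 1 requires the asymptotic theory described in Chapter 4, which is not reproduced here.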