Some statistical aspects of singular spectrum analysis
Thesis posted on 01.03.2017 by Khan, Md. Atikur Rahman
Singular spectrum analysis (SSA) is a nonparametric technique that has gained popularity for decomposing an observed series into a sum of orthogonal and interpretable components. SSA is akin to the classical decomposition of a time series into the sum of trend, cyclical, seasonal and noise components. Reconstruction of the signal is a critical initial step in SSA that underlies any application, such as forecasting, the analysis of missing data, or change point detection. Two basic parameters that must be assigned by the practitioner, the window length of the embedding and the dimension of the signal, are very important for optimal reconstruction of the signal. This thesis proposes a set of statistical tests and an information theoretic criterion for optimal signal reconstruction.
The standard approach is to select a very large window length, chosen to ensure the orthogonality of the components by comparing image plots of the weighted correlation matrix across different window lengths. As an alternative to such pattern evaluation, and to the hurdle of finding a window length that gives a clear view of the image plot, we propose a new methodology for selecting the window length in SSA in which the window length is determined from the data before modeling begins. This selection procedure is based on statistical tests designed to test the convergence of the autocovariance function for both short- and long-memory processes. Asymptotic properties of these test statistics are found to be consistent with simulation results. Furthermore, application to Southern Oscillation Index data shows how this approach can enhance the reconstruction and predictive performance of SSA.
An information theoretic analysis of the signal-noise separation problem in SSA is also provided in this thesis. A minimum description length criterion is proposed based on the signal-plus-noise model obtained through the Karhunen-Loève expansion of the trajectory matrix.
Under very general regularity conditions the criterion is found to identify the true signal dimension with probability one as the sample size increases. Furthermore, empirical results from simulation experiments and real data analysis indicate that the asymptotic theory is reflected in observed behavior even in relatively small samples.
The quality of signal separation and reconstruction is assessed by introducing two measures: mean squared separation error (MSSE) and mean squared reconstruction error (MSRE). Algebraic and asymptotic bounds for both MSSE and MSRE are then used to assess the quality of the signal extracted by SSA. While the former is implementable only when the true signal is known, the latter is implementable for any observed process, and this behavior is reflected in both simulation results and real data analysis.
Mean squared forecast error (MSFE) is a measure of the forecast accuracy of a time series model, and theoretical results for MSFE based on the linear recurrence relation are established through the eigen-decomposition of the trajectory matrix. Two extreme classes of processes, first-order autoregressive, AR(1), and random walk (RW) processes, are considered in this thesis to assess the effect of window length on MSFE. While a window length selected objectively by evaluating MSRE is deemed favorable for an AR(1) process, the smallest possible window length supports RW forecasting of a series. These theoretical results are also reflected in simulation experiments and real data analysis.
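For readers unfamiliar with the two parameters discussed above, the following is a minimal sketch of basic SSA signal reconstruction: embedding with a given window length, singular value decomposition of the trajectory matrix, truncation to a given signal dimension, and diagonal averaging. It is a generic textbook formulation, not the selection procedures or criteria proposed in the thesis. The names `ssa_reconstruct`, `window` and `rank` are illustrative assumptions, and both parameters are simply supplied by the user here.

```python
import numpy as np


def ssa_reconstruct(x, window, rank):
    """Basic SSA reconstruction (illustrative sketch, not the thesis method).

    x      : one-dimensional series of length n
    window : window length of the embedding, with 1 < window < n
    rank   : assumed signal dimension (number of leading components kept)
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 < window < n:
        raise ValueError("window must satisfy 1 < window < len(x)")
    k = n - window + 1
    if not 1 <= rank <= min(window, k):
        raise ValueError("rank must lie between 1 and min(window, n - window + 1)")

    # Embedding: column j of the trajectory (Hankel) matrix is x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])

    # Decomposition: singular value decomposition of the trajectory matrix.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)

    # Grouping: keep the leading `rank` components as the signal part.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]

    # Diagonal averaging: average all entries that correspond to the same time index.
    total = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        total[j:j + window] += approx[:, j]
        counts[j:j + window] += 1
    return total / counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(200)
    signal = np.sin(2 * np.pi * t / 12)               # known signal (assumed for illustration)
    series = signal + 0.5 * rng.standard_normal(t.size)
    estimate = ssa_reconstruct(series, window=48, rank=2)
    # Average squared error against the known signal; only computable in simulations.
    print(np.mean((estimate - signal) ** 2))
```

In this sketch the error against the known signal can be computed only because the signal is simulated. Choosing `window` and `rank` from the data alone is the problem the abstract addresses.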