Posted on 2017-01-13, 03:22. Authored by Shang, Han Lin.
In the recent statistical literature, considerable attention has been paid to the development of functional data analysis. In particular, there have been many theoretical and practical developments in clustering and modeling of functional data. However, methods for visualizing and forecasting functional data remain very limited. The aim of this thesis is to develop new techniques for visualizing, modeling and forecasting functional data.
The first contribution of this thesis is to propose three graphical tools for visualizing the pattern of functional data in the form of smooth curves or surfaces. The proposed tools include functional versions of the bagplot and the highest density region (HDR) boxplot, which make use of the first two robust principal component scores, Tukey’s halfspace location depth and highest density regions. As a by-product, the functional bagplot and the functional HDR boxplot can also be used to detect functional outliers if they are present in the data. Their outlier detection performance compares favorably with that of competing methods on two real data sets and in a series of simulation studies.
The second contribution is to propose two nonparametric methods, namely weighted functional principal component regression and weighted functional partial least squares regression, for forecasting functional time series. These approaches allow for smooth functions, and assign more weight to recent data than to data from the distant past. They also provide a modeling scheme that can easily be adapted to take constraints and other information into account. Using the data sets of French female mortality rates and Australian fertility rates, I demonstrate that these two weighted methods perform similarly, and that both achieve better point forecast accuracy than their unweighted counterparts. Furthermore, I propose two new bootstrap methods for constructing prediction intervals, and evaluate and compare their empirical coverage probabilities.
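The idea of giving recent curves more weight can be illustrated with a minimal sketch. The abstract does not state the form of the weights, so the geometrically decaying weights used here, the parameter name `kappa`, and the function names are assumptions for illustration only and should not be read as the thesis's implementation. Curves are assumed to be observed on a common grid and stored as rows ordered from oldest to newest.

```python
import numpy as np


def geometric_weights(n, kappa):
    """Assumed weighting scheme: w_t = kappa * (1 - kappa)**(n - t), t = 1..n.

    The most recent curve (t = n) receives the largest weight, kappa.
    """
    t = np.arange(1, n + 1)
    return kappa * (1.0 - kappa) ** (n - t)


def weighted_fpca(curves, weights, n_components):
    """Weighted principal components of discretized curves.

    curves: array of shape (n, p), rows ordered oldest to newest.
    weights: array of shape (n,), nonnegative.
    Returns the weighted mean curve, the components (k, p) and the scores (n, k).
    """
    w = weights / weights.sum()            # normalize so the weights sum to one
    mean = w @ curves                      # weighted mean curve, shape (p,)
    centered = curves - mean
    # Scaling each row by sqrt(w_t) makes the SVD diagonalize the
    # weighted covariance sum_t w_t * centered_t centered_t^T.
    _, _, vt = np.linalg.svd(np.sqrt(w)[:, None] * centered, full_matrices=False)
    components = vt[:n_components]         # orthonormal rows
    scores = centered @ components.T       # projection of each curve
    return mean, components, scores
```

In a forecasting setting, each column of `scores` would then be forecast with a univariate time series method and the curve rebuilt from the mean and components; that step is omitted here.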
The third contribution is to further examine the point forecast accuracy and interval forecast accuracy of the weighted functional principal component regression for forecasting log mortality rates and life expectancy. Using the age- and sex-specific populations of 14 developed countries, I compare the short- to medium-term accuracy of this newly proposed method with those of nine well-established methods in the fields of demography and statistics. The weighted functional principal component regression achieves the best point forecast accuracy and interval forecast accuracy for log mortality rates. However, this does not necessarily translate into the best forecast accuracy for life expectancy. Therefore, I also examine which approach achieves the best point forecast accuracy and interval forecast accuracy for life expectancy.
Finally, I develop a nonparametric method for forecasting seasonal univariate time series. A univariate time series with N = np data points is divided into a functional time series of n curves, each observed at p points on the function support range [x_1, x_p]. The forecasting method reduces the data dimensionality by functional principal component analysis, and then applies univariate time series forecasting and functional principal component regression techniques. When partial data in the most recent curve are observed, four dynamic updating methods are introduced, namely the block moving method, the ordinary least squares method, the ridge regression method, and the penalized least squares method. Using a data set of monthly sea surface temperatures between 1950 and 2008, I compare the dynamic updating methods with several benchmark methods, and show their superior point forecast accuracy and interval forecast accuracy. Furthermore, a nonparametric approach is introduced to construct prediction intervals for an entire forecast curve or part thereof.
Awards: Winner of the Mollie Holman Doctoral Medal for Excellence, Faculty of Business and Economics, 2010.
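The slicing step of the seasonal forecasting method described above can be sketched as follows: a series of N = np points is reshaped into n curves of p points each, the curves are reduced by principal component analysis, and a forecast curve is rebuilt from forecast scores. This is only an illustration under stated assumptions. The function names are hypothetical, the decomposition is a plain SVD on discretized curves, and the "last value" score forecast is a placeholder for whichever univariate time series method is actually used.

```python
import numpy as np


def slice_into_curves(series, p):
    """Reshape a univariate series of length N = n * p into n curves of p points."""
    series = np.asarray(series, dtype=float)
    if series.size % p != 0:
        raise ValueError("series length must be a multiple of p")
    return series.reshape(-1, p)           # row i holds season i


def fpca(curves, n_components):
    """Principal components of discretized curves via SVD of the centered data."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]         # orthonormal rows, shape (k, p)
    scores = centered @ components.T       # shape (n, k)
    return mean, components, scores


def forecast_next_curve(curves, n_components):
    """One-step-ahead curve forecast from forecast principal component scores.

    Placeholder assumption: each score series is forecast by its last
    observed value; a real application would substitute a proper
    univariate time series model here.
    """
    mean, components, scores = fpca(curves, n_components)
    next_scores = scores[-1]
    return mean + next_scores @ components
```

For monthly data, `p = 12` turns each year into one curve, so `slice_into_curves(series, 12)` returns one row per year.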