Statistical issues in modelling and forecasting sequential count data
Thesis posted on 08.02.2017 by Kostenko, Andrey
The thesis is concerned with statistical issues involved in modelling and forecasting time series of infrequent demands that occur randomly in time and for a random integer multiple of some unit, such as for a spare part. Its core consists of six main chapters, which form three essentially independent but closely related parts: Croston's paper and recent extensions; model-based forecasting and exponential smoothing; and binary time series and Markov chains. Part I (Chapters 2 and 3) deepens the understanding and implications of the early research on sporadic demand forecasting, explores the weaknesses of proposals claimed to be the state of the art of parametric forecasting in this area, and develops new insights into modelling and forecasting counts of demands that occur randomly in time. Chapter 2 establishes new insights, surprising facts and original viewpoints in connection with the frequently cited paper by John Croston. A finite-sample version of Croston's two-part smoothing procedure is proposed. It also examines methods for forecasting sporadic demands that predate Croston's paper, particularly those of Robert Goodell Brown. Chapter 3 revisits the present author's published commentary on a recent proposal of a categorisation scheme for forecasting. It is shown that this proposal depends critically on the assumption imposed on the use of alternative estimation procedures, and that relaxing this assumption may lead to considerable changes in the threshold values constituting the proposed categorisation scheme. It is suggested that more traditional approaches to selecting forecasting procedures in the light of data available should be favoured. Part II (Chapters 4 and 5) contains new and perhaps surprising insights into the published theory of model-based forecasting with exponential smoothing. 
In Chapter 4, it is argued that the contemporary literature on model-based forecasting with exponential smoothing fails to give adequate credit to early work in this area by statisticians of renown. It is shown that some of the recently published results are not entirely original. The original results are found in a largely overlooked paper published by Jeff Harrison in 1967. Some new generalisations of Harrison's forecast error variance formulae are derived. The significant contribution in Chapter 5 is the analysis of limiting properties of additive and multiplicative exponential smoothing models for count and other non-Gaussian data. It is found that contributions of some authors to the forecasting and statistical literature on this topic appear to deviate substantially from established results in probability theory. Building on this critique, a new explicit criterion is derived for the random process defined as the cumulative product of independent copies of a gamma random variable to converge to zero almost surely. In addition, an approach to forecasting sequential count data based on applying exponential smoothing to the probabilities of each count outcome, rather than to the outcomes themselves, is introduced. Part III (Chapters 6 and 7) explores applied and theoretical issues in connection with two-state simple Markov chains, or Markov trials for short. The substantial contribution of Chapter 6 lies in conducting a large-scale empirical analysis of statistical properties of 0-1 sequences of demand occurrences and establishing that in the majority of cases the usual methods of time series forecasting are superfluous. A new approach to modelling intermittence of future demands over a lead time, consistent with the observed statistical properties of the majority of data considered, is suggested. 
The novel and significant contribution of Chapter 7 lies in deriving the exact and approximate unconditional maximum likelihood (ML) estimators algebraically, in terms of alternative sets of sufficient statistics. This problem in statistical inference for Markov trials appears to have eluded researchers in this area until now.
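
For readers unfamiliar with the two-part smoothing procedure discussed in Part I, the following is a minimal sketch of the textbook form of Croston's method, not the finite-sample version proposed in the thesis. It smooths non-zero demand sizes and the intervals between demands separately, updating both only in periods where a demand occurs, and forecasts the demand rate as their ratio. The function name `croston`, the default `alpha`, and the initialisation from the first observed demand are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def croston(demand: Sequence[float], alpha: float = 0.1) -> List[Optional[float]]:
    """Textbook Croston forecasts of demand per period (illustrative sketch).

    forecasts[t] is the forecast available after observing demand[t];
    it is None until the first non-zero demand has been seen.
    """
    size: Optional[float] = None      # smoothed non-zero demand size
    interval: Optional[float] = None  # smoothed interval between demands
    periods_since = 1                 # periods elapsed since the last demand
    forecasts: List[Optional[float]] = []

    for y in demand:
        if y > 0:
            if size is None:
                # Assumed initialisation: start from the first observed demand.
                size = float(y)
                interval = float(periods_since)
            else:
                # Update both estimates only when a demand occurs.
                size += alpha * (y - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
        forecasts.append(None if size is None else size / interval)

    return forecasts


if __name__ == "__main__":
    print(croston([0, 0, 3, 0, 0, 3], alpha=0.5))
```

In this sketch the forecast stays constant between demand occurrences, which is the defining feature of the procedure: periods of zero demand carry no update.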