Statistical clustering of U.S. stock data via the generalised style classification algorithm

posted on 31.01.2017, 04:13 by Wong, Woon Weng
This study explores the creation of homogeneous groups of stocks based on returns. No such classification scheme currently exists, so industry classification schemes are used instead. Because these schemes do not form groups on the basis of returns, there is a fundamental mismatch between the way the groupings are made and their ultimate use in the literature. Homogeneous returns groupings can be used to create a returns-based classification scheme, which has the potential to improve applications such as the identification of control firms for benchmarking purposes and the estimation of industry cost of capital. To create these homogeneous return groups, the study uses an innovative statistical clustering method known as the Generalised Style Classification (GSC) algorithm, together with the Gap statistic test, an objective method for determining the optimal number of clusters. The results indicate that the GSC can successfully create a returns-based industry classification scheme, and that the resulting GSC industry clusters are superior to current industry classification schemes at explaining the cross section of stock returns both in and out of sample. Further tests indicate that the GSC is superior at partitioning risky assets into separate risk classes while minimising returns variation within each risk class, which are the conditions necessary for improving industry cost of capital estimates.

Ideologically, this research has wider implications for the theory of asset pricing. The current dominant paradigm suggests that returns can be explained by exposure to generic risk factors. However, such studies rely on arbitrary partitioning of the data, a practice that may lead to a number of econometric issues, including truncation and selection bias, loss of power in statistical tests, and data snooping bias. Contrary and less widely accepted studies have suggested that returns can be explained by industry factors. This study finds evidence of the latter, which indicates that the impact of industry effects on the returns generating process must be reconsidered. Methodologically, the approach used to arrive at the results in this thesis does not rely on data partitioning, which makes it immune to the aforementioned econometric issues.

The GSC represents an exciting addition to the finance literature, and this research demonstrates how it may be applied to stock returns with great success. A vast number of applications in the finance literature will benefit from the homogeneous returns groupings created via the GSC, and researchers wishing to adopt a data-driven, objective means of creating such groupings must consider the use of the GSC.
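
For readers unfamiliar with the Gap statistic mentioned in the abstract, the following is a minimal illustrative sketch, not the thesis's implementation. It does not reproduce the GSC algorithm: plain k-means with a farthest-first initialisation stands in for the clustering step. The reference data are assumed to be drawn uniformly over the bounding box of the observations, and the selection rule assumed is the commonly cited one (the smallest k with Gap(k) >= Gap(k+1) - s(k+1)). All function names and the synthetic return data are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two return vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(group):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(group)
    return tuple(sum(col) / n for col in zip(*group))


def within_dispersion(points, k, rng, iters=50):
    """Run Lloyd's k-means and return W_k, the within-cluster sum of squares."""
    # Farthest-first initialisation (an assumption of this sketch): start from
    # a random point, then repeatedly add the point farthest from all centres.
    centres = [rng.choice(points)]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centres[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centre.
        updated = [centroid(g) if g else centres[j] for j, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return sum(min(sq_dist(p, c) for c in centres) for p in points)


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (best_k, gaps, errs); assumes k_max < number of distinct points."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(within_dispersion(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            # Reference data set: uniform draws over the bounding box.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(within_dispersion(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)              # Gap(k)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))  # s(k)
    # Smallest k with Gap(k) >= Gap(k+1) - s(k+1); fall back to k_max.
    best_k = k_max
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            best_k = k
            break
    return best_k, gaps, errs


def synthetic_returns(seed=1, per_group=20, noise=0.005):
    """Hypothetical data: three groups of stocks with distinct 4-period return profiles."""
    rng = random.Random(seed)
    profiles = [(0.04, 0.04, 0.04, 0.04),
                (-0.04, -0.04, -0.04, -0.04),
                (0.04, -0.04, 0.04, -0.04)]
    return [tuple(m + rng.gauss(0, noise) for m in profile)
            for profile in profiles for _ in range(per_group)]
```

Calling `gap_statistic(synthetic_returns(), k_max=5)` returns the selected number of clusters together with the Gap values and their standard errors for each k. On real return data, the clustering step would be replaced by the method actually under study.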


Campus location


Principal supervisor

Paul Lajbcygier

Additional supervisor 1

Lee Gordon-Brown

Year of Award


Department, School or Centre



Doctor of Philosophy

Degree Type



Faculty of Business and Economics