File(s) under permanent embargo

Reason: Restricted by author. A copy can be supplied under Section 51 (2) of the Australian Copyright Act 1968 by submitting a document delivery request through your library, or by emailing

Stock Return Prediction with Hidden Order Mapping

posted on 20.12.2016, 00:58 by Varsha Mamidi
The missing data problem is ubiquitous in many real-life situations. Information technology researchers have explored and tried to address it in different settings. In this thesis, we address the missing data problem associated with order book information in stock markets. This is an in-depth, large-scale study that offers a systematic and comprehensive framework for addressing the missing data problem in the finance literature.
   Orders placed by traders, and the corresponding order imbalance (OIB), are informative for predicting future stock returns; however, stock exchange rules do not reveal the complete, price-sensitive order book data to traders. Return prediction using the revealed, incomplete trade book data (which contains only matched buy and sell orders and omits the unmatched orders) therefore does not let traders fully exploit possible short-term trading opportunities. Predicting future returns from the information content of the order book can thus be treated as a classical missing data problem. This thesis addresses the problem by developing an integrated theoretical framework and applying it in a stock market trading environment. We use relational Markov network theory to build an empirically testable Algorithm for Imputed Complete Order Book (AICOB).
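   As a minimal illustration of why the trade book can understate order imbalance, the sketch below compares a volume-based OIB computed from a toy complete order book with the same measure computed from matched orders only. The formula (buy volume minus sell volume, divided by total volume), the `Order` record, and all figures are assumptions made for illustration; they are not taken from the thesis, which may define OIB differently.

```python
from dataclasses import dataclass


@dataclass
class Order:
    side: str      # "buy" or "sell"
    volume: int    # number of shares
    matched: bool  # True if the order was executed and so appears in the trade book


def order_imbalance(orders):
    """Volume-based imbalance in [-1, 1]: (buy - sell) / (buy + sell).

    Assumed definition for illustration only.
    """
    buy = sum(o.volume for o in orders if o.side == "buy")
    sell = sum(o.volume for o in orders if o.side == "sell")
    total = buy + sell
    return (buy - sell) / total if total else 0.0


# Hypothetical complete order book: one matched pair plus unmatched orders.
order_book = [
    Order("buy", 100, True),
    Order("sell", 100, True),
    Order("buy", 200, False),
    Order("buy", 150, False),
    Order("sell", 50, False),
]

# The trade book keeps only the matched orders.
trade_book = [o for o in order_book if o.matched]

print("OIB from complete order book:", order_imbalance(order_book))
print("OIB from trade book only:    ", order_imbalance(trade_book))
```

   In this toy case the unmatched buy orders carry all of the imbalance, so the trade book alone shows none of the buying pressure that the complete order book reveals.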
   The thesis contributes a new theoretical advancement in information technology research on the missing data problem and applies it to financial markets. First, the thesis formulates the problem as one of Missing at Random (MAR) data and builds a systematic framework to estimate single as well as joint log likelihood functions. The thesis demonstrates that including incomplete records in the estimation improves the accuracy of the parameter estimates.
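   For readers unfamiliar with the MAR assumption, its standard textbook statement is sketched below. The notation is an assumption made here and is not taken from the thesis: $R$ is the missingness indicator, $Y_{\mathrm{obs}}$ and $Y_{\mathrm{mis}}$ are the observed and missing parts of the data, $\theta$ holds the parameters of the data model, and $\phi$ those of the missingness mechanism.

```latex
% MAR: given the observed data, missingness does not depend on the missing values
P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) = P(R \mid Y_{\mathrm{obs}}, \phi)

% Observed-data log likelihood: the missing values are integrated out
\ell(\theta; Y_{\mathrm{obs}})
  = \log \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\, dY_{\mathrm{mis}}
```

   Under MAR, and when $\theta$ and $\phi$ are distinct, likelihood-based inference about $\theta$ can rely on this observed-data log likelihood without modelling the missingness mechanism explicitly.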
   Second, the thesis proposes a Relational Markov Network model for estimating the joint distribution function of orders, order characteristics and their interactions. The Expectation Maximization (EM) algorithm is then used to handle the missing data during the joint estimation procedure. All pooled regressions follow the Fama and MacBeth (1973) and Generalized Method of Moments (GMM) approaches, which control for cross-sectional and time-series correlations between observations and across the pooled stocks. The proposed methodology overcomes the estimation problem that arises when order book data are missing.
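   The general idea of EM under missing data can be sketched on a far simpler model than the relational Markov network used in the thesis. The example below is an assumption-laden toy, not the AICOB procedure: it fits a straight line y = a + b*x with Gaussian noise when some y values are missing, and the function names `ols` and `em_missing_response` and the simulated data are hypothetical.

```python
import random


def ols(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def em_missing_response(xs, ys, n_iter=200):
    """EM for y = a + b*x + noise where missing ys are given as None.

    Assumes the ys are missing at random given x. Returns (a, b, s2),
    with s2 the estimated noise variance.
    """
    observed = [y for y in ys if y is not None]
    n = len(xs)
    n_mis = n - len(observed)
    # Crude starting values: flat line at the observed mean, unit variance.
    a, b, s2 = sum(observed) / len(observed), 0.0, 1.0
    for _ in range(n_iter):
        # E-step: replace each missing y by its conditional mean a + b*x.
        filled = [y if y is not None else a + b * x for x, y in zip(xs, ys)]
        # M-step: refit the line on the completed data.
        a, b = ols(xs, filled)
        # Variance update: squared residuals of the completed data plus the
        # conditional variance (old s2) contributed by each missing value.
        rss = sum((f - a - b * x) ** 2 for x, f in zip(xs, filled))
        s2 = (rss + n_mis * s2) / n
    return a, b, s2


# Simulated data: true line y = 1 + 2x, with y more often missing when x > 0.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(300)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]
ys_missing = [None if (x > 0 and rng.random() < 0.5) else y
              for x, y in zip(xs, ys)]

a_hat, b_hat, s2_hat = em_missing_response(xs, ys_missing)
print("intercept:", a_hat, "slope:", b_hat, "noise variance:", s2_hat)
```

   In this simple setting EM converges to the same line as a regression on the complete cases; the benefit of EM is larger in joint models, such as the one described above, where incomplete records still carry information about other parameters.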
   Third, the thesis develops an objective evaluation strategy for AICOB based on three dimensions: efficiency, accuracy and adaptability. The thesis uses Australian stock market data, which provides not only trade book data but also historical order book data for cross-validating the results. This unique setting allows the accuracy of the AICOB methodology to be validated against complete order book data. The main contribution of the thesis is to show that AICOB-based predictions match those from the complete order book data, whereas trade book based predictions are largely inconsistent with them. The AICOB-based results are also consistent with the theoretical prediction in the finance literature that OIB predicts returns better as firm size increases. The results show that large firms, with higher trading activity and more competition for order flow, exhibit more significant OIB prediction of future returns. Trade book based OIB estimates, which suffer from the missing data problem, fail to predict future returns for stock portfolios. Hence, addressing the missing data problem is important before implementing OIB-based trading strategies.
   Overall, the thesis finds that machine learning applications such as AICOB can help in implementing trading strategies in financial markets. Imputing missing data through systematic procedures, based on the theoretical distributional properties of financial variables, can lead to more accurate prediction of future returns. The thesis contributes towards establishing a common ground for cross-disciplinary research in Finance and Information Technology (IT) by applying advances in IT research to solve research and implementation problems in finance. Further, the thesis advances the order imbalance literature by showing that missing data can play a critical role in the predictive ability of order imbalance.


Campus location


Principal supervisor

Bala Srinivasan

Additional supervisor 1

Huu Duong

Additional supervisor 2

Madhu Veeraraghavan

Year of Award


Department, School or Centre

Caulfield School of Information Technology


Faculty of Information Technology
