Version 2 2017-03-02, 22:27
Version 1 2017-03-02, 05:01
thesis
posted on 2017-03-02, 22:27, authored by Joarder Mohammad Mustafa Kamal
More than 3.26
billion Internet users use Web and mobile applications every day for
retail services and business processing; information management and
retrieval; media and social interaction; gaming and entertainment; and more. With
the rapid growth in simultaneous users and data volume, such applications often
rely on distributed databases to handle large magnitudes of On-Line Transaction
Processing (OLTP) requests at Internet scale. These databases employ
user-defined partitioning schemes during the initial application design phase
to scale out concurrent writing of user data across a number of shared-nothing
servers. However, processing Distributed Transactions (DTs), which involve data
tuples from multiple servers, can severely degrade the performance of these
databases. In addition, transactional patterns are often influenced by the
frequently accessed data tuples in the database, resulting in significant
workload changes that no static partitioning scheme can handle in real time.
This thesis addresses the abovementioned challenges through workload-aware
incremental data repartitioning. The primary goal is to minimally redistribute
data tuples within the system so as to minimise the adverse impact of
DTs without adversely affecting the data distribution load balance. Furthermore,
to support both range- and consistent-hash-based data partitions used in OLTP
systems, a distributed yet scalable data lookup process, inspired by the
roaming protocol in mobile networks, is introduced. Moreover, a further
challenge lies in reducing the computational complexity of processing a
large volume of transactional workload quickly enough to make real-time
decisions. These complexities are addressed in three distinct ways: hiding the
workload network complexities during analysis by using different levels of
abstraction; finding an appropriate observation window size to decide how far
back in the past to look for workload analysis; and, finally, triggering the
incremental repartitioning process at a frequency governed by a threshold.
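The interplay between the observation window and the threshold-based trigger can be sketched as follows. This is a minimal illustrative sketch, not the thesis's implementation: the class name, window representation, and parameters are assumed for illustration, and a transaction is modelled simply as the set of server ids it touches.

```python
from collections import deque

# Hypothetical sketch: observe a stream of transactions, each recorded as the
# set of server ids it touches. A fixed-size observation window decides how
# far back to look; repartitioning is triggered when the distributed-
# transaction (DT) ratio within the window exceeds a threshold.
class RepartitionTrigger:
    def __init__(self, window_size=1000, dt_threshold=0.3):
        self.window = deque(maxlen=window_size)  # bounded look-back window
        self.dt_threshold = dt_threshold         # when to repartition

    def observe(self, servers_touched):
        """Record one transaction; return True if repartitioning should run."""
        self.window.append(len(set(servers_touched)) > 1)  # DT iff >1 server
        dt_ratio = sum(self.window) / len(self.window)
        return dt_ratio > self.dt_threshold

trigger = RepartitionTrigger(window_size=4, dt_threshold=0.5)
trigger.observe([1])         # local transaction
trigger.observe([1, 2])      # distributed
trigger.observe([2, 3])      # distributed
fire = trigger.observe([3])  # window DT ratio = 2/4 = 0.5, not > 0.5
```

A larger window smooths out transient workload spikes at the cost of slower reaction, while the threshold controls how aggressively repartitioning is triggered.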
Well-defined Key Performance Indicator (KPI) metrics for database
repartitioning are established, and rigorous sensitivity analyses are performed on
different types of transaction generation models and observation window sizes.
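To make the kinds of KPIs involved concrete, the following sketch computes two plausible metrics for a given tuple-to-server placement: the fraction of transactions that are distributed, and a load-balance score based on the coefficient of variation of server loads. These are illustrative definitions assumed here, not necessarily the thesis's exact KPI formulations.

```python
from statistics import mean, pstdev

# Hypothetical KPI sketch: `placement` maps tuple ids to server ids, and
# `workload` is a list of transactions, each a list of tuple ids accessed.
def kpis(placement, workload, n_servers):
    dt_count = 0
    loads = [0] * n_servers
    for txn in workload:
        servers = {placement[t] for t in txn}
        if len(servers) > 1:          # transaction spans servers: a DT
            dt_count += 1
        for t in txn:
            loads[placement[t]] += 1  # count tuple accesses per server
    dt_ratio = dt_count / len(workload)
    # Coefficient of variation of server loads: 0.0 means perfect balance.
    balance = pstdev(loads) / mean(loads) if mean(loads) else 0.0
    return dt_ratio, balance

placement = {"a": 0, "b": 0, "c": 1}
workload = [["a", "b"], ["a", "c"], ["c"]]
dt_ratio, balance = kpis(placement, workload, n_servers=2)  # dt_ratio = 1/3
```

A repartitioning scheme then trades these off: migrating tuples to reduce the DT ratio must not push the load-balance score too high.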
In the first stage, graph-theoretic higher-level abstractions such as graphs,
hypergraphs, and their compressed forms are used to deliver sub-optimal
repartitioning decisions using graph min-cut techniques. To deliver
optimal decisions, a greedy approach is further developed, which ranks the
transactions by a measure of the repartitioning objective and performs
transaction-level optimisation to maximise a particular KPI. Finally,
transactional stream mining is used to construct a representative sub-network
of frequent transactions and perform transaction-level adaptive repartitioning,
which reduces the computational complexity while concomitantly
improving performance by filtering out a significant volume of
infrequent transactions that have little bearing on the KPIs. Simulation-based
experimental evaluations demonstrate that the proposed workload-aware
incremental repartitioning schemes outperform existing static and dynamic
repartitioning approaches, and are effective for production deployment.
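The graph abstraction described above can be sketched in miniature: tuples become vertices, each transaction adds weight to the edges among the tuples it co-accesses, and the edge-cut of a placement (co-access weight crossing servers) serves as a proxy for the DT burden. The sketch below is illustrative only; a real min-cut pipeline would use a dedicated partitioner rather than the single greedy move shown.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch of the workload graph: vertices are tuple ids, edge
# weights count how often two tuples are accessed together.
def build_graph(workload):
    edges = defaultdict(int)
    for txn in workload:
        for u, v in combinations(sorted(set(txn)), 2):
            edges[(u, v)] += 1
    return edges

def edge_cut(edges, placement):
    """Total co-access weight crossing server boundaries."""
    return sum(w for (u, v), w in edges.items() if placement[u] != placement[v])

workload = [["a", "b"], ["a", "b"], ["b", "c"]]
edges = build_graph(workload)
placement = {"a": 0, "b": 1, "c": 1}
cut_before = edge_cut(edges, placement)  # "a"-"b" weight 2 crosses servers
placement["a"] = 1                       # move "a" next to its neighbours
cut_after = edge_cut(edges, placement)   # cut eliminated
```

Minimising this edge-cut while keeping per-server vertex counts balanced is exactly the trade-off that graph min-cut partitioners are designed for.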
History
Campus location
Australia
Principal supervisor
Manzur Murshed
Additional supervisor 1
Joarder Kamruzzaman
Additional supervisor 2
Rajkumar Buyya
Additional supervisor 3
Mohamed Gaber
Year of Award
2017
Department, School or Centre
Information Technology (Monash University Gippsland)