Models and algorithms for static data segment location in information networks
thesis
posted on 2017-02-22, 02:02authored bySen, Goutam
In a distributed database framework, data are distributed to geographically separated servers or data centers. This helps in distributing the load in the network and in serving the neighborhood locally. However, this solution leads to high operational costs. The location of data in multiple servers can be modeled as an optimization problem. We consider database segments as units of allocation since it is a promising and computationally tractable approach than considering files. It also eliminates the cost associated with mirroring the entire database and reduces the overhead of database updates. We build a novel data partitioning method using concepts from well-studied facility location models. Our model for the static data segment location in information networks (SDSLIN) builds on the uncapacitated single allocation p-hub median model from physical logistics. The servers that host database segments are treated as data hubs. They are connected to each other by high- bandwidth links (backbone). Routing large volumes of queries through these links helps in gaining cost benefits due to the economy of scale. We study two variants of SDSLIN on the basis of the backbone topology. The backbone could be a fully connected mesh (referred to as SDSLIN-mesh) or a tree (referred to as SDSLIN-tree). The problem SDSLIN-mesh jointly considers four subproblems, namely, segment allocation, server location, query routing and user assignment. In addition to these subproblems, the variant SDSLIN-tree considers the construction of the tree, which is itself an NP-hard problem. Mixed-integer linear programming (MILP) formulations are provided for both the variants. However, even a moderate sized problem (for example, 100 nodes and 5 segments) is out of the reach of traditional solvers such as CPLEX. This necessitates the development of other optimization methods that can be employed to solve real-world problems.
We identify some easy subproblems and develop a Benders decomposition (BD) algorithm. We employ several accelerations in the classical version of the algorithm for better performance. The approach is shown to be superior to CPLEX (under default settings) in terms of solving large instances. We also develop two metaheuristic approaches, namely, a simulated annealing (SA) approach and a particle swarm optimization (PSO) approach for our problem to provide approximate solutions for large instances in a reasonably small amount of computational time. We benchmark the performance of these approaches with the optimal solutions from CPLEX and the upper bounds from the BD. Our work assumes a static access pattern; hence, the models and formulations are useful only in the network design phase. Future research may consider online cases with changing access pattern. Thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of the Indian Institute of Technology Bombay, India and Monash University, Australia.
History
Campus location
Australia
Principal supervisor
David Abramson
Year of Award
2015
Department, School or Centre
Information Technology (Monash University Clayton)
Additional Institution or Organisation
Indian Institute of Technology Bombay, India (IITB)