Distributed Associative Memory Approach for Cloud Computing Environments
thesis
posted on 2017-04-09, 23:25 authored by Amir Hossein Basirat
With growing interest in leveraging the massive amounts of data available in
open sources, such as the Web, to solve long-standing information retrieval
problems, the question of how to process immense datasets effectively is
becoming increasingly relevant. This raises the further question of whether our
capability to recognise and process such data keeps pace with our ability to
generate it. This thesis addresses that question by first
examining the capability of existing large-scale data-processing schemes to
scale with this growth in data. To address some of their highlighted
limitations, particularly regarding computational complexity and scalability,
this research proposes a novel associative-memory-based scheme for big data
processing that is scalable, distributable and lightweight, and that overcomes
some of the issues encountered in traditional data access mechanisms for data
storage and retrieval. To achieve the above goal, a distributed data access
scheme that enables data storage and retrieval by association is first
developed to circumvent the partitioning issue experienced within referential
data access mechanisms. In our model, data records are treated as patterns. As
a result, data storage and retrieval are performed using a distributed pattern
recognition approach that is implemented through the integration of loosely
coupled computational networks, followed by a divide-and-distribute approach
that facilitates the distribution of these networks within the cloud
dynamically.
To date, all implementations of MapReduce, including the Hadoop
version, have interpreted data in a relational model, which limits their
functionality when dealing with complex and unstructured data such as images.
To address this, an associative-memory-based MapReduce is introduced to elevate
the MapReduce key-value scheme to a higher level of functionality by replacing
the purely quantitative key-value pairs with scalable associative-memory-based
data structures that will improve parallel processing of data with complex
relations. By having an associative key-value model, we can deal with data in
any form and in any representation simply by using a pattern-matching model
that treats data records as patterns and provides a distributed data access
scheme that enables data storage and retrieval by association, thereby
circumventing the scaling issue experienced within referential data access
mechanisms. The principle of associative-memory-based learning is implemented
through hierarchically connected layers, with local feature learning
occurring at the lowest layer and features combined into higher-level
representations at the upper layers.
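As an illustration only, the idea of keying records by association rather than by exact match can be sketched as follows. This is a minimal sketch and not the scheme developed in the thesis: it assumes fixed-length binary patterns, uses Hamming distance as the similarity measure, and runs in a single process. The names `AssociativeStore`, `recall` and `associative_map_reduce` are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Count the positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class AssociativeStore:
    """Toy associative memory (assumed design, not the thesis's scheme).

    Patterns are stored with a label; recall returns the label of the
    stored pattern closest to the probe in Hamming distance.
    """

    def __init__(self):
        self.patterns = []  # list of (pattern, label) pairs

    def store(self, pattern, label):
        self.patterns.append((tuple(pattern), label))

    def recall(self, probe):
        probe = tuple(probe)
        # On a tie, min() keeps the pattern that was stored first.
        _, label = min(self.patterns, key=lambda pl: hamming(pl[0], probe))
        return label


def associative_map_reduce(records, store):
    """Group records by the recalled label, then count each group."""
    groups = defaultdict(list)
    # Map step: the key is obtained by association, not by exact equality.
    for record in records:
        groups[store.recall(record)].append(record)
    # Reduce step: a simple count per associative key.
    return {label: len(members) for label, members in groups.items()}


store = AssociativeStore()
store.store((0, 0, 0, 0), "low")
store.store((1, 1, 1, 1), "high")

records = [(0, 0, 0, 1), (1, 1, 0, 1), (1, 1, 1, 1), (0, 0, 0, 0)]
print(associative_map_reduce(records, store))
```

Here the noisy records `(0, 0, 0, 1)` and `(1, 1, 0, 1)` are grouped with their nearest stored patterns even though neither matches a stored key exactly, which is the behaviour an exact-match key-value lookup cannot provide.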
In addition, this thesis
investigates the extension of the proposed distributed data management scheme
to different data-intensive scenarios by improving upon the existing cloud
data management models for fault tolerance and scalability and by reducing
MapReduce communication overheads through data locality. In particular,
three data-intensive scenarios are considered in detail: dealing with large
datasets, handling large training volumes, and working with a neural network
that has an excessive number of processing neurons. Moreover, the application of our
associative-memory-based approach is examined as a case study in a cloud of
wireless sensor networks (Cloud-WSNs) to investigate the capabilities of the
scheme in performing large-scale pattern recognition operations in
resource-constrained WSNs.
History
Campus location
Australia
Principal supervisor
Asad I. Khan
Additional supervisor 1
Balasubramaniam Srinivasan
Year of Award
2017
Department, School or Centre
Information Technology (Monash University Clayton)