An autonomous incremental learning model for efficient mining of text data

Matharage, Sumith Shantha

doi:10.4225/03/58a27186615ce

Restricted Access

Reason: Access restricted by the author. A copy can be requested for private research and study by contacting your institution's library service. This copy cannot be republished.

thesis

posted on 2017-02-14, 02:55 authored by Matharage, Sumith Shantha

The proliferation of the World Wide Web has greatly increased the availability of textual data in recent years, challenging researchers to make the most of this data with minimal human intervention. Text mining has emerged as a response, focusing on the development of new techniques to discover useful knowledge in these large volumes of text. The main research challenges in text mining are (a) the unstructured nature of text, (b) capturing semantic information, and (c) coping with a large number of words and the structure of natural language. Many techniques have been proposed in the text mining literature to address these challenges, individually or in combination. The Self Organizing Feature Map (SOM) algorithm is among the most successful and widely used of them, and it has been extended to diverse text mining tasks.
The primary aim of this thesis is to provide a more efficient, autonomous, incremental text clustering model; a secondary aim is to improve the semantic aspects of the text clustering process. A Fast Scalable Growing Self Organizing Map (FSGSOM) algorithm is proposed to provide more efficient autonomous clustering of text, building on the dynamic topology preservation capabilities of the Growing Self Organizing Map (GSOM) algorithm. To enrich semantic capabilities, a dynamic, variable-length, sequence-based feature selection model is integrated into the feature selection phase. As a further way of incorporating semantics, Wikipedia is used as a background knowledge source when interpreting results. Because most available text is non-stationary, an incremental learning model based on FSGSOM clustering is proposed to handle non-stationary text.
The proposed model combines a semi-continuous text processing model with an evolving hierarchy of concepts that generalises and preserves learning outcomes for future training. A template-based document selection mechanism forms lateral connections across the different phases of learning. In summary, this thesis contributes a more efficient incremental text clustering and knowledge preservation model to the field of text mining research.
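The abstract names GSOM as the basis of FSGSOM but gives no algorithmic detail. As rough orientation only, the following is a minimal sketch of the growth mechanism of the original GSOM: each node accumulates quantisation error, and a boundary node spawns new neighbours once that error exceeds the growth threshold GT = -D × ln(SF), where D is the input dimensionality and SF is a user-chosen spread factor between 0 and 1. This is not the thesis's FSGSOM. The class `TinyGSOM`, its parameters and its simplifications (winner-only updates, a constant learning rate, new nodes copying their parent's weights, and resetting error rather than distributing it to neighbours) are hypothetical assumptions made for brevity.

```python
import math
import random


def growth_threshold(dim, spread_factor):
    """GSOM growth threshold GT = -D * ln(SF), for a spread factor 0 < SF < 1."""
    if not 0.0 < spread_factor < 1.0:
        raise ValueError("spread_factor must lie strictly between 0 and 1")
    return -dim * math.log(spread_factor)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyGSOM:
    """Hypothetical, simplified GSOM on a 2-D integer grid (not the thesis's FSGSOM).

    Simplifications (assumptions): only the winning node is updated, the
    learning rate is constant, new nodes copy their parent's weights, and a
    node's error is reset after exceeding GT instead of being distributed.
    """

    OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def __init__(self, dim, spread_factor=0.5, learning_rate=0.1, seed=0):
        self.gt = growth_threshold(dim, spread_factor)
        self.lr = learning_rate
        rng = random.Random(seed)
        # GSOM starts from four nodes; here a 2x2 grid with random weights in [0, 1).
        self.weights = {(x, y): [rng.random() for _ in range(dim)]
                        for x in (0, 1) for y in (0, 1)}
        self.errors = {pos: 0.0 for pos in self.weights}

    def winner(self, vec):
        """Grid position of the best-matching node for vec."""
        return min(self.weights, key=lambda p: sq_dist(vec, self.weights[p]))

    def free_neighbours(self, pos):
        """Unoccupied grid cells next to pos (non-empty only for boundary nodes)."""
        x, y = pos
        return [(x + dx, y + dy) for dx, dy in self.OFFSETS
                if (x + dx, y + dy) not in self.weights]

    def train_step(self, vec):
        pos = self.winner(vec)
        w = self.weights[pos]
        # Accumulate the winner's quantisation error, then pull it towards the input.
        self.errors[pos] += sq_dist(vec, w)
        self.weights[pos] = [wi + self.lr * (xi - wi) for wi, xi in zip(w, vec)]
        if self.errors[pos] > self.gt:
            # A boundary node grows new nodes into every free neighbouring cell.
            for new_pos in self.free_neighbours(pos):
                self.weights[new_pos] = list(self.weights[pos])
                self.errors[new_pos] = 0.0
            self.errors[pos] = 0.0
        return pos

    def fit(self, data, epochs=1):
        for _ in range(epochs):
            for vec in data:
                self.train_step(vec)
        return self


if __name__ == "__main__":
    rng = random.Random(1)
    docs = [[rng.random(), rng.random()] for _ in range(20)]  # toy 2-D "document" vectors
    for sf in (1e-9, 0.5, 0.999):
        gsom = TinyGSOM(dim=2, spread_factor=sf).fit(docs)
        print(f"SF={sf}: GT={gsom.gt:.3f}, nodes={len(gsom.weights)}")
```

Because GT grows as SF shrinks, a low spread factor makes the map grow conservatively; in this toy run with SF = 1e-9 the accumulated error can never reach GT, so the map keeps its initial four nodes, while SF close to 1 lets almost every winning boundary node spawn neighbours.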

History

Campus location

Australia

Principal supervisor

Damminda Alahakoon

Year of Award

2012

Department, School or Centre

Information Technology (Monash University Clayton)

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology

Keywords

Growing self organising map, monash:110810, Efficient clustering, ethesis-20130228-082020, Text clustering, 1959.1/795252, thesis(doctorate), Self organising map, Restricted access, 2012

Licence

In Copyright
