Anomaly detection based on zero appearances in subspaces

Pang, Guansong

doi:10.4225/03/58b647d9a377b

4705471_monash_156144.pdf (839.11 kB)

thesis

posted on 2017-03-01, 04:02 authored by Pang, Guansong

Anomaly detection is regarded as one of the most important tasks in data mining because of its wide application in domains such as finance, information security, healthcare and earth science. With advances in data collection techniques, the volume and dimensionality of anomaly detection data sets have grown rapidly, and these data sets increasingly contain attributes of diverse types. In addition, in many data sets anomalies can be detected in only some of the attributes, while the remaining attributes are irrelevant to anomaly detection. All of these characteristics pose new challenges to existing anomaly detection techniques. Motivated by these challenges, this research aims to design an anomaly detection method that scales to large, high-dimensional data, identifies anomalies in data sets with different types of attributes, and tolerates irrelevant attributes. This thesis posits that anomalies are instances with low probabilities in subspaces of a data set. Consequently, in a random subset of the data set, anomalies are more likely than normal instances to have zero appearances in the subspaces. Based on this property, this thesis proposes a novel anomaly detection method called ZERO++, which uses the number of zero appearances in subspaces to detect anomalies. To the best of our knowledge, ZERO++ is the only anomaly detector based on zero appearances in subspaces. It is unique in that it works in regions of subspaces that are not occupied by data, whereas other methods work in regions that are occupied by data. Using the anti-monotone property, "if an instance has zero appearances in a subspace, it must also have zero appearances in subspaces containing this subspace", we show that only a small number of low-dimensional subspaces need to be considered to identify anomalies effectively. ZERO++ is an efficient algorithm whose time complexity is linear in both data size and data dimensionality, and it works effectively on data sets with different types of attributes and on data sets with a low percentage of relevant attributes.
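The idea of counting zero appearances can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `zero_appearance_scores`, the restriction to categorical values, the use of all 2-attribute subspaces, and the parameters `n_subsamples` and `subsample_size` are all assumptions chosen for illustration. For each random subsample and each subspace, an instance's score is incremented when its combination of values does not appear in the subsample at all.

```python
import random
from itertools import combinations


def zero_appearance_scores(data, n_subsamples=50, subsample_size=8, seed=0):
    """Count, per instance, how often its value pair is absent from a subsample.

    data: list of equal-length tuples of hashable (categorical) values.
    Returns one integer score per instance; a higher score means the
    instance had zero appearances more often, i.e. looks more anomalous.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_attrs = len(data[0])
    # Assumption: use every 2-attribute subspace (low-dimensional subspaces).
    subspaces = list(combinations(range(n_attrs), 2))
    scores = [0] * len(data)

    for _ in range(n_subsamples):
        # Draw a small random subset of the data set without replacement.
        sample = rng.sample(data, min(subsample_size, len(data)))
        for a, b in subspaces:
            # Value pairs that occupy this subspace within the subsample.
            seen = {(row[a], row[b]) for row in sample}
            for i, row in enumerate(data):
                if (row[a], row[b]) not in seen:
                    scores[i] += 1  # zero appearances in this subspace
    return scores


if __name__ == "__main__":
    # Two frequent patterns plus one instance mixing values from both.
    data = [("a", "x", "p")] * 30 + [("b", "y", "q")] * 30 + [("a", "y", "p")]
    scores = zero_appearance_scores(data)
    print("highest-scoring index:", scores.index(max(scores)))
```

In this toy data the last instance combines values that rarely occur together, so its value pairs are usually missing from a small subsample, while the frequent patterns are almost always present. The cost of the sketch grows linearly with the number of instances for a fixed number of subsamples and subspaces.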

History

Campus location

Australia

Principal supervisor

Kai Ming Ting

Additional supervisor 1

David Albrecht

Year of Award

2015

Department, School or Centre

Information Technology (Monash University Clayton)

Degree Type

MASTERS

Faculty

Faculty of Information Technology

Keywords

Ensemble learning, Zero appearances, monash:156144, Anomaly detection, Categorical data, ethesis-20150520-093556, thesis(masters), Open access, 1959.1/1178145, 2015, Mixed data

Licence

In Copyright
