In data mining, the task-specific performance of conventional distance-based similarity measures varies significantly across data distributions because these measures are data-independent and sensitive to the units and scales of measurement. This thesis investigates a measure in which the similarity of two instances is determined by the distribution of the data. It introduces a new (dis)similarity measure that is data-dependent and robust to units and scales of measurement. An empirical evaluation across a wide range of datasets shows that the new measure produces better, or at least more consistent, task-specific performance than widely used distance-based measures, particularly on high-dimensional datasets.
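The sensitivity to units mentioned above can be illustrated with a minimal sketch. It is not taken from the thesis and does not implement the proposed measure. It only shows how the Euclidean nearest neighbour of a query can change when one attribute is converted from metres to centimetres. The toy data points and the helper names (`euclidean`, `nearest`, `rescale_first`) are assumptions made for illustration.

```python
import math


def euclidean(x, y):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def nearest(query, candidates):
    """Return the name of the candidate closest to the query."""
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))


def rescale_first(point, factor):
    """Multiply the first attribute by a factor (a hypothetical unit change)."""
    return (point[0] * factor,) + tuple(point[1:])


# Toy data (illustrative only): (height in metres, weight in kilograms).
query = (1.70, 60.0)
candidates = {"a": (1.60, 61.0), "b": (1.72, 63.0)}

# With height in metres, the weight attribute dominates the distance.
nearest_m = nearest(query, candidates)

# Same instances with height expressed in centimetres.
query_cm = rescale_first(query, 100)
candidates_cm = {name: rescale_first(p, 100) for name, p in candidates.items()}
nearest_cm = nearest(query_cm, candidates_cm)

print(nearest_m, nearest_cm)
```

In this toy example the nearest neighbour is `a` when height is in metres and `b` when it is in centimetres, although the underlying instances are identical. A measure that is robust to units and scales, as the abstract describes, would be expected to avoid this kind of change.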
History
Campus location
Australia
Principal supervisor
Kai Ming Ting
Additional supervisor 1
Gholamreza Haffari
Additional supervisor 2
Takashi Washio
Year of Award
2017
Department, School or Centre
Information Technology (Monash University Clayton)