Discretization for naive-Bayes Learning: Managing Discretization Bias and Variance

Yang, Y; Webb, G I

doi:10.26180/20707279.v1

report

posted on 2022-08-29, 04:57 authored by Y Yang, G I Webb

Quantitative attributes are usually discretized in naive-Bayes learning. We prove a theorem that explains why discretization can be effective for naive-Bayes learning. The use of different discretization techniques can be expected to affect the classification bias and variance of generated naive-Bayes classifiers, effects we name discretization bias and variance. We argue that by properly managing discretization bias and variance, we can effectively reduce naive-Bayes classification error. In particular, we propose proportional k-interval discretization and equal size discretization, two efficient heuristic discretization methods that are able to effectively manage discretization bias and variance by tuning discretized interval size and interval number. We empirically evaluate our new techniques against five key discretization methods for naive-Bayes classifiers. The experimental results support our theoretical arguments by showing that naive-Bayes classifiers trained on data discretized by our new methods are able to achieve lower classification error than those trained on data discretized by alternative discretization methods.
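To make the idea of tuning interval size and interval number concrete, here is a minimal Python sketch of equal-frequency discretization. The abstract does not define either proposed method, so the rules below are assumptions made for illustration only: `proportional_interval_size` sets the interval size to roughly the square root of the number of training values (so interval size and interval number grow together), and `fixed_size` stands in for a constant interval size. All function names and the value 30 are hypothetical and are not taken from the report.

```python
import math
from bisect import bisect_right


def proportional_interval_size(n):
    """Assumed rule: interval size ~ sqrt(n), so interval number is also ~ sqrt(n)."""
    return max(1, math.isqrt(n))


def equal_frequency_cut_points(values, interval_size):
    """Return cut points that place about `interval_size` sorted values in each interval.

    A cut is placed midway between neighbouring values; it is skipped when the
    neighbours are identical, so tied values never straddle a boundary. The
    last interval may hold fewer values than the others.
    """
    if interval_size < 1:
        raise ValueError("interval_size must be at least 1")
    ordered = sorted(values)
    cuts = []
    for i in range(interval_size, len(ordered), interval_size):
        left, right = ordered[i - 1], ordered[i]
        if left != right:
            cuts.append((left + right) / 2)
    return cuts


def discretize(value, cuts):
    """Map a quantitative value to the index of the interval that contains it."""
    return bisect_right(cuts, value)


# Usage: 16 training values give an assumed interval size of 4 and 4 intervals.
training_values = list(range(1, 17))
size = proportional_interval_size(len(training_values))
cuts = equal_frequency_cut_points(training_values, size)
interval_of_five = discretize(5, cuts)

# A fixed interval size (hypothetical value) keeps the size constant instead,
# so only the number of intervals grows with the amount of training data.
fixed_size = 30
fixed_cuts = equal_frequency_cut_points(list(range(90)), fixed_size)
```

In this sketch, larger intervals give each interval more training values to estimate probabilities from, while more intervals give a finer description of the attribute; the two helper rules differ only in how they trade one against the other as the training set grows.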

History

Technical report number

2003/131

Year of publication

2003

Keywords

Discretization Naive-Bayes Learning Bias Variance

Licence

In Copyright
