Adaptive Organization of Digital Documents using Knowledge Graphs
Thesis posted on 24.05.2018 by Ramakrishna Bairi
This thesis studies the problem of automatically evolving a hierarchy of categories to organize the documents in a collection while taking user preferences into account (e.g., categories biased toward a particular field). It uses a massive knowledge graph to guide machine learning models that evolve the category structure and organize the documents accordingly. The categorization adapts as the document collection grows. The thesis also presents a novel technique for categorizing “short texts” that contain very few words. The work has applications in machine learning tasks such as the automatic creation of “Wikipedia Disambiguation”-like pages, automatic generation of tables of contents, and drill-down search.