Search | Korea Science

Yoo, Jong-Yeol;Hyun, Sang-Hyun;Yang, Dong-Min
- Proceedings of the Korean Institute of Information and Commucation Sciences Conference
- /
- 2015.10a
- /
- pp.242-245
- /
- 2015
Recently due to large-scale data spread in digital economy, the era of big data is coming. Through big data, unstructured text data consisting of technical text document, confidential document, false information documents are experiencing serious problems in the runoff. To prevent this, the need of art to sort and process the document consisting of unstructured text data has increased. In this paper, we propose a novel text classification scheme which learns some data sets and correctly classifies unstructured text data into two different categories, True and False. For the performance evaluation, we implement our proposed scheme using $Na{\ddot{i}}ve$ Bayes document classifier and TF-IDF modules in Python library, and compare it with the existing document classifier.
PDF

Kim, Han-Joon;Chang, Jae-Young
- The KIPS Transactions:PartB
- /
- v.15B no.5
- /
- pp.457-464
- /
- 2008
In the real-world operational environment, most of text classification systems have the problems of insufficient training documents and no prior knowledge of feature space. In this regard, $Na{\ddot{i}ve$ Bayes is known to be an appropriate algorithm of operational text classification since the classification model can be evolved easily by incrementally updating its pre-learned classification model and feature space. This paper proposes the improving technique of $Na{\ddot{i}ve$ Bayes classifier through feature weighting strategy. The basic idea is that parameter estimation of $Na{\ddot{i}ve$ Bayes considers the degree of feature importance as well as feature distribution. We can develop a more accurate classification model by incorporating feature weights into Naive Bayes learning algorithm, not performing a learning process with a reduced feature set. In addition, we have extended a conventional feature update algorithm for incremental feature weighting in a dynamic operational environment. To evaluate the proposed method, we perform the experiments using the various document collections, and show that the traditional $Na{\ddot{i}ve$ Bayes classifier can be significantly improved by the proposed technique.
https://doi.org/10.3745/KIPSTB.2008.15-B.5.457 인용 PDF KSCI