Search | Korea Science

Improving of kNN-based Korean text classifier by using heuristic information (경험적 정보를 이용한 kNN 기반 한국어 문서 분류기의 개선)

Lim, Heui-Seok;Nam, Kichun
- The Journal of Korean Association of Computer Education, v.5 no.3, pp.37-44, 2002
Automatic text classification is the task of assigning predefined categories to free-text documents. Its importance has grown with the need to organize and manage huge amounts of text data. There has been some research on automatic text classification based on machine learning techniques. While most of it has focused on proposing new machine learning methods and on cross-evaluation between systems, thorough evaluation or optimization of a single method has rarely been done. In this paper, we propose a method for improving a kNN-based Korean text classification system using heuristic information about the decision function, the number of nearest neighbors, and the feature selection method. Experimental results showed that the system with a similarity-weighted decision function, the global method for considering neighbors, and DF/ICF feature selection was more accurate than a simple kNN-based classifier. We also found that the local method with a well-chosen k value performed as well as the global method, which incurs much higher computational cost.
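The similarity-weighted decision function described in the abstract above can be illustrated with a minimal sketch. It assumes documents are sparse term-weight dictionaries compared by cosine similarity; the function names are hypothetical, and the paper's DF/ICF feature selection and its global/local neighbor strategies are not reproduced here.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def knn_classify(query, training, k, weighted=True):
    """Return the best category for `query`.

    training: list of (vector, category) pairs.
    weighted=True sums neighbor similarities per category (similarity-weighted vote);
    weighted=False counts neighbors per category (simple majority vote).
    """
    neighbors = sorted(((cosine(query, vec), cat) for vec, cat in training),
                       reverse=True)[:k]
    scores = defaultdict(float)
    for sim, cat in neighbors:
        scores[cat] += sim if weighted else 1.0
    return max(scores, key=scores.get)
```

For example, with k = 3, a query that has one very similar "sports" neighbor and two only weakly similar "politics" neighbors is labeled "sports" by the weighted vote but "politics" by the plain majority vote, which is the kind of case where weighting by similarity helps.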

Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus (k-NN 분류 알고리즘과 객체 기반 시소러스를 이용한 자동 문서 분류)

Bang Sun-Iee;Yang Jae-Dong;Yang Hyung-Jeong
- Journal of KIISE:Software and Applications, v.31 no.9, pp.1204-1217, 2004
Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because they train classifiers using only the feature vectors of documents, ambiguity between two candidate categories significantly degrades classification precision. To remedy this drawback, we propose a new method that incorporates relationship information between categories into existing classifiers. In this paper, we first classify documents using the k-NN classifier, which is generally known for relatively good performance despite its simplicity. We then use relationship information from an object-based thesaurus to reduce the ambiguity. By referencing the various relationships in the thesaurus that correspond to the structured categories, the ambiguity is removed and the precision of k-NN classification is drastically improved. Experimental results show that this method improves precision by up to 13.86% over k-NN classification while preserving its recall.

A New Memory-Based Reasoning Algorithm using the Recursive Partition Averaging (재귀 분할 평균 법을 이용한 새로운 메모리기반 추론 알고리즘)

Lee, Hyeong-Il;Jeong, Tae-Seon;Yun, Chung-Hwa;Gang, Gyeong-Sik
- The Transactions of the Korea Information Processing Society, v.6 no.7, pp.1849-1857, 1999
We propose the RPA (Recursive Partition Averaging) method to reduce the storage requirement and improve the classification rate of Memory-Based Reasoning. The algorithm recursively partitions the pattern space until each hyperrectangle contains only patterns of the same class, and then computes the average of the patterns in each hyperrectangle to extract a representative. We also use the mutual information between features and classes as feature weights to improve classification performance. The proposed algorithm used 30 to 90% of the memory space needed by the k-NN (k-Nearest Neighbors) classifier and showed classification performance comparable to k-NN. In addition, because it stores fewer patterns, it showed excellent results in terms of classification time compared with k-NN.
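A minimal sketch of recursive partition averaging, assuming numeric feature vectors. The abstract does not say how each hyperrectangle is split, so this version (hypothetical `rpa_train` and `rpa_classify`) halves the axis with the largest spread at its midpoint; it also omits the mutual-information feature weights.

```python
from collections import Counter


def mean_vector(patterns):
    """Component-wise average of a list of equal-length vectors."""
    return [sum(col) / len(patterns) for col in zip(*patterns)]


def rpa_train(patterns, labels):
    """Split recursively until every cell is class-pure, then keep one
    averaged representative per cell. Returns [(vector, label), ...]."""
    if len(set(labels)) == 1:
        return [(mean_vector(patterns), labels[0])]
    dim = len(patterns[0])
    lows = [min(p[i] for p in patterns) for i in range(dim)]
    highs = [max(p[i] for p in patterns) for i in range(dim)]
    # Assumption: cut the axis with the largest spread at its midpoint.
    axis = max(range(dim), key=lambda i: highs[i] - lows[i])
    mid = (lows[axis] + highs[axis]) / 2
    left = [(p, y) for p, y in zip(patterns, labels) if p[axis] <= mid]
    right = [(p, y) for p, y in zip(patterns, labels) if p[axis] > mid]
    if not left or not right:
        # Unsplittable cell (e.g. identical points with conflicting labels):
        # keep one representative carrying the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return [(mean_vector(patterns), majority)]
    reps = []
    for part in (left, right):
        reps.extend(rpa_train([p for p, _ in part], [y for _, y in part]))
    return reps


def rpa_classify(x, reps):
    """Label x with the class of the nearest representative (squared Euclidean)."""
    return min(reps, key=lambda r: sum((a - b) ** 2 for a, b in zip(x, r[0])))[1]
```

Only the representatives are stored, so memory and classification time scale with the number of pure cells rather than with the number of training patterns, which is the trade-off the abstract compares against k-NN.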

OHC Algorithm for RPA Memory Based Reasoning (RPA분류기의 성능 향상을 위한 OHC알고리즘)

Lee, Hyeong-Il (이형일)
- Journal of Korea Multimedia Society, v.6 no.5, pp.824-830, 2003
The RPA (Recursive Partition Averaging) method was proposed to reduce the storage requirement and improve the classification rate of Memory-Based Reasoning. Although that algorithm worked well in many areas, its major drawback is its pattern-averaging mechanism. We propose an adaptive OHC algorithm that uses an FPD (Feature-based Population Densimeter) to increase the classification rate of RPA. The proposed algorithm required only about 40% of the memory space needed by the k-NN classifier and showed classification performance superior to RPA. In addition, by reducing the number of stored patterns, it showed excellent results in terms of classification compared with k-NN.

Design of Gas Classifier Based On Artificial Neural Network (인공신경망 기반 가스 분류기의 설계)

Jeong, Woojae;Kim, Minwoo;Cho, Jaechan;Jung, Yunho
- Journal of IKEEE, v.22 no.3, pp.700-705, 2018
In this paper, we propose a gas classifier based on the restricted Coulomb energy neural network (RCE-NN) and present its hardware implementation results for real-time learning and classification. Because the RCE-NN has a flexible network architecture with a real-time learning process, it is suitable for gas classification applications. The proposed gas classifier achieved 99.2% classification accuracy on the UCI gas dataset and was implemented with 26,702 logic elements on an Intel (Altera) Cyclone IV FPGA. In addition, it was verified on an FPGA test system at an operating frequency of 63 MHz.
https://doi.org/10.7471/ikeee.2018.22.3.700
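For readers unfamiliar with RCE networks, the following is a textbook-style software sketch of RCE learning, not the authors' FPGA design. It assumes Euclidean distance, a hypothetical `max_radius` cap for newly committed prototypes, and returns `None` when either no prototype fires or prototypes of several classes fire.

```python
import math


class RCENetwork:
    """Restricted Coulomb Energy network sketch.

    Each prototype is [center, radius, label]; it fires for x when
    dist(x, center) < radius.
    """

    def __init__(self, max_radius=1.0):
        self.max_radius = max_radius  # hypothetical radius cap for new prototypes
        self.prototypes = []

    def train_one(self, x, label):
        covered = False
        for proto in self.prototypes:
            d = math.dist(x, proto[0])
            if d < proto[1]:
                if proto[2] == label:
                    covered = True
                else:
                    proto[1] = d  # shrink a wrong-class prototype so it stops firing on x
        if not covered:
            # Commit a new prototype whose field does not reach any other-class center.
            other = [math.dist(x, p[0]) for p in self.prototypes if p[2] != label]
            self.prototypes.append([list(x), min([self.max_radius] + other), label])

    def classify(self, x):
        labels = {p[2] for p in self.prototypes if math.dist(x, p[0]) < p[1]}
        return labels.pop() if len(labels) == 1 else None  # None: unknown or ambiguous
```

A prototype is added only when a training sample is not already covered by one of its own class, so the network grows during learning instead of having a fixed size, which is the flexibility the abstract refers to.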

Fault Classification of Induction Motors by k-NN and SVM (k-NN과 SVM을 이용한 유도전동기 고장 분류)

Park, Seong-Mu;Lee, Dae-Jong;Gwon, Seok-Yeong;Kim, Yong-Sam;Jun, Myeong-Geun
- Proceedings of the Korean Institute of Intelligent Systems Conference, 2006.11a, pp.109-112, 2006
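In this paper, we propose a fault-diagnosis algorithm for induction motors that uses PCA-based feature extraction and a hierarchical classifier built on k-NN and SVM. The proposed method has a hierarchical structure: k-NN first classifies the fault patterns that are linearly separable, and the patterns it cannot classify are then mapped into a high-dimensional space by a kernel function and diagnosed by the SVM. After building an experimental setup, we verify the validity of the proposed method using data acquired under several electrical and mechanical faults at various loads.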
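The two-stage structure can be sketched as follows. The abstract does not define when k-NN "cannot classify" a pattern; this sketch assumes it means the k nearest neighbors disagree, and the `fallback` callable is a hypothetical stand-in for the kernel SVM stage. PCA feature extraction is omitted.

```python
from collections import Counter


def hierarchical_classify(x, training, k, fallback):
    """Two-stage classification sketch.

    Stage 1: k-NN with squared Euclidean distance; accept its label only if
    all k neighbors agree (assumed acceptance rule).
    Stage 2: otherwise defer to `fallback`, any trained classifier callable
    (standing in for the kernel SVM). Returns (label, stage_name).
    """
    neighbors = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    if len(votes) == 1:
        return next(iter(votes)), "knn"
    return fallback(x), "svm"
```

Returning the stage name alongside the label makes it easy to measure how many patterns the cheap first stage resolves on its own.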

Evaluation of the Feature Selection function of Latent Semantic Indexing(LSI) Using a kNN Classifier (잠재의미색인(LSI) 기법을 이용한 kNN 분류기의 자질 선정에 관한 연구)

Park, Boo-Young;Chung, Young-Mee
- Proceedings of the Korean Society for Information Management Conference, 2004.08a, pp.163-166, 2004
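Document frequency and the chi-square statistic are among the feature selection techniques frequently used in previous text categorization studies, where they performed well. However, they share the drawback that they cannot remove the ambiguity inherent in words themselves. In this study, we propose latent semantic indexing (LSI) as a feature selection method for categorization experiments with a kNN classifier; because LSI automatically derives the interrelationships between words, it analyzes the concepts behind words rather than the words themselves.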

Optimal k-Nearest Neighborhood Classifier Using Genetic Algorithm (유전알고리즘을 이용한 최적 k-최근접이웃 분류기)

Park, Chong-Sun;Huh, Kyun
- Communications for Statistical Applications and Methods, v.17 no.1, pp.17-27, 2010
Feature selection and feature weighting are useful techniques for improving the classification accuracy of the k-Nearest Neighbor (k-NN) classifier. The main purpose of feature selection and feature weighting is to reduce the number of features by eliminating irrelevant and redundant ones while maintaining or enhancing classification accuracy. In this paper, a novel hybrid approach based on a Genetic Algorithm is proposed for simultaneous feature selection, feature weighting, and choice of k in the k-NN classifier. The results indicate that the proposed algorithm is comparable or superior to existing classifiers with or without feature selection and feature weighting capability.
https://doi.org/10.5351/CKSS.2010.17.1.017
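A minimal genetic-algorithm sketch of the idea, assuming a chromosome of one weight gene per feature plus one gene that selects k, with fitness equal to leave-one-out k-NN accuracy. The encoding, the `drop_below` threshold, and the operators (truncation selection, one-point crossover, Gaussian mutation) are illustrative choices, not the paper's.

```python
import random


def loo_accuracy(X, y, weights, k):
    """Leave-one-out accuracy of k-NN under a feature-weighted squared Euclidean distance."""
    correct = 0
    for i, xi in enumerate(X):
        neighbors = sorted(
            (sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xj)), y[j])
            for j, xj in enumerate(X) if j != i
        )[:k]
        votes = {}
        for _, label in neighbors:
            votes[label] = votes.get(label, 0) + 1
        if max(votes, key=votes.get) == y[i]:
            correct += 1
    return correct / len(X)


def decode(genes, k_choices, drop_below=0.2):
    """Map a chromosome to (feature weights, k).

    Each feature gene is that feature's weight; weights under `drop_below`
    become 0, which removes the feature (selection). The last gene picks k.
    """
    weights = [0.0 if g < drop_below else g for g in genes[:-1]]
    k = k_choices[int(genes[-1] * len(k_choices)) % len(k_choices)]
    return weights, k


def ga_optimize(X, y, k_choices=(1, 3, 5), pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n_genes = len(X[0]) + 1

    def fitness(genes):
        weights, k = decode(genes, k_choices)
        return loo_accuracy(X, y, weights, k) if any(weights) else 0.0

    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]  # one-point crossover
            # Gaussian mutation on roughly 20% of genes, clipped to [0, 1]
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.1))) if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        pop = elite + children  # elites survive unchanged, so the best fitness never drops
    best = max(pop, key=fitness)
    return decode(best, k_choices), fitness(best)
```

Encoding selection and weighting in the same gene keeps the chromosome short: a feature is "selected" exactly when its weight survives the threshold.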

A Study on Statistical Feature Selection with Supervised Learning for Word Sense Disambiguation (단어 중의성 해소를 위한 지도학습 방법의 통계적 자질선정에 관한 연구)

Lee, Yong-Gu
- Journal of the Korean BIBLIA Society for library and Information Science, v.22 no.2, pp.5-25, 2011
This study aims to identify the most effective statistical feature selection method and context window size for word sense disambiguation using supervised methods. Features were selected by four different methods: information gain, document frequency, chi-square, and relevancy. The weight comparison showed that identifying the most appropriate features can improve word sense disambiguation performance, and information gain performed best. The SVM classifier was not affected by feature selection and performed better with a larger feature set and context size. The Naive Bayes classifier performed best with 10 percent of the feature set, and the kNN classifier with under 10 percent. When feature selection methods are applied to word sense disambiguation, combining a small feature set with a large context window, or a large feature set with a small context window, yields the best performance improvements.
https://doi.org/10.14699/kbiblia.2011.22.2.005
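As an illustration of one of the four criteria, the chi-square score of a candidate context word for a sense can be computed from a 2×2 table of word presence versus sense. This sketch assumes each training context is a set of words with one sense label; the helper names and the `fraction` cutoff are hypothetical.

```python
def chi_square(A, B, C, D):
    """Chi-square statistic for a 2x2 table:
    A = contexts with the word and the sense, B = with the word, other senses,
    C = without the word, with the sense,  D = without the word, other senses."""
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return 0.0 if denom == 0 else N * (A * D - C * B) ** 2 / denom


def score_features(contexts, target_sense):
    """contexts: list of (set_of_context_words, sense_label) pairs."""
    vocab = set().union(*(words for words, _ in contexts))
    scores = {}
    for t in vocab:
        A = sum(1 for w, s in contexts if t in w and s == target_sense)
        B = sum(1 for w, s in contexts if t in w and s != target_sense)
        C = sum(1 for w, s in contexts if t not in w and s == target_sense)
        D = sum(1 for w, s in contexts if t not in w and s != target_sense)
        scores[t] = chi_square(A, B, C, D)
    return scores


def select_top(scores, fraction):
    """Keep the highest-scoring `fraction` of features (at least one); ties broken alphabetically."""
    n = max(1, int(len(scores) * fraction))
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]
```

Varying `fraction` (for example 0.1 versus 1.0) is one way to reproduce the kind of feature-set-size comparison the abstract reports across classifiers.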

Text Categorization Using TextRank Algorithm (TextRank 알고리즘을 이용한 문서 범주화)

Bae, Won-Sik;Cha, Jeong-Won
- Journal of KIISE:Computing Practices and Letters, v.16 no.1, pp.110-114, 2010
We describe a new method for text categorization using the TextRank algorithm. Text categorization is the problem of assigning one or more predefined categories to a text document. TextRank is a graph-based ranking algorithm. If each word is treated as a vertex and the co-occurrence of two adjacent words as an edge, a graph can be built from a document. We then find the important words in the graph using TextRank and construct features as pairs consisting of an important word and a word adjacent to it. We use four classifiers: SVM, Naïve Bayesian classifier, Maximum Entropy Model, and k-NN, on the non-cross-posted version of the 20 Newsgroups data set. As a result, performance improved for all classifiers, which suggests that the TextRank algorithm is promising for text categorization.
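The graph construction and ranking described above can be sketched as follows. It assumes a tokenized document, an undirected edge between adjacent tokens, and the damping factor d = 0.85 with a fixed iteration count as commonly used for PageRank-style scoring; the abstract does not state these parameters, and `top_n` is a hypothetical cutoff.

```python
def cooccurrence_graph(tokens):
    """Undirected graph: one vertex per word, an edge between adjacent words."""
    graph = {t: set() for t in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def textrank(graph, d=0.85, iterations=50):
    """Iterate S(v) = (1 - d) + d * sum(S(u) / deg(u)) over the neighbors u of v."""
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # The comprehension reads the previous iteration's scores throughout.
        scores = {
            v: (1 - d) + d * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    return scores


def pair_features(tokens, top_n=2):
    """Features = (important word, adjacent word) pairs for the top-ranked words."""
    graph = cooccurrence_graph(tokens)
    scores = textrank(graph)
    important = sorted(graph, key=lambda v: (-scores[v], v))[:top_n]
    return {(w, n) for w in important for n in graph[w]}
```

The resulting word-pair features can be fed to any of the four classifiers in place of, or alongside, single-word features.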
