Search | Korea Science

An Empirical Study on Improving the Performance of Text Categorization Considering the Relationships between Feature Selection Criteria and Weighting Methods (자질 선정 기준과 가중치 할당 방식간의 관계를 고려한 문서 자동분류의 개선에 대한 연구)

Lee Jae-Yun
- Journal of the Korean Society for Library and Information Science / v.39 no.2 / pp.123-146 / 2005
This study aims to find consistent strategies for feature selection and feature weighting that can improve the effectiveness and efficiency of a kNN text classifier. Feature selection criteria and feature weighting methods are as important as classification algorithms for achieving good performance in text categorization systems. Most previous studies chose conflicting strategies for feature selection criteria and weighting methods. In this study, the performance of several feature selection criteria is measured in terms of the storage space for inverted index records and the classification time. The classification experiments examine the performance of IDF as a feature selection criterion and the performance of conventional feature selection criteria, e.g. mutual information, as feature weighting methods. The results suggest that by using measures that prefer low-frequency features both as the feature selection criterion and as the feature weighting method, we can increase the classification speed by up to three to five times without losing classification accuracy.
https://doi.org/10.4275/KSLIS.2005.39.2.123
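
The idea in the abstract above, using one low-frequency-preferring measure such as IDF both to select features and to weight them, can be sketched roughly as follows. This is only an illustration under assumptions: the toy corpus, the function names (`idf_scores`, `select_by_idf`, `weight_doc`), and the plain `log(N/df)` form of IDF are not taken from the paper.

```python
import math
from collections import Counter

def idf_scores(docs):
    """IDF per term, using the plain log(N / df) form (an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def select_by_idf(docs, keep):
    """Keep the `keep` terms with the highest IDF (ties broken alphabetically)."""
    scores = idf_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:keep]

def weight_doc(doc, scores, selected):
    """TF x IDF weights restricted to the selected features."""
    tf = Counter(t for t in doc if t in selected)
    return {t: count * scores[t] for t, count in tf.items()}

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet"],
    ["pet", "fish"],
    ["pet", "stock", "market"],
]
scores = idf_scores(docs)
selected = select_by_idf(docs, keep=4)
weights = weight_doc(docs[0], scores, set(selected))
print(selected)
print(weights)
```

Here "pet" occurs in every document, so its IDF is zero and it is dropped first; the same scores are then reused as weights, so selection and weighting do not pull in opposite directions.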

A Study on Feature Selection for kNN Classifier using Document Frequency and Collection Frequency (문헌빈도와 장서빈도를 이용한 kNN 분류기의 자질선정에 관한 연구)

Lee, Yong-Gu
- Journal of Korean Library and Information Science Society / v.44 no.1 / pp.27-47 / 2013
This study investigated the classification performance of a kNN classifier using feature selection methods based on document frequency (DF) and collection frequency (CF). The results of the experiments, which used the HKIB-20000 data, were as follows. First, the feature selection methods that kept high-frequency terms and removed low-frequency terms by the CF criterion achieved better classification performance than those using the DF criterion. Second, neither the DF nor the CF methods performed well when low-frequency terms were selected first in the feature selection process. Last, combining the CF and DF criteria did not result in better classification performance than using DF or CF alone as the single feature selection criterion.
PDF KSCI
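
To make the distinction between the two criteria in the abstract above concrete, here is a minimal sketch. It assumes the usual definitions (DF counts the documents containing a term, CF counts all occurrences of the term in the collection); the toy documents and the helper names are hypothetical and not from the paper.

```python
from collections import Counter

def document_frequency(docs):
    """Number of documents in which each term appears at least once."""
    return Counter(term for doc in docs for term in set(doc))

def collection_frequency(docs):
    """Total number of occurrences of each term across the collection."""
    return Counter(term for doc in docs for term in doc)

def top_terms(freq, keep):
    """Keep the `keep` highest-frequency terms (ties broken alphabetically)."""
    return sorted(freq, key=lambda t: (-freq[t], t))[:keep]

# Hypothetical toy corpus: "bank" is frequent overall but confined to one document.
docs = [
    ["bank", "bank", "bank", "loan"],
    ["loan", "rate"],
    ["loan", "rate"],
]
by_df = top_terms(document_frequency(docs), keep=2)
by_cf = top_terms(collection_frequency(docs), keep=2)
print(by_df)  # terms spread across many documents
print(by_cf)  # terms with many occurrences overall
```

The two rankings diverge for terms that are repeated heavily inside a few documents, which is the kind of case where a CF-based and a DF-based selection would keep different features.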

A Study on Statistical Feature Selection with Supervised Learning for Word Sense Disambiguation (단어 중의성 해소를 위한 지도학습 방법의 통계적 자질선정에 관한 연구)

Lee, Yong-Gu
- Journal of the Korean BIBLIA Society for library and Information Science / v.22 no.2 / pp.5-25 / 2011
This study aims to identify the most effective statistical feature selection method and context window size for word sense disambiguation using supervised methods. Features were selected by four different methods: information gain, document frequency, chi-square, and relevancy. The weight comparison showed that identifying the most appropriate features could improve word sense disambiguation performance, and information gain performed best. The SVM classifier was not affected by feature selection and performed better with a larger feature set and context size. The Naive Bayes classifier performed best with 10 percent of the feature set, and the kNN classifier with under 10 percent of the feature set. When feature selection methods are applied to word sense disambiguation, combining a small feature set with a larger context window, or a large feature set with a small context window, yields the best performance improvements.
https://doi.org/10.14699/kbiblia.2011.22.2.005

Improving the Performance of a Fast Text Classifier with Document-side Feature Selection (문서측 자질선정을 이용한 고속 문서분류기의 성능향상에 관한 연구)

Lee, Jae-Yun
- Journal of Information Management / v.36 no.4 / pp.51-69 / 2005
High-speed classification has become an important research issue in text categorization systems. A fast text categorization technique, named feature value voting, was recently introduced for text categorization problems, but its classification accuracy is not as good as its classification speed. We present a novel approach to feature selection, named document-side feature selection, and apply it to the feature value voting method. In this approach, there is no feature selection process in the learning phase; instead, real-time feature selection is executed in the classification phase. Our results show that feature value voting with document-side feature selection enables a fast and accurate text classification system, which seems to be competitive in classification performance with Support Vector Machines, the state-of-the-art text categorization algorithm.
https://doi.org/10.1633/JIM.2005.36.4.051

Performance Analysis of Feature Detection Methods for Topology-Based Feature Description (토폴로지 기반 특징 기술을 위한 특징 검출 방법의 성능 분석)

Park, Han-Hoon;Moon, Kwang-Seok
- Journal of the Institute of Convergence Signal Processing / v.16 no.2 / pp.44-49 / 2015
When the scene has little texture or when the camera pose changes greatly, existing texture-based feature tracking methods are not reliable. Topology-based feature description methods such as LLAH, which use the geometric relationships between features, are a good alternative. However, they require high-performance feature detection methods. As a basic study toward developing an effective feature detection method for topology-based feature description, this paper examines the applicability of several feature detection methods included in the OpenCV library by analyzing their repeatability. Experimental results show that FAST outperforms the others.
PDF KSCI

A Comparative Study of Feature Selection Methods for Korean Web Documents Clustering (한글 웹 문서 클러스터링 성능향상을 위한 자질선정 기법 비교 연구)

Kim Young-Gi
- Journal of the Korean Society for Library and Information Science / v.39 no.1 / pp.45-58 / 2005
This paper is a comparative study of feature selection methods for Korean web document clustering. First, we focused on how the term features and the co-links of web documents affect clustering performance. We clustered web documents by native term feature, by co-link, and by both, and compared the results with the originally allocated categories. We then selected term features for each category using $\chi^2$, Information Gain (IG), and Mutual Information (MI) from training documents, and applied these features to other experimental documents. In addition, we suggested a new method named Max Feature Selection, which selects the terms that have the maximum count for a category in each experimental document, applies the $\chi^2$ (or MI or IG) value to each term instead of the term frequency in the document, and then clusters the documents. In the results, $\chi^2$ shows better performance than IG or MI, but the difference appears to be slight. When we applied the Max Feature Selection method, however, the clustering performance improved notably. Max Feature Selection is a simple but effective means of feature space reduction and shows strong performance for Korean web document clustering.
https://doi.org/10.4275/KSLIS.2005.39.1.045
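
For readers unfamiliar with the term-category scores named in the abstract above, the following sketch computes chi-square and (pointwise) mutual information from a 2x2 contingency table. It uses a common textbook formulation, which is an assumption here; the abstract does not state the exact formulas used. The counts `a`, `b`, `c`, `d` and the function names are illustrative.

```python
import math

def chi_square(a, b, c, d):
    """Chi-square score of a term for a category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def mutual_information(a, b, c, d):
    """Pointwise mutual information estimate log(a*n / ((a+c)*(a+b)))."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log(a * n / ((a + c) * (a + b)))

# Hypothetical counts: 8 documents, the term appears in 3 of the 4 in-category ones.
score_chi = chi_square(3, 1, 1, 3)
score_mi = mutual_information(3, 1, 1, 3)
print(score_chi, score_mi)
```

A term distributed independently of the category (for example `chi_square(2, 2, 2, 2)`) scores zero, so ranking terms by these scores and keeping the top ones per category is one way to reduce the feature space.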

A Study on Patent Literature Classification Using Distributed Representation of Technical Terms (기술용어 분산표현을 활용한 특허문헌 분류에 관한 연구)

Choi, Yunsoo;Choi, Sung-Pil
- Journal of the Korean Society for Library and Information Science / v.53 no.2 / pp.179-199 / 2019
In this paper, we propose optimal methodologies for classifying patent literature by examining various feature extraction methods, machine learning models, and deep learning models, and we identify the best-performing configuration through experiments. We compared the traditional BoW method with a distributed representation method (word embedding vectors) for feature extraction, and compared morphological analysis with multi-gram as the method of constructing the document collection. In addition, classification performance was verified using both a traditional machine learning model and a deep learning model. Experimental results show that the best performance is achieved when we apply the deep learning model with distributed representation and morphological-analysis-based feature extraction. In the Section, Class, and Subclass classification experiments, we improved performance by 5.71%, 18.84%, and 21.53%, respectively, compared with traditional classification methods.
https://doi.org/10.4275/KSLIS.2019.53.2.179

The Study on the Effective Automatic Classification of Internet Document Using the Machine Learning (기계학습을 기반으로 한 인터넷 학술문서의 효과적 자동분류에 관한 연구)

노영희
- Journal of Korean Library and Information Science Society / v.32 no.3 / pp.307-330 / 2001
This study experimentally examined the performance of categorization methods using the kNN classifier. Most sample-based automatic text categorization techniques, like the kNN classifier, reduce the feature set of the training documents. We sought to find out what percentage of reduction in the feature set would result in high performance. In addition, the kNN classifier has to find the k training documents most similar to each test document. We sought to verify the most appropriate k value through experiments.
PDF
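
The role of k described in the abstract above can be illustrated with a very small kNN text classifier. This is a sketch under assumptions: cosine similarity over raw term counts and simple majority voting are common choices but are not stated in the abstract, and the training data, labels, and function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k):
    """Label chosen by majority vote among the k most similar training documents."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: (term counts, category label).
train = [
    (Counter(["goal", "match"]), "sports"),
    (Counter(["match", "team"]), "sports"),
    (Counter(["stock", "market"]), "finance"),
    (Counter(["market", "bank"]), "finance"),
    (Counter(["bank", "rate"]), "finance"),
]
query = Counter(["team", "goal"])
predictions = {k: knn_predict(train, query, k) for k in (1, 3, 5)}
print(predictions)
```

With this toy data, small values of k follow the two genuinely similar neighbours, while k = 5 lets the larger but unrelated class outvote them, which is why the appropriate k has to be checked experimentally on the actual collection.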

A Qualitative Study on the Period-Specific Changes of Job Factors and Performance Features in Academic Libraries (질적 분석을 통한 대학도서관 업무의 시대별 수행 형태 및 요소 변화에 관한 연구)

Cho, Chul-Hyun;Noh, Dong-Jo
- Journal of the Korean Society for information Management / v.32 no.4 / pp.137-165 / 2015
This study aimed to investigate the period-specific changes (Library 1.0, Library 2.0, and Library 3.0 periods) in job factors and performance features in academic libraries. To this end, the study categorized an academic library's jobs into five dimensions: 1) library administration, 2) collection development and management, 3) information organization, 4) information services, and 5) information system development and management. After the categorized jobs were defined in detail, a Delphi survey was conducted twice with librarians and professors of library and information science. The results showed that there were many changes in job factors and performance features in academic libraries moving into the Library 2.0 period, characterized by user participation, sharing, and openness, and into the Library 3.0 period, characterized by social networks and the semantic web. Library 3.0 is likely to bring about a significant change in user services, with ever-changing technological advances stemming from Library 2.0 such as mobile services, RFID, and NFC. The findings of the study suggest that library systems need to be continually upgraded in the Library 3.0 period.
https://doi.org/10.3743/KOSIM.2015.32.4.137

A Experimental Study on the Development of a Book Recommendation System Using Automatic Classification, Based on the Personality Type (자동분류기반 성격 유형별 도서추천시스템 개발을 위한 실험적 연구)

Cho, Hyun-Yang
- Journal of Korean Library and Information Science Society / v.48 no.2 / pp.215-236 / 2017
The purpose of this study is to develop an automatic classification system for recommending appropriate books for the 9 enneagram personality types, using book information data reviewed by librarians. The data used for this study are book reviews of 501 recommended titles for children and young adults from the National Library for Children and Young Adults. The study rests on the assumption that most people prefer different types of books depending on their preferences or personality type. Performance tests were run for two types of machine learning models, nonlinear kernel and linear kernel, comprising 360 clustering models built from 6 types of index term weighting and feature selection and 10 feature selection critical mass settings. LIBLINEAR appeared to perform better than LibSVM (RBF kernel). Although the performance of the developed system is relatively below expectations, considering the high level of difficulty of personality-type-based classification, it is meaningful as the result of an early-stage experiment.
https://doi.org/10.16981/kliss.48.201706.215

Search Result 41, Processing Time 0.023 seconds
