Search | Korea Science

Ko, Young-Joong;Seo, Jung-Yun
- Journal of Computing Science and Engineering / v.5 no.2 / pp.150-160 / 2011
Automatic text classification has a long history, and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Although much technical progress has been made, there is still room for improvement in text classification. This paper discusses the remaining issues: three improvement issues are presented (automatic training data generation, noisy data treatment, and term weighting and indexing), and four actual studies addressing them are introduced together with their empirical results. First, a semi-supervised learning technique is applied to text classification to create training data efficiently. Second, for effective noisy data treatment, a noisy data reduction method and a text classifier that is robust to noisy data are developed. Finally, the term weighting and indexing technique is revised by using summarization techniques to reflect the importance of sentences in the term weight calculation.
https://doi.org/10.5626/JCSE.2011.5.2.150
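The abstract does not give the exact formula used to reflect sentence importance in term weights. As a minimal sketch, assuming a simple summarization heuristic (a sentence matters more the more title words it contains), each term's frequency can be scaled by the importance of the sentence it occurs in. The scoring rule and the names `sentence_importance` and `weighted_term_frequencies` are hypothetical, not the authors' method.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (assumption: no stemming or stopword removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_importance(sentence_terms: list[str], title_terms: set[str]) -> float:
    """Hypothetical score: 1.0 plus the fraction of title terms the sentence contains."""
    if not title_terms:
        return 1.0
    overlap = len(set(sentence_terms) & title_terms)
    return 1.0 + overlap / len(title_terms)


def weighted_term_frequencies(title: str, body: str) -> dict[str, float]:
    """Sum each term's per-sentence frequency, scaled by that sentence's importance."""
    title_terms = set(tokenize(title))
    weights: Counter = Counter()
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", body.strip()):
        terms = tokenize(sentence)
        if not terms:
            continue
        importance = sentence_importance(terms, title_terms)
        for term, tf in Counter(terms).items():
            weights[term] += tf * importance
    return dict(weights)
```

With the title "Noisy data reduction", a term such as "noisy" that appears in title-related sentences ends up weighted above a term of equal raw frequency that appears only in an unrelated sentence.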

조수련;사공철
- Journal of the Korean Society for information Management / v.5 no.2 / pp.99-126 / 1988
The purpose of this study is to present an appropriate automatic indexing technique that matches the statistical characteristics of terms in a collection comprising different subjects, by comparing and evaluating two automatic indexing techniques (the Inverse Document Frequency Weighting Technique and the Term Discrimination Value Weighting Technique) in the fields of Pharmacology and Library & Information Science.
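The two weighting schemes compared in this study can be sketched as follows. IDF is computed as log(N / df). The term discrimination value uses the space-density formulation: density is the average cosine similarity of documents to the collection centroid, and a term's value is the density with the term removed minus the density with it present, so a positive value marks a good discriminator. Using raw term frequencies as vector components, and the function names, are assumptions for illustration rather than details taken from the paper.

```python
import math
from collections import Counter
from typing import Optional


def idf(docs: list[Counter]) -> dict[str, float]:
    """Inverse document frequency of every term: log(N / df_t)."""
    n = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for doc in docs for term in doc)
    return {term: math.log(n / count) for term, count in df.items()}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def density(docs: list[Counter], drop: Optional[str] = None) -> float:
    """Average document-to-centroid cosine similarity, optionally ignoring one term."""
    vectors = [{t: f for t, f in doc.items() if t != drop} for doc in docs]
    totals: Counter = Counter()
    for v in vectors:
        totals.update(v)  # Counter.update adds frequencies
    centroid = {t: f / len(vectors) for t, f in totals.items()}
    return sum(cosine(v, centroid) for v in vectors) / len(vectors)


def discrimination_value(docs: list[Counter], term: str) -> float:
    """Positive when removing the term makes documents more alike (the term separates them)."""
    return density(docs, drop=term) - density(docs)
```

In a toy collection, a term present in every document gets an IDF of 0 and a negative discrimination value, while a rare, document-specific term scores positively on both.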

Woo, Dong-Chin
- Journal of the Korean Society for information Management / v.4 no.1 / pp.47-86 / 1987
The purpose of this study is to present an effective automatic indexing method for Korean texts based on statistical criteria. The titles and abstracts of 299 documents randomly selected from ETRI's DOCUMENT database are used as the experimental data. The data are divided into 4 word groups, and each group is analyzed and evaluated by applying 3 automatic indexing methods: Transition Phenomena of Word Occurrence, the Inverse Document Frequency Weighting Technique, and the Term Discrimination Weighting Technique.
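The abstract does not define the transition-phenomena method. One common formulation is Goffman's transition point estimated with Booth's formula, T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 is the number of distinct words that occur exactly once; words whose frequency lies near T are kept as index terms. The sketch below assumes that formulation and a hypothetical plus-or-minus `width` window around T; it is not necessarily the procedure used in the study.

```python
import math
from collections import Counter


def transition_point(freqs: Counter) -> float:
    """Booth's estimate of Goffman's transition point from the count of hapax words."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def select_index_terms(tokens: list[str], width: float = 1.0) -> set[str]:
    """Keep words whose frequency is within +/- width of the transition point.

    `width` is an assumed tuning parameter, not a value from the paper.
    """
    freqs = Counter(tokens)
    t = transition_point(freqs)
    return {word for word, f in freqs.items() if abs(f - t) <= width}
```

With six words that occur once, T = (-1 + 7) / 2 = 3, so mid-frequency words (around 2 to 4 occurrences) are selected while both the hapax words and very frequent function words are discarded.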

노정순
- Journal of the Korean Society for information Management / v.21 no.1 / pp.93-117 / 2004
This study develops a hierarchical clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (based on term frequency and on binary weights), five similarity coefficients (Dice, Jaccard, Pearson, Cosine, and Squared Euclidean), and three hierarchical clustering algorithms (Between Average Linkage, Within Average Linkage, and the Complete Linkage method) were tested on a collection of 175 books and theses on library and information science. The best document clusters resulted from the Between Average Linkage or Complete Linkage method with the Jaccard or Dice coefficient, applied to automatic indexing with controlled terms in binary vectors. The clusters produced by Between Average Linkage with Jaccard most closely resembled a decimal classification structure.
https://doi.org/10.3743/KOSIM.2004.21.1.093
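The best-performing configuration reported above (binary term vectors, the Jaccard or Dice coefficient, Complete or Between Average Linkage) can be illustrated with a small agglomerative clustering sketch. Binary vectors are represented as sets of index terms; the stopping rule (stop at `k` clusters) and the function names are assumptions, and the naive pairwise search is only meant for small collections.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A & B| / |A | B| on binary (set-of-terms) vectors."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a: frozenset, b: frozenset) -> float:
    """2|A & B| / (|A| + |B|) on binary vectors."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def cluster_similarity(c1, c2, docs, sim, linkage):
    pairs = [sim(docs[x], docs[y]) for x in c1 for y in c2]
    if linkage == "complete":
        return min(pairs)  # complete linkage: least similar cross-cluster pair
    return sum(pairs) / len(pairs)  # between (group) average linkage


def agglomerate(docs: dict[str, frozenset], k: int, sim=jaccard,
                linkage: str = "complete") -> list[set[str]]:
    """Repeatedly merge the most similar pair of clusters until k clusters remain."""
    clusters = [{name} for name in docs]
    while len(clusters) > k:
        best_score, best_i, best_j = -1.0, 0, 1
        for i, j in combinations(range(len(clusters)), 2):
            score = cluster_similarity(clusters[i], clusters[j], docs, sim, linkage)
            if score > best_score:
                best_score, best_i, best_j = score, i, j
        merged = clusters.pop(best_j)  # best_j > best_i, so best_i stays valid
        clusters[best_i] |= merged
    return clusters
```

Passing `sim=dice` or `linkage="average"` switches the coefficient or the linkage rule, mirroring the combinations the study compared.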

Kim, Eun-Jeong;Bae, Jong-Min
- The KIPS Transactions:PartD / v.8D no.5 / pp.483-494 / 2001
Most hypertext retrieval models treat documents as independent entities and ignore the relationships between documents expressed by link semantics. In an information retrieval system for hypertext documents, retrieval effectiveness can be improved when link information is used. Previous link-based hypertext retrieval models ignore link information during indexing and use it only to re-rank the retrieval results, so link information benefits only the documents already in the result set. This paper uses link information at indexing time. We present how term weighting and inLinks weighting are used to rank relevant documents. Experimental results report recall and precision according to link semantics and compare the approach with a previous link-based hypertext retrieval model.
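The abstract does not specify how inLinks weighting enters the index. One hypothetical way to use link information at indexing time, rather than only for re-ranking, is to add to each document's TF-IDF term weights a damped average of the term weights of the documents that link to it. The `alpha` parameter, the combination rule, and the function names below are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf_index(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Plain TF-IDF weights per document: tf * log(N / df)."""
    n = len(docs)
    df = Counter(t for toks in docs.values() for t in set(toks))
    return {d: {t: tf * math.log(n / df[t]) for t, tf in Counter(toks).items()}
            for d, toks in docs.items()}


def link_aware_index(docs: dict[str, list[str]], links: list[tuple[str, str]],
                     alpha: float = 0.5) -> dict[str, dict[str, float]]:
    """Hypothetical: add alpha * (average weight of each term over in-linking documents)."""
    base = tfidf_index(docs)
    inlinks: dict[str, list[str]] = {d: [] for d in docs}
    for src, dst in links:
        inlinks[dst].append(src)
    index = {}
    for d, weights in base.items():
        combined = Counter(weights)
        for src in inlinks[d]:
            for t, w in base[src].items():
                combined[t] += alpha * w / len(inlinks[d])
        index[d] = dict(combined)
    return index
```

A document with no in-links keeps its plain TF-IDF weights, while a linked-to document also becomes retrievable by terms of the pages pointing at it.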

Woo Seon-Mi;Yoo Chun-Sik;Kim Yong-Sung
- The KIPS Transactions:PartD / v.12D no.3 s.99 / pp.439-446 / 2005
XML is a standard that can systematically manage WWW documents and increase retrieval efficiency. Because an XML document holds both content information and structural information, users can obtain more suitable retrieval results by searching the logical structure as well as the content. In this paper, we propose a method that calculates the weights of XML tags so that tag information can be used in indexing decisions. The proposed method creates a term vector and a weight vector for XML tags and calculates each tag's weight by reflecting the user's retrieval behavior (the user's queries). It then determines the weights of the index terms of an XML document by reflecting the tag weights. Finally, we evaluate the proposed method by comparing it with existing research that uses paragraph weights.
https://doi.org/10.3745/KIPSTD.2005.12D.3.439
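As a hedged sketch of the idea, assume each tag's weight is 1.0 plus the share of logged user queries that target that tag, and that an index term's weight is the sum of the weights of the tags it occurs in. This replaces the paper's term and weight vectors with a much simpler rule; `tag_weights`, `index_xml`, and the query-log format (a list of targeted tag names) are hypothetical. Only each element's leading text (`el.text`) is indexed, which is a simplification.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_weights(query_tags: list[str], tags: set[str]) -> dict[str, float]:
    """Hypothetical: 1.0 plus the fraction of logged queries restricted to each tag."""
    counts = Counter(query_tags)
    total = len(query_tags) or 1
    return {tag: 1.0 + counts[tag] / total for tag in tags}


def index_xml(xml_text: str, query_tags: list[str]) -> dict[str, float]:
    """Weight each index term by the weights of the tags it appears in."""
    root = ET.fromstring(xml_text)
    # (tag, lowercase terms of the element's leading text); root.iter() includes root.
    elements = [(el.tag, (el.text or "").lower().split()) for el in root.iter()]
    weights = tag_weights(query_tags, {tag for tag, _ in elements})
    index: Counter = Counter()
    for tag, terms in elements:
        for term in terms:
            index[term] += weights[tag]
    return dict(index)
```

If most logged queries target `<title>`, terms in titles receive larger index weights than the same terms in `<body>`, which is the behavior the abstract describes at a high level.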