http://dx.doi.org/10.3743/KOSIM.2012.29.2.225

A Study on the Reclassification of Author Keywords for Automatic Assignment of Descriptors  

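Kim, Pan-Jun (Department of Library and Information Science, Silla University)
Lee, Jae-Yun (Department of Library and Information Science, Kyonggi University)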
Publication Information
Journal of the Korean Society for Information Management / v.29, no.2, 2012, pp. 225-246
Abstract
This study investigated whether descriptors can be assigned automatically by reclassifying author keywords in domestic scholarly databases. In the first stage, we compared the characteristics of machine learning classifiers and selected the optimal classifiers and parameters for the reclassification. In the second stage, the classifiers learned the author keywords assigned to a selected set of articles on reading, and those keywords were then automatically added to another set of relevant articles. We examined whether the reclassified author keywords provide vocabulary control in the way descriptors do, by collocating documents on the same topic. The results showed that author keyword reclassification can serve as a means of automatic descriptor assignment.
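The second stage can be illustrated with a minimal sketch. The abstract does not name the classifier or parameters that were selected, so the Rocchio-style centroid classifier, the term-frequency weighting, the 0.3 threshold, the function names, and the toy English data below are all assumptions made for illustration, not the authors' method. The sketch learns one centroid per author keyword from labelled articles and then assigns every keyword whose centroid is similar enough to an unlabelled article.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector, scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def dot(u, v):
    """Dot product of two sparse vectors (cosine when both are unit length)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def train_centroids(labelled_docs):
    """Build one unit-length centroid per author keyword (Rocchio-style)."""
    sums = defaultdict(Counter)
    for text, keywords in labelled_docs:
        vec = tf_vector(text)
        for kw in keywords:
            for term, weight in vec.items():
                sums[kw][term] += weight
    centroids = {}
    for kw, total in sums.items():
        norm = math.sqrt(sum(w * w for w in total.values()))
        centroids[kw] = {t: w / norm for t, w in total.items()}
    return centroids


def assign_keywords(text, centroids, threshold=0.3):
    """Return every keyword whose centroid similarity reaches the threshold."""
    vec = tf_vector(text)
    return sorted(kw for kw, c in centroids.items() if dot(vec, c) >= threshold)


# Hypothetical training articles with their author keywords.
train = [
    ("reading instruction for children in school libraries", ["reading education"]),
    ("children reading programs and reading instruction", ["reading education"]),
    ("bibliotherapy sessions for emotional healing through books", ["bibliotherapy"]),
    ("emotional healing effects of bibliotherapy programs", ["bibliotherapy"]),
]

centroids = train_centroids(train)
result = assign_keywords("reading instruction programs for young children", centroids)
print(result)
```

Because each article can receive several keywords, the assignment is multi-label: every keyword is scored independently against the threshold rather than competing for a single slot. In practice the threshold and the term weighting would be tuned on held-out data, which corresponds to the parameter selection described for the first stage.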
Keywords
automatic classification; text categorization; reclassification; vocabulary control; descriptors; author keywords