[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3743/KOSIM.2012.29.2.225

A Study on the Reclassification of Author Keywords for Automatic Assignment of Descriptors

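Kim, Pan-Jun (Department of Library and Information Science, Silla University)
Lee, Jae-Yun (Department of Library and Information Science, Kyonggi University)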

Publication Information

Journal of the Korean Society for Information Management / v.29, no.2, 2012, pp. 225-246

Abstract

This study investigated whether descriptors can be assigned automatically by reclassifying author keywords in domestic scholarly databases. In the first stage, we compared the characteristics of several machine learning classifiers and selected the optimal classifiers and parameters for the reclassification. In the next stage, the classifiers learned the author keywords assigned to a selected set of articles on reading, and those keywords were then automatically added to another set of relevant articles. We examined whether the reclassified author keywords provide vocabulary control in the way descriptors do, by collocating documents on the same topic. The results showed that author keyword reclassification can serve as a means of automatic descriptor assignment.
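
The reclassification step described in the abstract can be illustrated with a minimal sketch. The record does not state which classifier or parameters the authors selected, so the centroid-based (Rocchio-style) classifier below is only a stand-in. The function names, the similarity threshold, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labeled_docs):
    """Average the term vectors of the documents carrying each author keyword."""
    sums, counts = defaultdict(Counter), Counter()
    for text, keywords in labeled_docs:
        vec = tf_vector(text)
        for kw in keywords:
            sums[kw].update(vec)
            counts[kw] += 1
    return {kw: {t: w / counts[kw] for t, w in vec.items()}
            for kw, vec in sums.items()}

def reassign_keywords(text, centroids, threshold=0.3):
    """Assign every learned keyword whose centroid is similar enough to the text.

    The threshold value is an assumption chosen for the toy data below.
    """
    vec = tf_vector(text)
    return sorted(kw for kw, c in centroids.items()
                  if cosine(vec, c) >= threshold)

# Toy training set (invented): article text paired with its author keywords.
training = [
    ("reading education for elementary school students", ["reading education"]),
    ("school library reading program evaluation", ["reading education", "school library"]),
    ("school library collection management", ["school library"]),
]
centroids = train_centroids(training)

# A new article receives the learned keywords that match its content.
new_doc = "reading program for students"
print(reassign_keywords(new_doc, centroids))  # ['reading education']
```

Because each keyword is scored independently, one article can receive several keywords or none. That matches the multi-label nature of descriptor assignment, although a real system would tune the threshold and term weighting on held-out data.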

Keywords

automatic classification; text categorization; reclassification; vocabulary control; descriptors; author keywords

Citations & Related Records

Times Cited By KSCI: 6

Reference
Cited By KSCI

1	Zhang, Y., Tsai, F. S., & Kwee, A. T. (2011). Multilingual sentence categorization and novelty mining. Information Processing and Management, 47(5), 667-675.
2	Villena-Roman, J., Collada-Perez, S., Lana-Serrano, S., & Gonzalez-Cristobal, J. C. (2011). Hybrid approach combining machine learning and a rule-based expert system for text categorization. Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference, 323-328.
3	Voorhees, E. M., & Harman, D. K. (2005). TREC: Experiment and evaluation in information retrieval. Cambridge, Mass.: MIT Press.
4	Wang, Tai-Yue, & Chiang, Huei-Min (2007). Fuzzy support vector machine for multi-class text categorization. Information Processing and Management, 43(4), 914-929.
5	Wu, Chih-Hung (2009). Behavior-based spam detection using a hybrid method of rule-based techniques and neural networks. Expert Systems with Applications, 36(1), 4321-4330.
6	Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1-2), 69-90.
7	Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. Proceedings of the 14th International Conference on Machine Learning (ICML '97), 412-420.
8	Yang, Y., & Liu, Xin (1999). A re-examination of text categorization methods. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '99), 42-49.
9	Yu, Bo, Xu, Zong-ben, & Li, Cheng-hua (2008). Latent semantic analysis for text categorization using neural network. Knowledge-Based Systems, 21(8), 900-904.
10	Zhang, J., & Yang, Y. (2003). Robustness of regularized linear classification methods in text categorization. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '03), 190-197.
11	Miao, Yun-Qian, & Kamel, M. (2011). Pairwise optimized Rocchio algorithm for text categorization. Pattern Recognition Letters, 32(2), 375-382.
12	Mitchell, T. M. (1997). Machine learning. New York, NY: McGraw-Hill.
13	Moens, Marie-Francine (2000). Automatic indexing and abstracting of document texts. Boston: Kluwer Academic Publishers.
14	Nidhi, & Gupta, V. (2011). Recent trends in text classification techniques. International Journal of Computer Applications, 35(6), 45-51.
15	Ruiz, M. E., & Srinivasan, P. (2002). Hierarchical text categorization using neural networks. Information Retrieval, 5(1), 87-118.
16	Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1-47.
17	Torii, M., Yin, L., Nguyen, T., Mazumdar, C. T., Liu, H., Hartley, D. M., & Nelson, N. P. (2011). An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics. International Journal of Medical Informatics, 80(1), 56-66.
18	Uguz, H. (2011). A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowledge-Based Systems, 24(7), 1024-1032.
19	Vasuki, V., & Cohen, T. (2010). Reflective random indexing for semi-automatic indexing of the biomedical literature. Journal of Biomedical Informatics, 43(5), 694-700.
20	Jiang, S., Pang, G., Wu, M., & Kuang, L. (2012). An improved k-nearest-neighbor algorithm for text categorization. Expert Systems with Applications, 39(1), 1503-1509.
21	Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Proceedings of the 10th European Conference on Machine Learning, 137-142.
22	Khan, A., Baharudin, B., & Lee, Lam Hong (2010). A review of machine learning algorithms for text-documents classification. Journal of Advances in Information Technology, 1(1), 4-20.
23	Kumar, M. Arun, & Gopal, M. (2010). A comparison study on multiple binary-class SVM methods for unilabel text categorization. Pattern Recognition Letters, 31(11), 1437-1444.
24	Lauser, B., & Hotho, A. (2003). Automatic multi-label subject indexing in a multilingual environment. Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL '03), 140-151.
25	Lewis, D. D., Schapire, R. E., Callan, J. P., & Papka, R. (1996). Training algorithms for linear text classifiers. Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '96), 298-306.
26	Li, Cheng Hua, & Park, Soon Choel (2009). An efficient document classification model using an improved back propagation neural network and singular value decomposition. Expert Systems with Applications, 36(2), 3208-3215.
27	Li, Xiangdong, & Sun, Qin (2011). The review of text categorization research over Chinese Library Classification. American Journal of Engineering and Technology Research, 11(9), 2729-2734.
28	Chung, Eun-Kyung (2009). A semantic-based feature expansion approach for improving the effectiveness of text categorization by using WordNet. Journal of the Korean Society for Information Management, 26(3), 261-278. http://dx.doi.org/10.3743/KOSIM.2009.26.3.261
29	Chen, E., Lin, Y., Xiong, H., Luo, Q., & Ma, H. (2011). Exploiting probabilistic topic models to improve text categorization under class imbalance. Information Processing and Management, 47(2), 202-214. http://dx.doi.org/10.1016/j.ipm.2010.07.003
30	Chen, Yao-Tsung, & Chen, Meng Chang (2011). Using chi-square statistics to measure similarities for text categorization. Expert Systems with Applications, 38(4), 3085-3090. http://dx.doi.org/10.1016/j.eswa.2010.08.100
31	Chung, Y., Pottenger, W. M., & Schatz, B. R. (1998). Automatic subject indexing using an associative neural network. Proceedings of the 3rd ACM International Conference on Digital Libraries (DL '98), ACM Press, 59-68.
32	Gil-Leiva, I., & Alonso-Arroyo, A. (2007). Keywords given by authors of scientific articles in database descriptors. Journal of the American Society for Information Science and Technology, 58(8), 1175-1187.
33	Harish, B. S., Guru, D. S., & Manjunath, S. (2010). Representation and classification of text documents: A brief review. IJCA Special Issue on "Recent Trends in Image Processing and Pattern Recognition" RTIPPR, 2010, 110-119.
34	Hurt, C. D. (2010). Automatically generated keywords: A comparison to author-generated keywords in the sciences. Journal of Information and Organizational Sciences, 34(1), 81-88. Retrieved from https://jios.foi.hr/index.php/jios/article/view/158
35	Kim, Pan Jun (2006b). A study on the automatic descriptor assignment for scientific journal articles using Rocchio algorithm. Journal of the Korean Society for Information Management, 23(3), 69-90.
36	Kim, Pan Jun (2008). A study on the performance improvement of Rocchio classifier with term weighting methods. Journal of the Korean Society for Information Management, 25(1), 211-233. http://dx.doi.org/10.3743/KOSIM.2008.25.1.211
37	Lee, Jae Yun (2005b). An empirical study on improving the performance of text categorization considering the relationships between feature selection criteria and weighting methods. Journal of the Korean Society for Library and Information Science, 39(2), 123-146.
38	Kim, Pan Jun, & Lee, Jae Yun (2007). Utilizing unlabeled documents in automatic classification with inter-document similarities. Journal of the Korean Society for Information Management, 24(1), 251-271. http://dx.doi.org/10.3743/KOSIM.2007.24.1.251
39	Yoon, Koo-ho (1999). Index & abstract. Seoul: Korean Library Association.
40	Lee, Jae Yun (2005a). Improving the performance of SVM text categorization with inter-document similarities. Journal of the Korean Society for Information Management, 22(3), 261-287.
41	Chung, Young Mee (2012). Research in information retrieval. Seoul: Yonsei University Press.
42	Kim, Pan Jun (2006a). A study on automatic assignment of descriptors using machine learning. Journal of the Korean Society for Information Management, 23(1), 279-299.
43	Kim, Yong-Hwan, & Chung, Young Mee (2012). An experimental study on feature selection using Wikipedia for text categorization. Journal of the Korean Society for Information Management, 29(2), 155-171. http://dx.doi.org/10.3743/KOSIM.2012.29.2.155

1	Ko, Young Man, Song, Min-Sun, Kim, Bee-Yeon, & Min, Hye-Ryoung (2013). A Study on the Correlation between the Appearance Frequency of Author Keyword and the Number of Citation in the Humanities and Social Science Journal Articles of the Korea Citation Index (KCI). Journal of the Korean Society for Information Management, 30(2), 227.
2	Kwon, Sun-Young (2014). A Study on the Factors Influencing Semantic Relation in Building a Structured Glossary. Journal of the Korean Society for Library and Information Science, 48(2), 353.
3	Kwon, Sun-Young (2014). A Study on the Application to Network Analysis on the Importance of Author Keyword based on the Position of Keyword. Journal of the Korean Society for Information Management, 31(2), 121.
4	Kim, Pan Jun, & Lee, Jae Yun (2014). An Experimental Study on the Performance Improvement of Automatic Classification for the Articles of Korean Journals Based on Controlled Keywords in International Database. Journal of the Korean Society for Library and Information Science, 48(3), 491.
