http://dx.doi.org/10.7838/jsebs.2013.18.2.095

Improving Hypertext Classification Systems through WordNet-based Feature Abstraction  

Roh, Jun-Ho (School of Electrical and Computer Engineering, University of Seoul)
Kim, Han-Joon (School of Electrical and Computer Engineering, University of Seoul)
Chang, Jae-Young (Department of Computer Engineering, Hansung University)
Publication Information
The Journal of Society for e-Business Studies / v.18, no.2, 2013, pp. 95-110
Abstract
This paper presents a novel feature engineering technique that improves conventional machine learning-based text classification systems. To categorize hypertext web documents effectively, the proposed method extends the initial feature set by using hyperlink relationships. Web documents are connected to each other through hyperlinks, and in many cases hyperlinks exist between highly related documents. Such hyperlink relationships can be used to enhance the quality of the features that constitute classification models. The basic idea of the proposed method is to generate an abstracted concept feature that consists of a few raw feature words; to do so, the method computes the semantic similarity between a target document and its neighbor documents by using hierarchical relationships in the WordNet ontology. In building classification models, the abstracted concept features are treated in the same way as the raw features, and they can contribute substantially to more accurate classification models. Through extensive experiments with the Web-KB test collection, we show that the proposed method outperforms conventional ones.
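The general idea of deriving concept features from a hierarchy can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tiny hand-written hypernym table stands in for WordNet, the Wu-Palmer-style similarity is one common hierarchy-based measure and not necessarily the one the authors use, and the names `HYPERNYM`, `wu_palmer`, `abstract_features`, the `CONCEPT:` prefix, and the 0.7 threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical miniature hypernym table (child -> parent); a stand-in for WordNet.
HYPERNYM = {
    "professor": "educator",
    "lecturer": "educator",
    "educator": "person",
    "student": "person",
    "person": "entity",
    "seminar": "course",
    "course": "education",
    "education": "entity",
}


def ancestors(word):
    """Return the path from the word up to its root, the word itself first."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def lowest_common_ancestor(a, b):
    """Return the deepest node shared by both paths, or None if there is none."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


def depth(word):
    """Depth counted in nodes, so a root has depth 1."""
    return len(ancestors(word))


def wu_palmer(a, b):
    """Wu-Palmer-style similarity: 2 * depth(lca) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    if lca is None:
        return 0.0
    return 2.0 * depth(lca) / (depth(a) + depth(b))


def abstract_features(target_words, neighbor_words, threshold=0.7):
    """Extend the target document's raw features with concept features.

    For each (target word, neighbor word) pair whose similarity reaches the
    threshold, the shared ancestor is added as a feature that is counted
    alongside the raw word features.
    """
    features = Counter(target_words)
    for t in set(target_words):
        for n in set(neighbor_words):
            if t != n and wu_palmer(t, n) >= threshold:
                features["CONCEPT:" + lowest_common_ancestor(t, n)] += 1
    return features


target = ["professor", "course", "homepage"]   # words of the target page
neighbor = ["lecturer", "seminar", "student"]  # words of a hyperlinked page
print(abstract_features(target, neighbor))
```

In this toy run, "professor" and "lecturer" meet at "educator", and "course" and "seminar" meet at "course", so two concept features join the three raw word features. A real system would query WordNet itself and would need to handle word-sense ambiguity, which this sketch ignores.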
Keywords
Text Classification; Machine Learning; WordNet; Hypertext; World Wide Web; Feature Abstraction