http://dx.doi.org/10.1633/JISTaP.2017.5.3.3

Enhancing the Narrow-down Approach to Large-scale Hierarchical Text Classification with Category Path Information  

Oh, Heung-Seon (Korea Institute of Science and Technology Information)
Jung, Yuchul (Kumoh National Institute of Technology (KIT))
Publication Information
Journal of Information Science Theory and Practice / v.5, no.3, 2017, pp. 31-47
Abstract
The narrow-down approach, which consists of separate search and classification stages, is an effective way of dealing with large-scale hierarchical text classification. Recent approaches incorporate global, local, and path information extracted from web taxonomies in the classification stage. For path information, however, there have been few efforts to address existing limitations and develop more sophisticated methods. In this paper, we propose an expansion method that exploits category path information more effectively, based on the observation that the existing method suffers from a term mismatch problem and low discrimination power due to insufficient path information. The key idea of our method is to utilize relevant information not present on category paths by adding more useful words. We evaluate the effectiveness of our method on state-of-the-art narrow-down methods and report the results with in-depth analysis.
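The term mismatch problem described above can be illustrated with a small sketch. This is not the method proposed in the paper: the toy taxonomy, the function names, and the expansion heuristic (adding the k most frequent words from a category's own documents) are all assumptions chosen only to show why the words on a category path alone may fail to match a document, and how adding related words can restore discrimination between candidate categories.

```python
from collections import Counter

# Hypothetical toy data: candidate category paths (as a search stage might
# return them) mapped to a few documents known to belong to each category.
TRAINING = {
    "Sports/Soccer": ["goal keeper penalty league", "striker goal league match"],
    "Sports/Tennis": ["serve racket match court", "racket court grand slam"],
}

def path_terms(path):
    """Words that literally appear on the category path."""
    return Counter(path.lower().split("/"))

def expanded_terms(path, docs, k=3):
    """Path words plus the k most frequent words from the category's documents.

    The frequency-based choice of expansion words is an illustrative
    assumption, not the weighting used in the paper.
    """
    terms = path_terms(path)
    doc_counts = Counter(word for doc in docs for word in doc.lower().split())
    for word, _ in doc_counts.most_common(k):
        terms[word] += 1
    return terms

def score_candidates(document, expand):
    """Score each candidate category by word overlap with the document."""
    words = document.lower().split()
    scores = {}
    for path, docs in TRAINING.items():
        terms = expanded_terms(path, docs) if expand else path_terms(path)
        scores[path] = sum(terms[w] for w in words)  # missing words count as 0
    return scores

if __name__ == "__main__":
    doc = "late penalty goal decided the league"
    print("path only:", score_candidates(doc, expand=False))
    print("expanded: ", score_candidates(doc, expand=True))
```

With path words only, the example document shares no word with either path, so the candidates cannot be told apart; after expansion, the soccer category gains matching words while the tennis category does not.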
Keywords
Hierarchical text classification; Query expansion; Narrow-down approach