http://dx.doi.org/10.5391/JKIIS.2011.21.5.618

Kernelized Structure Feature for Discriminating Meaningful Table from Decorative Table  

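Son, Jeong-Woo (Department of Computer Engineering, College of IT, Kyungpook National University)
Go, Jun-Ho (Department of Computer Engineering, College of IT, Kyungpook National University)
Park, Seong-Bae (Department of Computer Engineering, College of IT, Kyungpook National University)
Kim, Kweon-Yang (Department of Computer Engineering, Kyungil University)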
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.21, no.5, 2011, pp. 618-623
Abstract
This paper proposes a novel method for discriminating meaningful tables from decorative ones using a composite kernel that handles the structural information of tables. The structural information of a table is extracted as two types of parse trees: a context tree and a table tree. A context tree contains the structural information around a table, while a table tree represents the structural information within it. A composite kernel, based on a parse tree kernel, is proposed to handle these two types of trees efficiently. Support vector machines with the proposed kernel distinguish meaningful tables from decorative ones by using this rich structural information.
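The abstract does not state how the composite kernel combines the two trees, so the following is only a minimal sketch under stated assumptions: trees are nested tuples of HTML tag names, the parse tree kernel is a Collins and Duffy style subtree-counting kernel with a decay factor, and the composite is a weighted sum of the normalized kernels on the table tree and the context tree. The names `tree_kernel`, `composite_kernel`, `decay`, and `alpha` are hypothetical and are not taken from the paper.

```python
from functools import lru_cache
from math import sqrt

# A tree is a nested tuple: (label, child_1, child_2, ...). A leaf is (label,).

def production(node):
    """Label of a node together with the labels of its children."""
    return (node[0], tuple(child[0] for child in node[1:]))

def internal_nodes(tree):
    """Yield every node that has at least one child."""
    if len(tree) > 1:
        yield tree
        for child in tree[1:]:
            yield from internal_nodes(child)

def tree_kernel(t1, t2, decay=0.5):
    """Subtree-counting kernel in the style of Collins and Duffy (assumed form)."""
    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Nodes with different productions share no subtree rooted at them.
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            # Only children that are themselves internal nodes can extend a match.
            if len(c1) > 1 and len(c2) > 1:
                score *= 1.0 + common(c1, c2)
        return score

    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

def normalized_kernel(t1, t2, decay=0.5):
    """Scale the kernel so that a tree compared with itself scores 1."""
    denom = sqrt(tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay))
    return tree_kernel(t1, t2, decay) / denom if denom > 0 else 0.0

def composite_kernel(x, y, alpha=0.5, decay=0.5):
    """Weighted sum of context-tree and table-tree kernels (an assumed combination).

    Each sample is a pair (context_tree, table_tree).
    """
    context_x, table_x = x
    context_y, table_y = y
    return (alpha * normalized_kernel(table_x, table_y, decay)
            + (1.0 - alpha) * normalized_kernel(context_x, context_y, decay))

# Hypothetical samples: a data-like table and a layout-like table.
data_table = ("table", ("tr", ("td",), ("td",)))
layout_table = ("table", ("tr", ("td", ("img",))))
context_a = ("body", ("div", ("table",)))
context_b = ("body", ("table",))

sample_a = (context_a, data_table)
sample_b = (context_b, layout_table)

print(composite_kernel(sample_a, sample_a))
print(composite_kernel(sample_a, sample_b))
```

A Gram matrix built from `composite_kernel` over all training pairs could then be handed to any support vector machine implementation that accepts precomputed kernels. A weighted sum of two valid kernels with non-negative weights is itself a valid kernel, which is why this combination is a common default when the actual form is not specified.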
Keywords
Table discrimination; Parse tree kernel; Support vector machines; Composite kernel; Classification
References
1 Y. Liu, K. Bai, P. Mitra, and C. Giles, "Automatic Searching of Tables in Digital Libraries," In Proceedings of the 16th International Conference on World Wide Web, pp. 1135-1136, 2007.
2 E. Crestan and P. Pantel, "Web-scale Knowledge Extraction from Semi-structured Tables," In Proceedings of the 19th International Conference on World Wide Web, pp. 1081-1082, 2010.
3 N. Cristianini and J. Shawe-Taylor, "An Introduction to Support Vector Machines and other Kernel-based Learning Methods," Cambridge University Press, 2000.
4 D. Haussler, "Convolution Kernels on Discrete Structures," Technical report, UCSC-CRL-99-10, UC Santa Cruz, 1999.
5 M. Collins and N. Duffy, "Convolution Kernels for Natural Language," In Advances in Neural Information Processing Systems 14, pp. 625-632, 2001.
6 M. Hurst, "Layout and language: Challenges for table understanding on the web," In Proceedings of WDA'01, pp. 27-30, 2001.
7 S. Jung, K. Sung, T. Park, and H. Kwon, "Effective Retrieval of Information in Tables on the Internet," In Proceedings of IEA/AIE'02, pp. 493-501, 2002.
8 G. Penn, J. Hu, H. Luo, and R. McDonald, "Flexible Web Document Analysis for Delivery to Narrow-bandwidth Devices," In Proceedings of ICDAR'06, pp. 119-130, 2004.
9 Y. Zhai and B. Liu, "Web Data Extraction based on Partial Tree Alignment," In Proceedings of the WWW'05, pp. 76-85, 2005.
10 H. Chen, S. Tsai, and J. Tsai, "Mining Tables from Large Scale HTML texts," In Proceedings of the 18th International Conference on Computational Linguistics, pp. 166-182, 2007.
11 Y. Wang and J. Hu, "A Machine Learning based Approach for Table Detection on the Web," In Proceedings of WWW'02, pp. 242-250, 2002.
12 S. Jung and H. Kwon, "A Scalable Hybrid Approach for Extracting Head Components from Web Tables," IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 2, pp. 174-187, 2006.
13 E. Crestan and P. Pantel, "A Fine-Grained Taxonomy of Tables on the Web," In Proceedings of the 19th ACM International Conference on Information and Knowledge management, pp. 1405-1408, 2010.