[KSCI] Korea Science Citation Index Service

Similarity checking between XML tags through expanding synonym vector

Lee, Jung-Won (Dept. of Computer, Ewha Womans University)
Lee, Hye-Soo (Dept. of Computer, Ewha Womans University)
Lee, Ki-Ho (Dept. of Computer, Ewha Womans University)

Publication Information

Journal of KIISE: Software and Applications / v.29, no.9, 2002, pp. 676-683

Abstract

The success of XML (eXtensible Markup Language) rests primarily on its flexibility: anyone can define the structure of XML documents to represent information in whatever form he or she desires. Because XML is so flexible, however, XML documents cannot automatically be given an underlying semantics. Different tag sets, different names for elements or attributes, and different document structures in general hinder the precise classification and clustering of XML documents. In this paper, we design and implement a system that checks the semantic similarity between XML tags. First, the system extracts the underlying semantics of tags and then expands the synonym set of each tag using the WordNet thesaurus and a user-defined word library that supports the abbreviations and compound words found in XML tags. Second, to take into account the relative importance of XML tags within XML documents, we extend the conventional vector space model, the most widely used document model in the field of Information Retrieval. Using this method, we are able to check the similarity between XML tags even when they are expressed as different tags.

Keywords

XML; Information Retrieval; Document Processing; Document Analysis

