
Similarity checking between XML tags through expanding synonym vector  

Lee, Jung-Won (Dept. of Computer, Ewha Womans University)
Lee, Hye-Soo (Dept. of Computer, Ewha Womans University)
Lee, Ki-Ho (Dept. of Computer, Ewha Womans University)
Abstract
The success of XML (eXtensible Markup Language) rests primarily on its flexibility: anyone can define the structure of XML documents so that they represent information in the desired form. This flexibility, however, means that XML documents cannot automatically be given an underlying semantics. Different tag sets, different names for elements or attributes, and different document structures in general make it difficult to classify and cluster XML documents precisely. In this paper, we design and implement a system that checks the semantic similarity between XML tags. First, the system extracts the underlying semantics of tags and expands the synonym set of each tag using the WordNet thesaurus and a user-defined word library that supports the abbreviations and compound words found in XML tags. Second, to account for the relative importance of XML tags within XML documents, we extend the conventional vector space model, the document model most widely used in the Information Retrieval field. Using this method, we have been able to check the similarity between XML tags that carry related meanings but are written as different tags.
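The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the small `THESAURUS` dictionary stands in for WordNet, `ABBREVIATIONS` stands in for the user-defined word library, and the function names, the `weight` parameter, and the `synonym_weight` damping factor are all hypothetical choices made for this example. Similarity is measured with the standard cosine measure of the vector space model.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for the resources named in the abstract:
# a toy thesaurus in place of WordNet, and a toy user-defined library
# that expands abbreviations seen in XML tag names.
THESAURUS = {
    "author": {"writer", "creator"},
    "writer": {"author", "creator"},
    "title": {"name", "heading"},
    "name": {"title"},
}
ABBREVIATIONS = {"auth": "author", "nm": "name", "pub": "publisher"}


def split_tag(tag):
    """Split a compound tag ('AuthorName', 'auth_nm') into lowercase words,
    expanding known abbreviations."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", tag)
    return [ABBREVIATIONS.get(p.lower(), p.lower()) for p in parts]


def synonym_vector(tag, weight=1.0, synonym_weight=0.5):
    """Build a sparse term vector for a tag, expanded with its synonyms.

    `weight` models the relative importance of the tag in its document;
    `synonym_weight` is an assumed damping factor for expanded terms.
    """
    vector = Counter()
    for word in split_tag(tag):
        vector[word] += weight
        for synonym in THESAURUS.get(word, ()):
            vector[synonym] += weight * synonym_weight
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tag_similarity(tag_a, tag_b):
    """Similarity between two XML tag names via their expanded synonym vectors."""
    return cosine_similarity(synonym_vector(tag_a), synonym_vector(tag_b))
```

Under these assumptions, `tag_similarity("author", "writer")` is positive although the two tag names share no characters in common as words, while `tag_similarity("author", "price")` is zero. The `weight` parameter has no effect on the cosine of a single tag pair; it would matter only if several tag vectors were summed into one document vector.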
Keywords
XML; Information Retrieval; Document Processing; Document Analysis