http://dx.doi.org/10.3745/KIPSTD.2007.14-D.2.145

Improving Performance of Change Detection Algorithms through the Efficiency of Matching  

Lee, Suk-Kyoon (Dankook University, School of Information and Computer Science)
Kim, Dong-Ah (Dankook University, Computer Science)
Abstract
Recently, the need for effective real-time change detection algorithms for XML/HTML documents has increased in fields such as the detection of defacement attacks on web documents and version management. In particular, applications that perform real-time change detection on large numbers of XML/HTML documents require fast heuristic algorithms rather than algorithms that compute minimum-cost edit scripts. Existing heuristic algorithms execute quickly but do not produce satisfactory edit scripts. In this paper, we present the existing algorithms XyDiff and X-tree Diff, analyze their problems, and propose X-tree Diff+, an algorithm that addresses those problems. X-tree Diff+ has an execution time similar to that of the existing algorithms, but it improves the matching ratio between nodes of the two documents by refining the matching process based on the notion of efficiency of matching.
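The abstract contrasts minimum-cost edit-script algorithms with fast heuristics that match nodes between the old and new document trees, and it evaluates them by a matching ratio. The Python sketch below illustrates that general style of heuristic under stated assumptions; it is not XyDiff, X-tree Diff, or X-tree Diff+ as published, and it does not model the paper's notion of efficiency of matching. It first pairs identical subtrees via bottom-up hashes (SHA-1 from `hashlib` is an arbitrary choice here), then propagates matches to parents with the same label. The names `Node`, `match_trees`, and `matching_ratio`, and the ratio definition (matched old nodes divided by all old nodes), are hypothetical illustrations.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing so nodes can be dict keys / set members
class Node:
    label: str
    text: str = ""
    children: list = field(default_factory=list)
    digest: str = ""


def iter_nodes(node):
    """Pre-order traversal of a subtree."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def compute_digests(node):
    """Bottom-up signature: hash of label, text, and the children's signatures."""
    h = hashlib.sha1()
    h.update(node.label.encode() + b"\0" + node.text.encode())
    for child in node.children:
        h.update(b"\0" + compute_digests(child).encode())
    node.digest = h.hexdigest()
    return node.digest


def parent_map(root):
    """Map each non-root node to its parent."""
    return {child: node for node in iter_nodes(root) for child in node.children}


def match_trees(old, new):
    """Hypothetical two-phase heuristic; returns a dict old node -> new node."""
    compute_digests(old)
    compute_digests(new)
    by_digest = {}
    for n in iter_nodes(new):
        by_digest.setdefault(n.digest, []).append(n)

    matches, used = {}, set()

    # Phase 1: top-down (ancestors before descendants), pair identical subtrees.
    def visit(o):
        for n in by_digest.get(o.digest, []):
            if all(m not in used for m in iter_nodes(n)):
                # Equal digests imply equal shape (barring hash collisions),
                # so the two subtrees can be paired node by node in pre-order.
                for a, b in zip(iter_nodes(o), iter_nodes(n)):
                    matches[a] = b
                    used.add(b)
                return
        for child in o.children:
            visit(child)

    visit(old)

    # Phase 2: propagate matches upward to unmatched parents with equal labels.
    old_parent, new_parent = parent_map(old), parent_map(new)
    changed = True
    while changed:
        changed = False
        for o, n in list(matches.items()):
            po, pn = old_parent.get(o), new_parent.get(n)
            if (po is not None and pn is not None and po not in matches
                    and pn not in used and po.label == pn.label):
                matches[po] = pn
                used.add(pn)
                changed = True
    return matches


def matching_ratio(old, matches):
    """Hypothetical metric: share of old-tree nodes that received a partner."""
    return len(matches) / sum(1 for _ in iter_nodes(old))


if __name__ == "__main__":
    old = Node("html", children=[Node("body", children=[Node("p", "hello"), Node("p", "world")])])
    new = Node("html", children=[Node("body", children=[Node("p", "hello"), Node("p", "there")])])
    print(matching_ratio(old, match_trees(old, new)))  # 0.75
```

In the usage example, the unchanged `p` leaf is paired in the first phase, propagation then pairs `body` and `html`, and the edited paragraph stays unmatched, so three of the four old nodes are matched. Hash matching alone would have paired only one node here, which is the kind of gap a refined matching phase aims to close.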
Keywords
Hierarchically structured documents; XML; diff algorithm; change detection
References
1 D. T. Barnard, G. Clarke, and N. Duncan, "Tree-to-Tree Correction for Document Trees," Technical Report, Department of Computing and Information Science, Queen's University, Kingston, Ontario, Canada, January 1995.
2 A. Aboulnaga, J. F. Naughton, and C. Zhang, "Generating Synthetic Complex-Structured XML Data," in Proceedings of the Fourth International Workshop on the Web and Databases (WebDB), 2001.
3 "Concurrent Versions System (CVS)," Free Software Foundation, http://www.gnu.org/manual/cvs-1.9
4 Curbera and D. A. Epstein, "Fast Difference and Update of XML Documents," XTech '99, San Jose, March 1999.
5 D. A. Kim and S. K. Lee, "Efficient Change Detection in Tree Structured Data," in Human.Society@Internet 2003, pp. 675-681, 2003.
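6 D. A. Kim and S. K. Lee, "X-tree Diff: An Efficient Change Detection Algorithm for Tree-Based Data" (in Korean), The KIPS Transactions: Part C, Vol. 10, No. 6, pp. 683-694, 2003.
7 D. A. Kim, "Change Detection and Management for XML Documents" (in Korean), Ph.D. dissertation, Department of Computer Science and Statistics, Dankook University, pp. 1-111, 2005.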
8 Xyleme Project, http://www.xyleme.com/en/
9 S. M. Selkow, "The Tree-to-Tree Editing Problem," Information Processing Letters, 6, pp. 184-186, 1977.
10 XyDiff Tools, http://pauillac.inria.fr/cdrom/www/xycliff/index-eng.htm
11 R. Rivest, "The MD4 Message Digest Algorithm," MIT and RSA Data Security, Inc., April 1992.
12 Y. Wang, D. DeWitt, and J. Cai, "X-Diff: An Effective Change Detection Algorithm for XML Documents," in 19th ICDE, India, March 2003.
13 K. Zhang and D. Shasha, "Simple Fast Algorithms for the Editing Distance between Trees and Related Problems," SIAM Journal on Computing, 18(6), pp. 1245-1262, 1989.
14 NIAGARA Query Engine, http://www.cs.wisc.edu/niagaral
15 R. Wagner and M. Fischer, "The String-to-String Correction Problem," Journal of the ACM, 21, pp. 168-173, 19
16 G. Cobena, S. Abiteboul, and A. Marian, "Detecting Changes in XML Documents," The 18th ICDE, 2002.
17 S. Chawathe and H. Garcia-Molina, "Meaningful Change Detection in Structured Data," in SIGMOD '97, pp. 26-37, 1997.
18 S. Lu, "A Tree-to-Tree Distance and Its Application to Cluster Analysis," IEEE TPAMI, 1(2), pp. 219-224, 1979.
19 E. W. Myers, "An O(ND) Difference Algorithm and Its Variations," Algorithmica, 1(2), pp. 251-266, 1986.
20 K. Tai, "The Tree-to-Tree Correction Problem," Journal of the ACM, 26(3), pp. 422-433, July 1979.