http://dx.doi.org/10.9708/jksci.2019.24.03.105

A Table Integration Technique Using Query Similarity Analysis  

Choi, Go-Bong (Core Information Technology)
Woo, Yong-Tae (Dept. of Computer Engineering, Changwon National University)
Abstract
In this paper, we propose a technique that analyzes the similarity between SQL queries to assist in integrating similar tables. First, table information is extracted from the SQL queries by a query structure analyzer, and the similarity between tables is measured using the Jaccard index. Next, clusters of similar tables are generated by hierarchical cluster analysis, and the co-occurrence probability of the tables used in the queries is calculated. Using the table similarity and the co-occurrence probability, the clusters are classified into three groups: integrable clusters, clusters requiring expert review, and clusters with low integration potential. Because the technique analyzes the SQL queries actually in use and assesses the possibility of table integration independently of ongoing business operations, an existing schema can be restructured effectively without interrupting work or incurring additional cost.
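The two measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each table is represented by the set of column names found in the queries and each query by the set of tables it references, and it takes co-occurrence to be the share of queries touching either table that touch both; the abstract specifies none of these choices. The table names, column names, and the helpers `jaccard` and `co_occurrence` are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurrence(queries: list, t1: str, t2: str) -> float:
    """Share of queries referencing t1 or t2 that reference both (assumed definition)."""
    either = [q for q in queries if t1 in q or t2 in q]
    both = [q for q in either if t1 in q and t2 in q]
    return len(both) / len(either) if either else 0.0


# Hypothetical column sets per table, as a query structure analyzer might extract them.
columns = {
    "customer": {"id", "name", "phone", "address"},
    "client": {"id", "name", "phone", "email"},
    "orders": {"id", "customer_id", "amount"},
}

# Hypothetical query log: each query reduced to the set of tables it references.
queries = [
    {"customer", "orders"},
    {"client"},
    {"customer"},
    {"client", "orders"},
]

sim_customer_client = jaccard(columns["customer"], columns["client"])
cooc_customer_client = co_occurrence(queries, "customer", "client")
cooc_customer_orders = co_occurrence(queries, "customer", "orders")

print(sim_customer_client, cooc_customer_client, cooc_customer_orders)
```

In this toy data, `customer` and `client` share three of five distinct columns and never appear in the same query, the kind of pair a reviewer might examine as an integration candidate. The pairwise similarities could then feed a hierarchical clustering step; the decision rule that sorts clusters into the three groups is not given in the abstract, so it is left out here.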
Keywords
SQL similarity analysis; Table integration; Data Architecture; Data Modeling; Schema Reconstruction