Browse > Article
http://dx.doi.org/10.5626/JCSE.2012.6.3.193

Schema- and Data-driven Discovery of SQL Keys  

Le, Van Bao Tran (School of Information Management, The Victoria University of Wellington)
Sebastian, Link (Department of Computer Science, The University of Auckland)
Mozhgan, Memari (Department of Computer Science, The University of Auckland)
Publication Information
Journal of Computing Science and Engineering / v.6, no.3, 2012 , pp. 193-206 More about this Journal
Abstract
Keys play a fundamental role in all data models. They allow database systems to uniquely identify data items, and therefore, promote efficient data processing in many applications. Due to this, support is required to discover keys. These include keys that are semantically meaningful for the application domain, or are satisfied by a given database. We study the discovery of keys from SQL tables. We investigate the structural and computational properties of Armstrong tables for sets of SQL keys. Inspections of Armstrong tables enable data engineers to consolidate their understanding of semantically meaningful keys, and to communicate this understanding to other stake-holders. The stake-holders may want to make changes to the tables or provide entirely different tables to communicate their views to the data engineers. For such a purpose, we propose data mining algorithms that discover keys from a given SQL table. We combine the key mining algorithms with Armstrong table computations to generate informative Armstrong tables, that is, key-preserving semantic samples of existing SQL tables. Finally, we define formal measures to assess the distance between sets of SQL keys. The measures can be applied to validate the usefulness of Armstrong tables, and to automate the marking and feedback of non-multiple choice questions in database courses.
Keywords
Algorithm; Complexity; Armstrong database; Key; Soundness; Completeness; Mining; SQL;
Citations & Related Records
연도 인용수 순위
  • Reference
1 M. Levene and G. Loizou, "Axiomatisation of functional dependencies in incomplete relations," Theoretical Computer Science, vol. 206, no. 1-2, pp. 283-300, 1998.   DOI   ScienceOn
2 T. Imielinski and W. Lipski Jr, "Incomplete information in relational databases," Journal of the ACM, vol. 31, no. 4, pp. 761-791, 1984.   DOI   ScienceOn
3 C. Zaniolo, "Database relations with null values," Journal of Computer and System Sciences, vol. 28, no. 1, pp. 142-166, 1984.   DOI   ScienceOn
4 P. Atzeni and N. M. Morfuni, "Functional dependencies and constraints on null values in database relations," Information and Control, vol. 70, no. 1, pp. 1-31, 1986.   DOI   ScienceOn
5 S. Hartmann, M. Kirchberg, and S. Link, 'Design by example for SQL table definitions with functional dependencies," The VLDB Journal, vol. 21, no. 1, pp. 121-144, 2012.   DOI
6 V. B. T. Le, S. Link, and M. Memari, "Discovery of keys from SQL tables," Database Systems for Advanced Applications, Lecture Notes in Computer Science vol. 7238, S. Lee et al. editors, Heidelberg: Springer Berlin, pp. 48-62, 2012.
7 C. J. Date, Database Design and Relational Theory: Normal Forms and All That Jazz, Sebastopol, CA: O'Reilly Media, 2012.
8 T. Eiter and G. Gottlob, "Identifying the minimal transversals of a hypergraph and related problems," SIAM Journal on Computing, vol. 24, no. 6, pp. 1278-1304, 1995.   DOI
9 J. Demetrovics and V. D. Thi, "Keys, antikeys and prime attributes," Annales Universitatis Scientiarum Budapestinensis de Rolando Eotvos Nominatae Sectio Computatorica, vol. 8, pp. 35-52, 1987.
10 S. Hartmann and S. Link, "When data dependencies over SQL tables meet the logics of paradox and S-3," Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Indianapolis, IN, 2010, pp. 317-326.
11 S. Hartmann and S. Link, "The implication problem of data dependencies over SQL table definitions: axiomatic, algorithmic and logical characterizations," ACM Transactions on Database Systems, vol. 37, no. 2, article no. 13, 2012.
12 H. Mannila and K. J. Raiha, "Design by example: an application of Armstrong relations," Journal of Computer and System Sciences, vol. 33, no. 2, pp. 126-141, 1986.   DOI   ScienceOn
13 W. D. Langeveldt and S. Link, "Empirical evidence for the usefulness of Armstrong relations in the acquisition of meaningful functional dependencies," Information Systems, vol. 35, no. 3, pp. 352-374, 2010.   DOI   ScienceOn
14 S. Link, "Armstrong databases: validation, communication and consolidation of conceptual models with perfect test data," Proceedings of the 9th Asia-Pacific Conference on Conceptual Modelling, Melbourne, Australia, 2012, pp. 3-19.
15 B. Alexe, B. T. Cate, P. G. Kolaitis, and W. C. Tan, "Characterizing schema mappings via data examples," ACM Transactions on Database Systems, vol. 36, no. 4, article no. 23, 2011.
16 F. de Marchi and J. M. Petit, "Semantic sampling of existing databases through informative Armstrong databases," Information Systems, vol. 32, no. 3, pp. 446-457, 2007.   DOI   ScienceOn
17 J. Demetrovics, "On the equivalence of candidate keys with Sperner systems," Acta Cybernetica, vol. 4, no. 3, pp. 247- 252, 1979.
18 C. Beeri, M. Dowd, R. Fagin, and R. Statman, "On the structure of Armstrong relations for functional dependencies," Journal of the ACM, vol. 31, no. 1, pp. 30-46, 1984.   DOI   ScienceOn
19 Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen, "TANE: an efficient algorithm for discovering functional and approximate dependencies," The Computer Journal, vol. 42, no. 2, pp. 100-111, 1999.   DOI
20 H. Mannila and K. J. Raiha, "Algorithms for inferring functional dependencies from relations," Data and Knowledge Engineering, vol. 12, no. 1, pp. 83-99, 1994.   DOI   ScienceOn
21 S. Hartmann and S. Link, "Efficient reasoning about a robust XML key fragment," ACM Transactions on Database Systems, vol. 34, no. 2, article no. 10, 2009.
22 E. F. Codd, "A relational model of data for large shared data banks," Communications of the ACM, vol. 13, no. 6, pp. 377-387, 1970.   DOI
23 D. Toman and G. E. Weddell, "On keys and functional dependencies as first-class citizens in description logics," Journal of Automated Reasoning, vol. 40, no. 2-3, pp. 117-132, 2008.   DOI
24 P. Buneman, S. Davidson, W. Fan, C. Hara, and W. C. Tan, "Keys for XML," Computer Networks, vol. 39, no. 5, pp. 473-487, 2002.   DOI   ScienceOn
25 S. Hartmann and S. Link, "Numerical constraints on XML data," Information and Computation, vol. 208, no. 5, pp. 521-544, 2010.   DOI   ScienceOn
26 G. Lausen, "Relational databases in RDF: keys and foreign keys," Semantic Web, Ontologies and Databases, Lecture Notes in Computer Science vol. 5005, V. Christophides et al. editors, Heidelberg: Springer, pp. 43-56, 2008.
27 B. C. Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel- Schneider, and U. Sattler, "OWL 2: the next step for OWL," Web Semantics, vol. 6, no. 4, pp. 309-322, 2008.   DOI   ScienceOn
28 Y. Sismanis, P. Brown, P. J. Haas, and B. Reinwald, "GORDIAN: efficient and scalable discovery of composite keys," Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, 2006, pp. 691-702.
29 J. Demetrovics, "On the number of candidate keys," Information Processing Letters, vol. 7, no. 6, pp. 266-269, 1978.   DOI   ScienceOn
30 CA Technologies, CA ERwin data modeler: methods guide r7.3, https://support.ca.com/cadocs/0/e002961e.pdf.
31 S. Abiteboul, R. Hull, and V. Vianu, Foundation of Database, Reading, MA: Addison-Wesley, 1995.
32 B. Thalheim, "On semantic issues connected with keys in relational databases permitting null values," Journal of Information Processing and Cybernetics, vol. 25, no. 1-2, pp. 11- 20, 1989.
33 R. Fagin, "Armstrong databases," IBM Research Laboratory, San Jose, CA, Research report RJ3440-40926, 1982.
34 B. Thalheim, Dependencies in Relational Databases, Stuttgart: B. G. Teubner, 1991.
35 S. Hartmann, U. Leck, and S. Link, 'On Codd families of keys over incomplete relations," The Computer Journal, vol. 54, no. 7, pp. 1166-1180, 2011.   DOI   ScienceOn
36 B. Thalheim, "The number of keys in relational and nested relational databases," Discrete Applied Mathematics, vol. 40, no. 2, pp. 265-282, 1992.   DOI   ScienceOn
37 V. L. Khizder and G. E. Weddell, "Reasoning about uniqueness constraints in object relational databases," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 5, pp. 1295-1306, 2003.   DOI   ScienceOn
38 G. E. Weddell, "Reasoning about functional dependencies generalized for semantic data models," ACM Transactions on Database Systems, vol. 17, no. 1, pp. 32-64, 1992.   DOI