DOI QR코드

DOI QR Code

Schema- and Data-driven Discovery of SQL Keys

  • Le, Van Bao Tran (School of Information Management, The Victoria University of Wellington) ;
  • Sebastian, Link (Department of Computer Science, The University of Auckland) ;
  • Mozhgan, Memari (Department of Computer Science, The University of Auckland)
  • 투고 : 2012.06.13
  • 심사 : 2012.08.20
  • 발행 : 2012.09.30

초록

Keys play a fundamental role in all data models. They allow database systems to uniquely identify data items, and therefore, promote efficient data processing in many applications. Due to this, support is required to discover keys. These include keys that are semantically meaningful for the application domain, or are satisfied by a given database. We study the discovery of keys from SQL tables. We investigate the structural and computational properties of Armstrong tables for sets of SQL keys. Inspections of Armstrong tables enable data engineers to consolidate their understanding of semantically meaningful keys, and to communicate this understanding to other stake-holders. The stake-holders may want to make changes to the tables or provide entirely different tables to communicate their views to the data engineers. For such a purpose, we propose data mining algorithms that discover keys from a given SQL table. We combine the key mining algorithms with Armstrong table computations to generate informative Armstrong tables, that is, key-preserving semantic samples of existing SQL tables. Finally, we define formal measures to assess the distance between sets of SQL keys. The measures can be applied to validate the usefulness of Armstrong tables, and to automate the marking and feedback of non-multiple choice questions in database courses.

키워드

참고문헌

  1. S. Hartmann, U. Leck, and S. Link, 'On Codd families of keys over incomplete relations," The Computer Journal, vol. 54, no. 7, pp. 1166-1180, 2011. https://doi.org/10.1093/comjnl/bxq073
  2. B. Thalheim, "On semantic issues connected with keys in relational databases permitting null values," Journal of Information Processing and Cybernetics, vol. 25, no. 1-2, pp. 11- 20, 1989.
  3. B. Thalheim, "The number of keys in relational and nested relational databases," Discrete Applied Mathematics, vol. 40, no. 2, pp. 265-282, 1992. https://doi.org/10.1016/0166-218X(92)90033-7
  4. G. E. Weddell, "Reasoning about functional dependencies generalized for semantic data models," ACM Transactions on Database Systems, vol. 17, no. 1, pp. 32-64, 1992. https://doi.org/10.1145/128765.128767
  5. V. L. Khizder and G. E. Weddell, "Reasoning about uniqueness constraints in object relational databases," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 5, pp. 1295-1306, 2003. https://doi.org/10.1109/TKDE.2003.1232279
  6. D. Toman and G. E. Weddell, "On keys and functional dependencies as first-class citizens in description logics," Journal of Automated Reasoning, vol. 40, no. 2-3, pp. 117-132, 2008. https://doi.org/10.1007/s10817-007-9092-z
  7. P. Buneman, S. Davidson, W. Fan, C. Hara, and W. C. Tan, "Keys for XML," Computer Networks, vol. 39, no. 5, pp. 473-487, 2002. https://doi.org/10.1016/S1389-1286(02)00223-2
  8. S. Hartmann and S. Link, "Efficient reasoning about a robust XML key fragment," ACM Transactions on Database Systems, vol. 34, no. 2, article no. 10, 2009.
  9. S. Hartmann and S. Link, "Numerical constraints on XML data," Information and Computation, vol. 208, no. 5, pp. 521-544, 2010. https://doi.org/10.1016/j.ic.2008.09.004
  10. G. Lausen, "Relational databases in RDF: keys and foreign keys," Semantic Web, Ontologies and Databases, Lecture Notes in Computer Science vol. 5005, V. Christophides et al. editors, Heidelberg: Springer, pp. 43-56, 2008.
  11. B. C. Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel- Schneider, and U. Sattler, "OWL 2: the next step for OWL," Web Semantics, vol. 6, no. 4, pp. 309-322, 2008. https://doi.org/10.1016/j.websem.2008.05.001
  12. Y. Sismanis, P. Brown, P. J. Haas, and B. Reinwald, "GORDIAN: efficient and scalable discovery of composite keys," Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, 2006, pp. 691-702.
  13. J. Demetrovics, "On the number of candidate keys," Information Processing Letters, vol. 7, no. 6, pp. 266-269, 1978. https://doi.org/10.1016/0020-0190(78)90013-3
  14. CA Technologies, CA ERwin data modeler: methods guide r7.3, https://support.ca.com/cadocs/0/e002961e.pdf.
  15. S. Abiteboul, R. Hull, and V. Vianu, Foundation of Database, Reading, MA: Addison-Wesley, 1995.
  16. R. Fagin, "Armstrong databases," IBM Research Laboratory, San Jose, CA, Research report RJ3440-40926, 1982.
  17. B. Thalheim, Dependencies in Relational Databases, Stuttgart: B. G. Teubner, 1991.
  18. W. D. Langeveldt and S. Link, "Empirical evidence for the usefulness of Armstrong relations in the acquisition of meaningful functional dependencies," Information Systems, vol. 35, no. 3, pp. 352-374, 2010. https://doi.org/10.1016/j.is.2009.11.002
  19. S. Link, "Armstrong databases: validation, communication and consolidation of conceptual models with perfect test data," Proceedings of the 9th Asia-Pacific Conference on Conceptual Modelling, Melbourne, Australia, 2012, pp. 3-19.
  20. B. Alexe, B. T. Cate, P. G. Kolaitis, and W. C. Tan, "Characterizing schema mappings via data examples," ACM Transactions on Database Systems, vol. 36, no. 4, article no. 23, 2011.
  21. H. Mannila and K. J. Raiha, "Design by example: an application of Armstrong relations," Journal of Computer and System Sciences, vol. 33, no. 2, pp. 126-141, 1986. https://doi.org/10.1016/0022-0000(86)90015-2
  22. F. de Marchi and J. M. Petit, "Semantic sampling of existing databases through informative Armstrong databases," Information Systems, vol. 32, no. 3, pp. 446-457, 2007. https://doi.org/10.1016/j.is.2005.12.007
  23. J. Demetrovics, "On the equivalence of candidate keys with Sperner systems," Acta Cybernetica, vol. 4, no. 3, pp. 247- 252, 1979.
  24. C. Beeri, M. Dowd, R. Fagin, and R. Statman, "On the structure of Armstrong relations for functional dependencies," Journal of the ACM, vol. 31, no. 1, pp. 30-46, 1984. https://doi.org/10.1145/2422.322414
  25. Y. Huhtala, J. Karkkainen, P. Porkka, and H. Toivonen, "TANE: an efficient algorithm for discovering functional and approximate dependencies," The Computer Journal, vol. 42, no. 2, pp. 100-111, 1999. https://doi.org/10.1093/comjnl/42.2.100
  26. H. Mannila and K. J. Raiha, "Algorithms for inferring functional dependencies from relations," Data and Knowledge Engineering, vol. 12, no. 1, pp. 83-99, 1994. https://doi.org/10.1016/0169-023X(94)90023-X
  27. E. F. Codd, "A relational model of data for large shared data banks," Communications of the ACM, vol. 13, no. 6, pp. 377-387, 1970. https://doi.org/10.1145/362384.362685
  28. T. Imielinski and W. Lipski Jr, "Incomplete information in relational databases," Journal of the ACM, vol. 31, no. 4, pp. 761-791, 1984. https://doi.org/10.1145/1634.1886
  29. C. Zaniolo, "Database relations with null values," Journal of Computer and System Sciences, vol. 28, no. 1, pp. 142-166, 1984. https://doi.org/10.1016/0022-0000(84)90080-1
  30. P. Atzeni and N. M. Morfuni, "Functional dependencies and constraints on null values in database relations," Information and Control, vol. 70, no. 1, pp. 1-31, 1986. https://doi.org/10.1016/S0019-9958(86)80022-5
  31. S. Hartmann, M. Kirchberg, and S. Link, 'Design by example for SQL table definitions with functional dependencies," The VLDB Journal, vol. 21, no. 1, pp. 121-144, 2012. https://doi.org/10.1007/s00778-011-0239-5
  32. V. B. T. Le, S. Link, and M. Memari, "Discovery of keys from SQL tables," Database Systems for Advanced Applications, Lecture Notes in Computer Science vol. 7238, S. Lee et al. editors, Heidelberg: Springer Berlin, pp. 48-62, 2012.
  33. C. J. Date, Database Design and Relational Theory: Normal Forms and All That Jazz, Sebastopol, CA: O'Reilly Media, 2012.
  34. T. Eiter and G. Gottlob, "Identifying the minimal transversals of a hypergraph and related problems," SIAM Journal on Computing, vol. 24, no. 6, pp. 1278-1304, 1995. https://doi.org/10.1137/S0097539793250299
  35. J. Demetrovics and V. D. Thi, "Keys, antikeys and prime attributes," Annales Universitatis Scientiarum Budapestinensis de Rolando Eotvos Nominatae Sectio Computatorica, vol. 8, pp. 35-52, 1987.
  36. S. Hartmann and S. Link, "When data dependencies over SQL tables meet the logics of paradox and S-3," Proceedings of the 29th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Indianapolis, IN, 2010, pp. 317-326.
  37. S. Hartmann and S. Link, "The implication problem of data dependencies over SQL table definitions: axiomatic, algorithmic and logical characterizations," ACM Transactions on Database Systems, vol. 37, no. 2, article no. 13, 2012.
  38. M. Levene and G. Loizou, "Axiomatisation of functional dependencies in incomplete relations," Theoretical Computer Science, vol. 206, no. 1-2, pp. 283-300, 1998. https://doi.org/10.1016/S0304-3975(98)80029-7

피인용 문헌

  1. Cloud-based learning system for answer ranking vol.20, pp.3, 2017, https://doi.org/10.1007/s10586-017-0888-2