Relational Data Extraction and Transformation: A Study to Enhance Information Systems Performance

  • Received : 2021.10.16
  • Accepted : 2022.11.28
  • Published : 2022.12.31

Abstract

The most effective way to improve the capabilities of an information system is to enable instant access to several relational database sources and to transform their logically structured data into multiple target relational databases. Numerous data transformation tools are available; however, they typically rely on fixed procedures that the user cannot change, which makes it impossible to meet near-real-time data transformation requirements. Furthermore, some tools cannot build object references or alter attribute constraints. In many situations, the data type changes that a tool applies cause conflicts and data quality problems when data are transformed between two systems. The R programming language was used extensively throughout this study, and several different relational database structures were used to evaluate the proposed approach. Experiments showed that the developed approach can improve the performance of information systems by interacting and exchanging data with various relational databases. The study addresses data quality issues in the data transformation process, particularly the completeness and integrity dimensions.
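
The kind of transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the study used R, whereas this example uses Python's standard-library sqlite3 module so that it runs without a database server. All table and column names (dept, emp, department, employee) and the helper functions are hypothetical. The sketch extracts rows from a loosely typed source schema, changes a data type during the transformation, loads the rows into a target schema with its own constraints and object references, and returns row counts as a simple completeness check.

```python
import sqlite3


def build_source():
    """Create a hypothetical source database with loosely typed columns."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE dept (id INTEGER, name TEXT)")
    source.execute(
        "CREATE TABLE emp (id INTEGER, first TEXT, last TEXT,"
        " salary TEXT, dept INTEGER)"
    )
    source.executemany(
        "INSERT INTO dept VALUES (?, ?)", [(10, "Sales"), (20, "IT")]
    )
    source.executemany(
        "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
        [(1, "Ada", "Lovelace", "5200.50", 20), (2, "Alan", "Turing", "4800", 10)],
    )
    source.commit()
    return source


def transfer_employees(source, target):
    """Extract from source, transform, and load into a constrained target.

    Returns (rows_extracted, rows_loaded) as a basic completeness check.
    """
    # Integrity: ask SQLite to enforce foreign keys on the target connection.
    target.execute("PRAGMA foreign_keys = ON")
    # The target schema (hypothetical) declares its own types and constraints.
    target.execute(
        "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    target.execute(
        "CREATE TABLE employee ("
        " emp_id INTEGER PRIMARY KEY,"
        " full_name TEXT NOT NULL,"
        " salary REAL NOT NULL,"
        " dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
    )

    # Extract.
    depts = source.execute("SELECT id, name FROM dept").fetchall()
    emps = source.execute(
        "SELECT id, first, last, salary, dept FROM emp"
    ).fetchall()

    # Transform: merge the name columns and convert salary from text to float.
    transformed = [
        (emp_id, f"{first} {last}", float(salary), dept_id)
        for emp_id, first, last, salary, dept_id in emps
    ]

    # Load inside one transaction; parent rows go first so references resolve.
    with target:
        target.executemany("INSERT INTO department VALUES (?, ?)", depts)
        target.executemany(
            "INSERT INTO employee VALUES (?, ?, ?, ?)", transformed
        )

    loaded = target.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
    return len(emps), loaded
```

Calling transfer_employees(build_source(), sqlite3.connect(":memory:")) returns the extracted and loaded row counts; if they differ, rows were lost during the transformation. Because the target enforces its foreign key, a later attempt to insert an employee whose department does not exist is rejected by the database rather than silently stored.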
