Analysis of Impact Between Data Analysis Performance and Database

  • Kyoungju Min (Department of Sino-Korean Literature, Chungnam National University) ;
  • Jeongyun Cho (Department of Sino-Korean Literature, Chungnam National University) ;
  • Manho Jung (Department of Sino-Korean Literature, Chungnam National University) ;
  • Hyangbae Lee (Department of Sino-Korean Literature, Chungnam National University)
  • Received : 2023.02.24
  • Accepted : 2023.06.26
  • Published : 2023.09.30

Abstract

Engineering and humanities data are often stored in databases and used for search services. While the latest deep-learning technologies, such as BART and BERT, are used for data analysis, humanities data analysis still relies on traditional databases. Representative analysis methods include n-gram and lexical statistical extraction. However, when a database is used, it often limits the performance of these result calculations. This study presents an experimental process using MariaDB on a PC, a setup readily available in a laboratory, to analyze the impact of the database on data analysis performance. The findings show that the database becomes a bottleneck when analyzing large-scale text data, particularly beyond hundreds of thousands of records. To address this issue, a method is proposed for providing real-time humanities data analysis web services using an open-source database, focusing on the Seungjeongwon-Ilgy, one of the largest datasets in the humanities field.
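The abstract names n-gram extraction as a representative analysis method and reports that the database becomes a bottleneck at scale. As a minimal sketch of the two places such counting can happen, the Python below computes n-gram frequencies once in application memory and once through a SQL `GROUP BY`. It assumes character-level n-grams (the Seungjeongwon-Ilgy is written in Classical Chinese without word boundaries), uses Python's built-in `sqlite3` only as a stand-in for MariaDB, and the function names and toy records are hypothetical. It does not reproduce the authors' experiment, schema, or timings.

```python
import sqlite3
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Return all overlapping character n-grams of `text`, ignoring whitespace."""
    chars = "".join(text.split())  # drop spaces/newlines before slicing
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def count_in_memory(records: list[str], n: int) -> Counter:
    """Aggregate n-gram frequencies across all records in application memory."""
    counts = Counter()
    for text in records:
        counts.update(char_ngrams(text, n))
    return counts


def count_in_database(records: list[str], n: int) -> Counter:
    """Store one row per n-gram occurrence and let SQL GROUP BY do the counting.

    Assumption: an in-memory sqlite3 database stands in for a server DB such as
    MariaDB; the table name `ngram` is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ngram (record_id INTEGER, gram TEXT)")
    rows = ((rid, g) for rid, text in enumerate(records) for g in char_ngrams(text, n))
    conn.executemany("INSERT INTO ngram VALUES (?, ?)", rows)
    result = conn.execute("SELECT gram, COUNT(*) FROM ngram GROUP BY gram").fetchall()
    conn.close()
    return Counter(dict(result))


if __name__ == "__main__":
    # Invented toy records for illustration only, not taken from the dataset.
    records = ["上曰 可也", "上曰 不可"]
    mem = count_in_memory(records, 2)
    db = count_in_database(records, 2)
    assert mem == db  # both strategies yield identical frequencies
    print(mem.most_common(3))
```

Both functions return the same counts; what differs is where the work happens. The database version materializes one row per n-gram occurrence before aggregating, which is one plausible reason such a pipeline slows down as the number of records grows.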
