http://dx.doi.org/10.14400/JDC.2021.19.9.209

Database metadata standardization processing model using web dictionary crawling  

Jeong, Hana (Department of Computer Engineering, Kongju National University)
Park, Koo-Rack (Department of Computer Engineering, Kongju National University)
Chung, Young-suk (Department of Computer Engineering, Kongju National University)
Publication Information
Journal of Digital Convergence, v.19, no.9, 2021, pp. 209-215
Abstract
Data quality management has become an important issue, and providing consistent metadata is one way to improve data quality. This study presents algorithms that facilitate standard word dictionary management for consistent metadata management. The algorithms automate synonym management for database metadata through web dictionary crawling. They also improve data accuracy by resolving the homonym ambiguity that may arise during the web dictionary crawling process. Compared with the existing manual management, the proposed algorithm increases the reliability of metadata quality and can reduce the time spent registering and managing synonym data. Further research on a partially automated data standardization model is needed, based on a detailed understanding of which tasks in data standardization activities can be automated.
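
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a standard word dictionary whose entries carry a definition, and a crawler that returns several senses (homonyms) for one headword. All names here (`Sense`, `StandardWord`, `pick_sense`, `register_synonyms`, `fake_fetch`) are hypothetical, the crawler is replaced by an in-memory stub so the example runs without network access, and the token-overlap rule for choosing a sense is an assumed stand-in for whatever homonym resolution the paper uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Sense:
    """One sense of a headword as a web dictionary page might list it."""
    definition: str
    synonyms: List[str]


@dataclass
class StandardWord:
    """An entry in the standard word dictionary with its registered synonyms."""
    name: str
    definition: str
    synonyms: Set[str] = field(default_factory=set)


def tokens(text: str) -> Set[str]:
    """Lowercase word set with surrounding punctuation stripped."""
    cleaned = (t.strip(".,;:()").lower() for t in text.split())
    return {t for t in cleaned if t}


def pick_sense(word: StandardWord, senses: List[Sense]) -> Optional[Sense]:
    """Assumed homonym rule: keep the sense whose definition shares the
    most tokens with the standard word's definition; None if no overlap."""
    target = tokens(word.definition)
    best, best_score = None, 0
    for sense in senses:
        score = len(target & tokens(sense.definition))
        if score > best_score:
            best, best_score = sense, score
    return best


def register_synonyms(
    word: StandardWord, fetch_senses: Callable[[str], List[Sense]]
) -> bool:
    """Fetch senses for the word, resolve the homonym, store its synonyms.
    Returns False when no sense matches, leaving the entry for manual review."""
    sense = pick_sense(word, fetch_senses(word.name))
    if sense is None:
        return False
    word.synonyms.update(s for s in sense.synonyms if s != word.name)
    return True


def fake_fetch(headword: str) -> List[Sense]:
    """Stub standing in for the web dictionary crawler (illustrative data)."""
    pages = {
        "bank": [
            Sense("sloping land beside a river", ["shore", "riverside"]),
            Sense("an institution that accepts deposits and makes loans",
                  ["depository", "lender"]),
        ]
    }
    return pages.get(headword, [])
```

With this stub, a standard word "bank" defined as "financial institution that accepts deposits" is matched to the financial sense rather than the riverbank sense, so only that sense's synonyms are registered. A real crawler would replace `fake_fetch`, and a stronger similarity measure would likely be needed for short definitions.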
Keywords
Data standardization; Data quality management; Web crawler; Database; Metadata