http://dx.doi.org/10.14404/JKSARM.2020.20.2.067

Metadata Design and Machine Learning-Based Automatic Indexing for Efficient Data Management of Image Archives of Local Governments in South Korea  

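Kim, InA (Division of Computer Convergence, Chungnam National University)
Kang, Young-Sun (Redwit Inc.)
Lee, Kyu-Chul (Division of Computer Convergence, Chungnam National University)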
Publication Information
Journal of Korean Society of Archives and Records Management / v.20, no.2, 2020, pp. 67-83
Abstract
Many local governments in Korea provide online services that give the public easy access to audio-visual archives of local events. However, the way these archives are currently managed limits both compatibility with other organizations and search convenience, because standard metadata is lacking and the information contained in the images is underused. To solve these problems, we propose a metadata design and a machine learning-based automatic indexing technique for the efficient management of the image archives of local governments in Korea. Specifically, we design metadata items specialized for these image archives to improve compatibility, and we include elements that represent the basic information and characteristics of images, enabling efficient management. In addition, the text and objects in images, which carry information reflecting events and categories, are automatically indexed using machine learning, making search more convenient for users. Lastly, we developed a program that automatically extracts text and objects from image archives using the proposed method and stores the extracted content and basic information in the metadata items we designed.
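The indexing flow described above (extract text and objects, then store them in metadata items for search) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `ImageRecord` fields, the `index_image` and `search` helpers, and the stand-in extractors are hypothetical and do not reproduce the metadata items or the program developed in the paper. A real system would replace the stand-ins with an OCR engine and an object detector.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ImageRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    file_name: str
    text_index: List[str] = field(default_factory=list)    # words found by OCR
    object_index: List[str] = field(default_factory=list)  # labels from object detection


def index_image(
    file_name: str,
    image_bytes: bytes,
    ocr: Callable[[bytes], List[str]],
    detect: Callable[[bytes], List[str]],
) -> ImageRecord:
    """Run both extractors and store de-duplicated, sorted index terms."""
    record = ImageRecord(file_name=file_name)
    record.text_index = sorted(set(ocr(image_bytes)))
    record.object_index = sorted(set(detect(image_bytes)))
    return record


def search(records: List[ImageRecord], term: str) -> List[str]:
    """Return file names whose text or object index contains the term."""
    term = term.lower()
    return [
        r.file_name
        for r in records
        if term in (t.lower() for t in r.text_index + r.object_index)
    ]


# Stand-in extractors (assumptions) so the sketch runs without any
# OCR or object detection library installed.
def fake_ocr(_: bytes) -> List[str]:
    return ["Festival", "2019", "Festival"]


def fake_detect(_: bytes) -> List[str]:
    return ["person", "banner"]


record = index_image("event_001.jpg", b"", fake_ocr, fake_detect)
```

Passing the extractors in as callables keeps the record structure independent of any particular OCR or detection tool, so either component could be swapped without changing the stored metadata.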
Keywords
Image Archive; Metadata; OCR; Deep Learning; Automatic Indexing