Digitization of Old Korean Texts with Obsolete Korean Characters and Suggestion for Improvement of Information Sharing

옛한글 문서의 전자문서화와 정보공유 방법 제안

  • Received : 2021.03.22
  • Accepted : 2021.05.26
  • Published : 2021.06.29

Abstract

A vast amount of material written in old Korean, using archaic grammar and/or obsolete characters, is held by many institutions, among them the Jangseogak at the Academy of Korean Studies. It includes movable-type prints, woodblock prints, manuscripts, old novels, and letters. Digitizing these texts requires a lengthy manual input process. Individual researchers trained in old Korean have read the characters and typed them into electronic documents. The results depend on each researcher's skill, effort, and approach, and the expertise gained improves only that individual's productivity instead of accumulating as shared technology. To date, only a small fraction of old Korean documents have been introduced to the public, and most remain in storage. Even texts that have been digitized are difficult to share and display correctly, because old Korean characters are not fully compatible with the character sets used on today's electronic devices. To raise the efficiency of digitizing old Korean texts and to build up the underlying technology, better methods for inputting, displaying, and storing old Korean are needed, together with optical character recognition (OCR) that analyzes images of old Korean documents.
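The character-set incompatibility mentioned in the abstract can be made concrete with Unicode's Hangul Jamo block. The sketch below assumes Unicode conjoining jamo as the storage encoding (the abstract does not say which encoding the authors used), and the example strings and the helper name `describe` are illustrative, not taken from the paper. It shows that a modern syllable composes under NFC into a single precomposed code point, whereas an old Hangul syllable containing the obsolete vowel arae-a (U+119E) has no precomposed form and remains a three-code-point sequence.

```python
import unicodedata

# Illustrative strings (not taken from the paper), written as Unicode
# conjoining jamo from the Hangul Jamo block (U+1100-U+11FF).
# Modern syllable 한: CHOSEONG HIEUH + JUNGSEONG A + JONGSEONG NIEUN.
MODERN = "\u1112\u1161\u11AB"
# Old Hangul syllable ᄒᆞᆫ: the vowel arae-a (U+119E) is obsolete and has
# no precomposed syllable in the U+AC00-U+D7A3 range.
OLD = "\u1112\u119E\u11AB"


def describe(text: str) -> list[str]:
    """Normalize to NFC and list each resulting code point with its name."""
    nfc = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in nfc]


if __name__ == "__main__":
    for label, text in (("modern", MODERN), ("old", OLD)):
        print(label, describe(text))
```

Running the script prints one code point, U+D55C HANGUL SYLLABLE HAN, for the modern string but three separate jamo for the old one. Software that assumes every Hangul syllable is a single precomposed code point, or fonts without jamo-composition support, will mishandle such text. This is one plausible mechanism behind the display and sharing problems described above, not a statement about the specific systems the authors examined.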

Acknowledgement

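We thank Kim Deok-su, Head of the Royal Documents Research Division at the Jangseogak, Academy of Korean Studies, for encouragement and support throughout this study. We are also grateful to Kang Kitaek and Dr. Kim Jung-Gon of WaferMasters, Inc. (USA) for technical collaboration, and to Yoo Ji-hyun of Hunitek Co., Ltd. for assistance with the overall progress of the project.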
