A Study on Book Recovery Method Depending on Book Damage Levels Using Book Scan

A Study on a Deep Learning-Based Book Recovery Method According to Book Damage Levels Using Book Scanning

  • Kyungho Seok (School of Computer Science and Engineering, Soongsil University) ;
  • Johui Lee (School of Computer Science and Engineering, Soongsil University) ;
  • Byeongchan Park (School of Computer Science and Engineering, Soongsil University) ;
  • Seok-Yoon Kim (Dept. of Computer Science and Engineering, Soongsil University) ;
  • Youngmo Kim (Dept. of Computer Science and Engineering, Soongsil University)
  • Received : 2023.12.04
  • Accepted : 2023.12.18
  • Published : 2023.12.31

Abstract

Recently, with the growth of e-book services, books are increasingly published simultaneously as physical books and as digitized e-books. Because printing and distribution costs make paper books more expensive than e-books, demand for relatively inexpensive e-books is increasing. Some previously published physical books cannot be digitized because of the circumstances of the publisher or author, so individual users have begun digitizing long-published books themselves. However, existing research has focused only on improving the pre-processing steps that raise text recognition accuracy before OCR is applied, and digitization remains limited by the physical condition of the book. Support for book digitization services that takes the condition of the physical book into account is therefore needed. In this paper, we propose a method to support digitization services according to the condition of the physical books held by book owners. Books are scanned to create images, and text information is extracted from these images through OCR. We then propose a method that uses BERT, a deep learning model for natural language processing, to recover text that cannot be extracted because of the book's condition. The results confirmed that the BERT-based recovery method outperforms RNN, which is widely used in recommendation technology.
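The abstract does not specify how unreadable passages are detected or handed to BERT. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes the OCR engine reports a per-word confidence (as Tesseract's `image_to_data` output does, using -1 for non-word layout rows) and that low-confidence words correspond to damaged text. Such words are replaced with BERT's `[MASK]` token, and the masks are then filled left to right by a pluggable predictor. The threshold of 60, the left-to-right filling order, and the names `mask_low_confidence`, `recover`, `Predictor`, and `toy_predict` are hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence, Union

MASK = "[MASK]"  # mask token in BERT's WordPiece vocabularies

# Hypothetical interface: a predictor receives the current token list and the
# index of a [MASK] and returns the word to put there.
Predictor = Callable[[List[str], int], str]


def mask_low_confidence(
    words: Sequence[str],
    confs: Sequence[Union[int, float, str]],
    threshold: float = 60.0,
) -> List[str]:
    """Turn OCR output into a token list where unreliable words become [MASK].

    `confs` follows Tesseract's convention: 0-100 for recognized words,
    -1 for page/block/paragraph/line rows that carry no word.
    """
    if len(words) != len(confs):
        raise ValueError("words and confs must have the same length")
    tokens: List[str] = []
    for word, conf in zip(words, confs):
        score = float(conf)  # some pytesseract versions return strings
        if score < 0 or not str(word).strip():
            continue  # skip layout rows and empty boxes
        # Assumption: a low score means the word was damaged in the scan.
        tokens.append(MASK if score < threshold else str(word))
    return tokens


def recover(tokens: Sequence[str], predict: Predictor) -> List[str]:
    """Fill [MASK] positions left to right; later predictions see earlier fills."""
    filled = list(tokens)  # do not modify the caller's list
    for i, tok in enumerate(filled):
        if tok == MASK:
            filled[i] = predict(filled, i)
    return filled


def toy_predict(tokens: List[str], i: int) -> str:
    """Stand-in for a BERT fill-mask model: looks up (left, right) neighbours."""
    table = {("old", "was"): "book"}  # hypothetical, illustration only
    left = tokens[i - 1] if i > 0 else ""
    right = tokens[i + 1] if i + 1 < len(tokens) else ""
    return table.get((left, right), MASK)


if __name__ == "__main__":
    words = ["", "The", "old", "b0ok", "was", "scanned"]
    confs = [-1, 96, 91, 23, 95, 88]
    masked = mask_low_confidence(words, confs)
    print(" ".join(masked))  # The old [MASK] was scanned
    print(" ".join(recover(masked, toy_predict)))  # The old book was scanned
```

In practice `predict` would wrap a masked language model, for example a BERT fill-mask model loaded through the Hugging Face `transformers` `pipeline("fill-mask", ...)`; which checkpoint suits Korean text and how subword predictions are merged back into whole words are left open here. Filling one mask at a time lets each prediction condition on words already recovered, at the cost of one model call per mask.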

Acknowledgement

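This research was supported in 2023 by the Korea Evaluation Institute of Industrial Technology with funding from the Ministry of Trade, Industry and Energy (Project No. 20016990, Knowledge Service Industry Technology Development (R&D)).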
