Multi-modal Image Processing for Improving Recognition Accuracy of Text Data in Images

Park, Jungeun; Joo, Gyeongdon; Kim, Chulyun

Database Research (데이타베이스연구회지:데이타베이스연구)

Volume 34, Issue 3 / Pages 148-158 / 2018 / pISSN 1598-9798

Korean Institute of Information Scientists and Engineers (한국정보과학회)

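Park, Jungeun (Department of IT Engineering, Sookmyung Women's University; Korea Cognitive Science Industry Association);
Joo, Gyeongdon (Department of IT Engineering, Sookmyung Women's University; Korea Cognitive Science Industry Association);
Kim, Chulyun (Department of IT Engineering, Sookmyung Women's University; Korea Cognitive Science Industry Association)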

Received : 2018.08.16
Accepted : 2018.12.19
Published : 2018.12.31

Abstract

Optical character recognition (OCR) is a technique for extracting and recognizing text in images. It is an important preprocessing step in data analysis because much real-world text information is embedded in images. Many OCR engines achieve high recognition accuracy on images in which the text is clearly separable from the background, such as black lettering on a white background. However, their accuracy is low on images in which the text is not easily separable from a complex background. To improve accuracy on such complex images, the input image must be transformed so that the text becomes more prominent. In this paper, we propose a method that segments an input image into text lines so that OCR engines can recognize each line more effectively, and that determines the final output by comparing the recognition rates of a CLAHE module and a Two-step module, both of which separate text from background regions using image processing techniques. Through extensive experiments against two well-known OCR engines, Tesseract and ABBYY, we show that the proposed method has the best recognition accuracy on images with complex backgrounds.
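The abstract does not say how text lines are located. As a minimal sketch of the line segmentation step, assuming a horizontal projection profile over an already binarized image (a common technique, not necessarily the authors' method), the following finds the row ranges that contain ink. The function name `segment_text_lines` and the `min_height` parameter are hypothetical.

```python
from typing import List, Tuple


def segment_text_lines(binary: List[List[int]], min_height: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines, with bottom exclusive.

    `binary` is a 2-D grid in which 1 marks a text (ink) pixel and 0 marks
    background. A row belongs to a text line if it contains any ink pixel.
    Runs of consecutive ink rows shorter than `min_height` are discarded.
    """
    lines: List[Tuple[int, int]] = []
    start = None  # first row of the line currently being scanned, if any
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y  # a new text line begins here
        elif not has_ink and start is not None:
            if y - start >= min_height:
                lines.append((start, y))
            start = None  # back in the gap between lines
    # Close a line that runs to the bottom edge of the image.
    if start is not None and len(binary) - start >= min_height:
        lines.append((start, len(binary)))
    return lines


# Assumed toy input: two text lines separated by one blank row.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
print(segment_text_lines(image))  # [(1, 3), (4, 5)]
```

Each returned range could then be cropped and passed to an OCR engine on its own, which is the role line segmentation plays in the proposed pipeline.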

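Optical character recognition (OCR) is a technique that locates text regions in images containing text and extracts the text from them. Because a considerable share of all text data is embedded in images, OCR serves as an important preprocessing step in data analysis. Most OCR engines achieve high recognition rates on low-complexity images in which text and background are clearly distinguishable, such as simple images of black text on a white background, but low recognition rates on high-complexity images in which text and background are not clearly distinguishable. Improving the recognition rate therefore requires preprocessing that transforms the input image into one that OCR engines can process more easily. In this paper, to increase the accuracy of OCR engines, we split the image into text lines, run a CLAHE module and a Two-step module, both based on image processing techniques, in parallel to separate text from background regions efficiently, and then recognize the text. We then propose a method that compares the recognition rates of the two modules' output texts using an algorithm that combines the N-gram method with the Hunspell dictionary, and selects the text with the highest recognition rate as the final result. Through a variety of comparative experiments with the representative OCR engines Tesseract and ABBYY, we show that the proposed module achieves the most accurate text recognition on images with complex backgrounds.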
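The selection step, which combines an N-gram method with a Hunspell dictionary to choose between the two modules' outputs, is not specified in detail here. The following is only an illustrative sketch under stated assumptions: a plain Python set stands in for the Hunspell dictionary, character bigrams stand in for the N-gram model, and the 0.5 weight for out-of-dictionary words is arbitrary. The names `plausibility` and `select_best` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def char_bigrams(text: str) -> List[str]:
    """Return all overlapping two-character substrings of `text`."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_bigram_counts(corpus: Iterable[str]) -> Counter:
    """Count the character bigrams seen in a reference word list."""
    counts: Counter = Counter()
    for word in corpus:
        counts.update(char_bigrams(word))
    return counts


def plausibility(text: str, dictionary: Set[str], bigram_counts: Counter) -> float:
    """Score OCR output in [0, 1]; higher means more word-like.

    A dictionary word scores 1.0. Any other word scores 0.5 times the
    fraction of its bigrams that occur in the reference counts (assumed weight).
    """
    words = text.lower().split()
    if not words:
        return 0.0
    total = 0.0
    for word in words:
        if word in dictionary:
            total += 1.0
            continue
        grams = char_bigrams(word)
        if grams:
            known = sum(1 for g in grams if bigram_counts[g] > 0)
            total += 0.5 * known / len(grams)
    return total / len(words)


def select_best(candidates: Dict[str, str], dictionary: Set[str], bigram_counts: Counter) -> str:
    """Return the name of the module whose output text scores highest."""
    return max(candidates, key=lambda name: plausibility(candidates[name], dictionary, bigram_counts))


# Assumed toy data: a tiny word set in place of a real Hunspell dictionary.
dictionary = {"open", "daily", "fresh", "coffee"}
bigram_counts = build_bigram_counts(dictionary)
candidates = {
    "clahe": "fresh coffee daily",
    "two_step": "frosh c0ffee da1ly",
}
print(select_best(candidates, dictionary, bigram_counts))  # clahe
```

Scoring both outputs and keeping the better one means neither preprocessing module has to work well on every image, since the final answer only needs one of them to succeed.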
