Search | Korea Science

Representative Batch Normalization for Scene Text Recognition

Sun, Yajie;Cao, Xiaoling;Sun, Yingying
- KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.7 / pp.2390-2406 / 2022
Scene text recognition has important practical value and has attracted the interest of many researchers. Many current methods achieve good results, but most existing approaches try to improve scene text recognition performance at the image level. They work well for reading regular scene text. However, recognizing text in low-quality images, such as curved, occluded, or blurred text, remains difficult, and the uneven image quality makes feature extraction harder. In addition, test results depend heavily on the training data, so there is still room to improve scene text recognition methods. In this work, we present a natural scene text recognizer that improves recognition performance at the feature level through feature representation and feature enhancement. For feature representation, we propose an efficient feature extractor that combines Representative Batch Normalization with ResNet. It reduces the model's dependence on training data and improves the feature representation of different instances. For feature enhancement, we use a feature enhancement network that expands the receptive field of the feature maps so that they contain rich feature information. The enhanced feature representation helps improve the model's recognition performance. Experiments on 7 benchmarks show that this method is highly competitive in recognizing both regular and irregular text. The method achieved top-1 recognition accuracy on four benchmarks: IC03, IC13, IC15, and SVTP.
https://doi.org/10.3837/tiis.2022.07.015
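
The abstract does not spell out what Representative Batch Normalization (RBN) adds to ordinary BN. In its original formulation ("Representative Batch Normalization with Feature Calibration", CVPR 2021), RBN adds two lightweight calibrations: a centering calibration that adds a learnable, per-channel multiple of instance statistics before the batch mean is subtracted, and a scaling calibration that gates the standardized features with a sigmoid of instance statistics. The NumPy sketch below shows a training-mode forward pass for an `(N, C, H, W)` tensor, assuming global average pooling as the instance statistic; running statistics and the backward pass are omitted, and the parameter names `w_m`, `w_v`, and `w_b` are illustrative, not taken from this paper.

```python
import numpy as np

def representative_batch_norm(x, w_m, w_v, w_b, gamma, beta, eps=1e-5):
    """Training-mode RBN forward pass (sketch, not the paper's code).

    x: (N, C, H, W) activations.
    w_m: (C,) centering-calibration weights.
    w_v, w_b: (C,) scaling-calibration weight and bias.
    gamma, beta: (C,) usual BN affine parameters.
    """
    shape = (1, -1, 1, 1)  # broadcast per-channel parameters over N, H, W

    # Centering calibration: add a learnable multiple of each instance's
    # per-channel mean (global average pooling) before batch centering.
    inst_mean = x.mean(axis=(2, 3), keepdims=True)  # (N, C, 1, 1)
    x_cm = x + w_m.reshape(shape) * inst_mean

    # Standard batch normalization with batch statistics.
    mu = x_cm.mean(axis=(0, 2, 3), keepdims=True)
    var = x_cm.var(axis=(0, 2, 3), keepdims=True)
    x_s = (x_cm - mu) / np.sqrt(var + eps)

    # Scaling calibration: a sigmoid gate computed from instance statistics
    # of the standardized features.
    k_s = x_s.mean(axis=(2, 3), keepdims=True)
    gate = 1.0 / (1.0 + np.exp(-(w_v.reshape(shape) * k_s + w_b.reshape(shape))))
    x_cs = x_s * gate

    # Usual BN affine transform.
    return gamma.reshape(shape) * x_cs + beta.reshape(shape)
```

With `w_m = 0` and `w_v = 0`, the calibrations collapse to plain BN multiplied by the constant `sigmoid(w_b)`, so the learned calibration weights control how much instance-specific information is mixed back into the batch-normalized features.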

Arabic Words Extraction and Character Recognition from Picturesque Image Macros with Enhanced VGG-16 based Model Functionality Using Neural Networks

Ayed Ahmad Hamdan Al-Radaideh;Mohd Shafry bin Mohd Rahim;Wad Ghaban;Majdi Bsoul;Shahid Kamal;Naveed Abbas
- KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1807-1822 / 2023
Innovation and the rapidly expanding functionality of user-friendly smartphones have encouraged photography enthusiasts to capture picturesque image macros at work or while traveling. Formal signboards are placed with marketing objectives and are rich in text to attract people. Extracting and recognizing text from natural images is an emerging research issue that deserves attention. Compared with conventional optical character recognition (OCR), the complex backgrounds, implicit noise, lighting, and orientation of scene text photos make this problem more difficult, and Arabic scene text extraction and recognition adds further complications. The method described in this paper uses a two-phase approach to extract Arabic text, with awareness of word boundaries, from scene images with varying text orientations. The first phase uses a convolutional autoencoder, and the second uses Arabic Character Segmentation (ACS) followed by traditional two-layer neural networks for recognition. This study also shows how an Arabic training and synthetic dataset can be created to represent text superimposed on different scene images. For this purpose, a dataset of 10K cropped images containing Arabic text was created for the detection phase, and a 127K Arabic character dataset was created for the recognition phase. The phase-1 labels were generated from an Arabic corpus of 15K quotes and sentences. The Arabic Word Awareness Region Detection (AWARD) approach, which is highly flexible in identifying complex Arabic scene text such as arbitrarily oriented, curved, or deformed text, is used to detect these texts. Experiments show that the system achieves 91.8% word segmentation accuracy and 94.2% character recognition accuracy.
In the future, we expect researchers to build on this work in image processing for text images, improving quality or reducing noise in scene images in any language by enhancing the functionality of VGG-16-based models with neural networks.
https://doi.org/10.3837/tiis.2023.07.004
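
The recognition stage is described only as "traditional two-layer neural networks" applied to segmented characters. A minimal NumPy sketch of such a classifier's forward pass follows, assuming flattened grayscale character crops as input, one ReLU hidden layer, and a softmax over character classes. The 32x32 input size, 128 hidden units, and 28 output classes are hypothetical placeholders, the weights are random, and training is omitted.

```python
import numpy as np

def init_two_layer(n_in, n_hidden, n_classes, seed=0):
    """Random weights for a hypothetical two-layer (one hidden layer) classifier."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden)),  # He-style init
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }

def predict_proba(params, x):
    """x: (batch, n_in) flattened character crops scaled to [0, 1]."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # hidden layer with ReLU
    logits = h @ params["W2"] + params["b2"]              # output layer
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)               # softmax per row

# Hypothetical sizes: 32x32 crops, 128 hidden units, 28 character classes.
params = init_two_layer(32 * 32, 128, 28)
crops = np.random.default_rng(1).random((5, 32 * 32))
probs = predict_proba(params, crops)
predicted_class = probs.argmax(axis=1)
```

Each segmented character is classified independently, which is why the quality of the ACS step matters: a segmentation error upstream cannot be corrected by this classifier.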

Real Scene Text Image Super-Resolution Based on Multi-Scale and Attention Fusion

Xinhua Lu;Haihai Wei;Li Ma;Qingji Xue;Yonghui Fu
- Journal of Information Processing Systems / v.19 no.4 / pp.427-438 / 2023
Many studies have shown that single image super-resolution (SISR) models trained on synthetic datasets are difficult to apply to real scene text image super-resolution (STISR) because real images suffer more complex degradation. The most recent dataset for realistic STISR is TextZoom, but current methods trained on it have not considered the effect of multi-scale features of text images. In this paper, a multi-scale and attention fusion model for realistic STISR is proposed. A multi-scale learning mechanism is introduced to acquire rich feature representations of text images. Spatial and channel attention are introduced to capture the local information and inter-channel interactions of text images. Finally, a multi-scale residual attention module is designed by fusing the multi-scale learning and attention mechanisms. Experiments on TextZoom show that the proposed model increases the average recognition accuracy of the scene text recognizer ASTER by 1.2% compared with the text super-resolution network (TSRN).
https://doi.org/10.3745/JIPS.02.0199
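
The abstract names three ingredients of the multi-scale residual attention module: multi-scale feature learning, channel attention, and spatial attention, fused with a residual connection. The NumPy sketch below wires these ideas together for a single `(C, H, W)` feature map. It is not the paper's module: multi-scale features are approximated with box filters of sizes 1, 3, and 5 instead of learned convolutions, channel attention follows the squeeze-and-excitation pattern with hypothetical weights `w1` of shape `(C // r, C)` and `w2` of shape `(C, C // r)`, and spatial attention is simplified to a sigmoid over the channel-wise mean and max without the usual convolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def box_filter(x, k):
    """Mean filter with odd size k over the H, W axes of a (C, H, W) array."""
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)), mode="edge")
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return windows.mean(axis=(-2, -1))

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style weights, one per channel."""
    squeezed = x.mean(axis=(1, 2))               # (C,) global average pooling
    hidden = np.maximum(0.0, w1 @ squeezed)      # (C // r,)
    return sigmoid(w2 @ hidden)[:, None, None]   # (C, 1, 1)

def spatial_attention(x):
    """Simplified spatial weights from channel-wise mean and max (no conv)."""
    return sigmoid(x.mean(axis=0) + x.max(axis=0))[None, :, :]  # (1, H, W)

def multi_scale_residual_attention(x, w1, w2, scales=(1, 3, 5)):
    # Fuse several receptive-field sizes, then reweight by channel and location.
    multi = np.mean([box_filter(x, k) for k in scales], axis=0)
    refined = multi * channel_attention(multi, w1, w2) * spatial_attention(multi)
    return x + refined  # residual connection keeps the input signal
```

The residual connection means the module only has to learn a correction to the input features, which is a common reason for pairing attention blocks with skip connections.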

An End-to-End Sequence Learning Approach for Text Extraction and Recognition from Scene Image

Lalitha, G.;Lavanya, B.
- International Journal of Computer Science & Network Security / v.22 no.7 / pp.220-228 / 2022
Images carry useful information, so detecting text in scene images is important. The purpose of the proposed work is to recognize text in scene images, for example billboard images along highways. Scene text detection on highway billboards plays a vital role in road safety. In the initial stage, preprocessing techniques are applied to sharpen and enhance the features in the image. Likewise, morphological operators are applied to close small gaps between objects. We propose a two-phase algorithm for extracting and recognizing text from scene images. In phase I, text is extracted from the scene image by applying preprocessing techniques such as blurring, erosion, and top-hat filtering, followed by thresholding and a morphological gradient with fixed kernel sizes; a Canny edge detector is then applied to detect the text in the scene image. In phase II, the text is recognized using MSER (Maximally Stable Extremal Regions) and OCR. The proposed work targets text in scene images from the popular SVT, ICDAR 2003, and MSRA-TD 500 datasets, which were captured under various illumination conditions and angles. The proposed algorithm achieves higher accuracy with shorter execution time than state-of-the-art methods.
https://doi.org/10.22937/IJCSNS.2022.22.7.27
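
Phase I is a chain of classical image operations. The NumPy sketch below reproduces its morphological part (erosion, dilation, top-hat, thresholding, and morphological gradient) on a grayscale image. The 15x15 top-hat kernel, the 3x3 gradient kernel, and the mean-plus-one-standard-deviation threshold are assumptions, since the abstract does not give the kernel sizes or the threshold rule; blurring, Canny edge detection, and the MSER/OCR stage are omitted. In practice, OpenCV's `cv2.morphologyEx` with `cv2.MORPH_TOPHAT` and `cv2.MORPH_GRADIENT` provides the same operations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(img, k):
    """All k x k neighbourhoods of a 2-D image, with edge replication."""
    p = k // 2
    return sliding_window_view(np.pad(img, p, mode="edge"), (k, k))

def erode(img, k):
    return _windows(img, k).min(axis=(-2, -1))

def dilate(img, k):
    return _windows(img, k).max(axis=(-2, -1))

def top_hat(img, k):
    """White top-hat: image minus its opening; keeps small bright details."""
    return img - dilate(erode(img, k), k)

def morph_gradient(img, k):
    """Dilation minus erosion: highlights object outlines."""
    return dilate(img, k) - erode(img, k)

def text_outline_mask(gray, tophat_k=15, grad_k=3):
    g = gray.astype(np.float64)
    th = top_hat(g, tophat_k)                # suppress large, smooth background
    threshold = th.mean() + th.std()         # assumed global threshold rule
    binary = (th > threshold).astype(np.float64)
    return morph_gradient(binary, grad_k) > 0
```

The top-hat kernel should be larger than a character stroke: structures smaller than the kernel survive the subtraction, while large uniform regions such as sky or road are removed.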

Candidate Word List and Probability Score Guided for Korean Scene Text Recognition (후보 단어 리스트와 확률 점수에 기반한 한국어 문자 인식 모델)

Lee, Yoonji;Lee, Jong-Min
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.73-75 / 2022
Scene text recognition is an artificial intelligence technology required for unmanned robots, autonomous vehicles, and human-computer interaction. However, scene text images are often distorted by noise such as illumination changes, low resolution, and blur. Unlike previous studies that recognized only English, this paper achieves strong recognition accuracy across various character types: English, Korean, special characters, and numbers. Instead of selecting only the class with the highest probability, the method also considers the second-ranked probability to generate candidate words, which allows it to correct existing misrecognitions between languages.
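
The key idea is to keep the second-ranked class at each character position instead of committing to the arg-max. The plain-Python sketch below enumerates the top-2 classes per position, ranks the resulting words by their summed log-probabilities, and returns the best-scoring word found in a candidate word list. The per-position probability dictionaries and the `lexicon` set are hypothetical stand-ins for the model's softmax outputs and the paper's candidate list. Exhaustive enumeration grows as 2^T for T positions, so a real system would prune the search, for example with a beam.

```python
import math
from itertools import product

def top_k(dist, k=2):
    """The k most probable (character, probability) pairs at one position."""
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]

def candidate_words(position_probs, k=2):
    """All words built from the top-k characters per position, best first."""
    options = [top_k(d, k) for d in position_probs]
    cands = []
    for combo in product(*options):
        word = "".join(ch for ch, _ in combo)
        log_p = sum(math.log(p) for _, p in combo)  # independent positions assumed
        cands.append((word, log_p))
    cands.sort(key=lambda wc: wc[1], reverse=True)
    return cands

def decode_with_lexicon(position_probs, lexicon, k=2):
    """Best-scoring candidate that appears in the lexicon, else the arg-max word."""
    cands = candidate_words(position_probs, k)
    for word, _ in cands:
        if word in lexicon:
            return word
    return cands[0][0]

# Digit "0" vs. letter "O": a typical cross-script confusion.
probs = [{"0": 0.55, "O": 0.45}, {"K": 0.9, "X": 0.1}]
```

Here plain arg-max decoding yields "0K", while `decode_with_lexicon(probs, {"OK"})` recovers "OK" because the second-ranked "O" is still considered.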

Scene Text Recognition Performance Improvement through an Add-on of an OCR based Classifier (OCR 엔진 기반 분류기 애드온 결합을 통한 이미지 내부 텍스트 인식 성능 향상)

Chae, Ho-Yeol;Seok, Ho-Sik
- Journal of IKEEE / v.24 no.4 / pp.1086-1092 / 2020
An autonomous agent operating in the real world should be able to recognize text in scenes. With the advancement of deep learning, various DNN models have been used for transformation, feature extraction, and prediction. However, existing state-of-the-art STR (Scene Text Recognition) engines do not achieve the performance required for real-world applications. In this paper, we introduce a method that improves the performance of STR engines through an add-on composed of an OCR (Optical Character Recognition) engine and a classifier. On instances from the IC13 and IC15 datasets that an STR engine failed to recognize, our method recognizes 10.92% of the unrecognized characters.
https://doi.org/10.7471/ikeee.2020.24.4.1086

Touch TT: Scene Text Extractor Using Touchscreen Interface

Jung, Je-Hyun;Lee, Seong-Hun;Cho, Min-Su;Kim, Jin-Hyung
- ETRI Journal / v.33 no.1 / pp.78-88 / 2011
In this paper, we present the Touch Text exTractor (Touch TT), an interactive text segmentation tool for extracting scene text from camera-based images. Touch TT provides a natural interface in which a user simply indicates the location of text regions by drawing a touchline. Touch TT then automatically estimates the text color and roughly locates the text regions. By inferring text characteristics from the estimated text color and text region, Touch TT can extract text components. Touch TT can also handle partially drawn lines that cover only a small section of the text area. The proposed system achieves reasonable accuracy for text extraction on moderately difficult examples from the ICDAR 2003 database and our own database.
https://doi.org/10.4218/etrij.11.1510.0029
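
The abstract outlines the interaction: sample the image under the user's touchline, estimate the text color, then keep pixels of that color near the line. The NumPy sketch below is one hypothetical way to realize those steps, not the Touch TT algorithm. It clusters the colors sampled along the line into two groups with 2-means, assumes the smaller group is the text (on the assumption that strokes cover fewer of the sampled pixels than background does), and marks pixels within a color tolerance inside the line's bounding box expanded by a margin. The `band` and `tol` values are illustrative.

```python
import numpy as np

def sample_line(p0, p1):
    """Integer (row, col) points along the segment p0 -> p1."""
    (r0, c0), (r1, c1) = p0, p1
    n = max(abs(r1 - r0), abs(c1 - c0)) + 1
    rows = np.rint(np.linspace(r0, r1, n)).astype(int)
    cols = np.rint(np.linspace(c0, c1, n)).astype(int)
    return rows, cols

def two_means(colors, iters=10):
    """Split (n, 3) colors into two clusters; returns labels and centers."""
    brightness = colors.sum(axis=1)
    # Start from the darkest and brightest samples.
    centers = colors[[brightness.argmin(), brightness.argmax()]].astype(np.float64)
    for _ in range(iters):
        dists = np.linalg.norm(colors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = colors[labels == j].mean(axis=0)
    return labels, centers

def extract_text_mask(img, p0, p1, band=6, tol=60.0):
    """img: (H, W, 3) uint8. Returns a boolean mask of likely text pixels."""
    rows, cols = sample_line(p0, p1)
    colors = img[rows, cols].astype(np.float64)
    labels, centers = two_means(colors)
    # Assumption: the minority cluster along the line is the text color.
    text_color = centers[np.bincount(labels, minlength=2).argmin()]
    h, w = img.shape[:2]
    r_lo, r_hi = max(rows.min() - band, 0), min(rows.max() + band, h - 1)
    c_lo, c_hi = max(cols.min() - band, 0), min(cols.max() + band, w - 1)
    region = img[r_lo:r_hi + 1, c_lo:c_hi + 1].astype(np.float64)
    mask = np.zeros((h, w), dtype=bool)
    mask[r_lo:r_hi + 1, c_lo:c_hi + 1] = np.linalg.norm(region - text_color, axis=2) < tol
    return mask
```

Estimating the color from the user's line rather than from the whole image is what lets such a tool cope with cluttered scenes: only pixels the user pointed at influence the color model.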

Korean Text Image Super-Resolution for Improving Text Recognition Accuracy (텍스트 인식률 개선을 위한 한글 텍스트 이미지 초해상화)

Junhyeong Kwon;Nam Ik Cho
- Journal of Broadcast Engineering / v.28 no.2 / pp.178-184 / 2023
Finding text in general scene images and recognizing its content is an important task that can serve as a basis for robot vision, visual assistance, and similar applications. However, degradations such as noise and blur are more noticeable in low-resolution text images, which severely reduces text recognition accuracy. In this paper, we propose a new Korean text image super-resolution method based on a Transformer model, which generally performs better than convolutional neural networks. Experiments show that the proposed super-resolution method improves text recognition accuracy on Korean text images. We also propose a new Korean text image dataset for training the model, containing a large number of HR-LR Korean text image pairs.
https://doi.org/10.5909/JBE.2023.28.2.178
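
The abstract does not describe the architecture beyond "Transformer-based". The operation that distinguishes such models from convolutional networks is self-attention, which lets every image patch attend to every other patch rather than only to a local neighborhood. A minimal NumPy sketch of single-head scaled dot-product attention over non-overlapping patches of a grayscale text image follows. The patch size, projection width, and random weights are illustrative assumptions; a real super-resolution model would add multiple heads, positional information, and an upsampling head.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W) image into non-overlapping p x p patches -> (num_patches, p*p)."""
    h, w = img.shape
    assert h % p == 0 and w % p == 0, "image size must be a multiple of the patch size"
    return img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
lr_image = rng.random((16, 64))   # hypothetical low-resolution text crop
tokens = patchify(lr_image, 4)    # (64, 16): a 4 x 16 grid of 16-pixel patches
d = 32
wq, wk, wv = (rng.normal(0, 0.1, size=(16, d)) for _ in range(3))
out, attn = self_attention(tokens, wq, wk, wv)
```

Because `attn` is a full patch-by-patch matrix, a blurred stroke in one patch can borrow evidence from a cleaner occurrence of the same character shape elsewhere in the line, which a small convolution kernel cannot see.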

YOLO, EAST : Comparison of Scene Text Detection Performance, Using a Neural Network Model (YOLO, EAST: 신경망 모델을 이용한 문자열 위치 검출 성능 비교)

Park, Chan Yong;Lim, Young Min;Jeong, Seung Dae;Cho, Young Heuk;Lee, Byeong Chul;Lee, Gyu Hyun;Kim, Jin Wook
- KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.115-124 / 2022
In this paper, YOLO and EAST models are tested to analyze their performance in detecting text areas in real-world and ordinary text images. Earlier YOLO models, including YOLOv3, have been known to underperform in detecting text areas, but the recently released YOLOv4 and YOLOv5 achieve promising performance in detecting text areas in various images. Based on the experimental results, both YOLOv4 and YOLOv5 are expected to be widely used for text detection in scene text recognition in the future.
https://doi.org/10.3745/KTSDE.2022.11.3.115
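
Text detectors such as YOLO and EAST are usually compared by matching predicted boxes to ground-truth boxes at an intersection-over-union (IoU) threshold and reporting precision, recall, and F1. The abstract does not state the exact protocol, so the plain-Python sketch below assumes axis-aligned `(x1, y1, x2, y2)` boxes, greedy one-to-one matching with predictions sorted by descending confidence, and the common 0.5 IoU threshold. EAST can also output rotated boxes and quadrilaterals, which this sketch does not handle.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def evaluate(preds, gts, thr=0.5):
    """Greedy matching; each ground-truth box can be matched at most once."""
    matched, tp = set(), 0
    for p in preds:  # assumed sorted by descending detector confidence
        best, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best:
                best, best_j = v, j
        if best_j >= 0 and best >= thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Running the same `evaluate` over each model's outputs on the same images gives directly comparable precision, recall, and F1 figures.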

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
- KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.771-789 / 2019
Text detection has been a popular research topic in computer vision. Prevalent text detection algorithms struggle to avoid dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, a novel superpixel-based form of text candidate is proposed to improve the text recall rate through image segmentation. Second, we propose a unique text sample selection model (TSSM) that extracts text samples from the current image and eliminates dependence on a database. Specifically, to improve the precision of the samples, we combine maximally stable extremal regions (MSERs) and the saliency map to generate sample reference maps with a double-threshold scheme. Finally, a multiple kernel boosting method is developed to build a strong text classifier by combining multiple single-kernel SVMs trained on the samples selected by TSSM. Experimental results on standard datasets show that our text detection method is robust to complex backgrounds and multilingual text and performs stably across different standard datasets.
https://doi.org/10.3837/tiis.2019.02.016
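
The double-threshold idea behind the sample reference maps can be sketched in a few lines of NumPy. The fusion rule (an equal-weight average of the saliency map and a binary MSER mask), the per-superpixel averaging, and the thresholds 0.2 and 0.6 are assumptions for illustration; the abstract does not give them. Superpixels scoring at or above the high threshold become positive (text) samples, those at or below the low threshold become negative samples, and ambiguous ones are left out of training, which is what keeps the selected samples precise.

```python
import numpy as np

def select_samples(labels, saliency, mser_mask, t_low=0.2, t_high=0.6):
    """Double-threshold sample selection over superpixels (sketch).

    labels: (H, W) int superpixel ids starting at 0.
    saliency: (H, W) saliency values in [0, 1].
    mser_mask: (H, W) bool, True where a pixel lies in an MSER.
    Returns per-superpixel scores and the ids of positive / negative samples.
    """
    reference = 0.5 * (saliency + mser_mask.astype(np.float64))  # assumed fusion rule
    n = labels.max() + 1
    sums = np.bincount(labels.ravel(), weights=reference.ravel(), minlength=n)
    counts = np.bincount(labels.ravel(), minlength=n)
    scores = sums / np.maximum(counts, 1)  # mean reference value per superpixel
    present = counts > 0
    positives = np.flatnonzero(present & (scores >= t_high))
    negatives = np.flatnonzero(present & (scores <= t_low))
    return scores, positives, negatives
```

The features of the selected superpixels would then serve as training data for the single-kernel SVMs, so no external labeled database is needed.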
