• Title/Summary/Keyword: Deep learning segmentation

Arabic Words Extraction and Character Recognition from Picturesque Image Macros with Enhanced VGG-16 based Model Functionality Using Neural Networks

  • Ayed Ahmad Hamdan Al-Radaideh;Mohd Shafry bin Mohd Rahim;Wad Ghaban;Majdi Bsoul;Shahid Kamal;Naveed Abbas
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1807-1822 / 2023
  • Innovation and the rapidly increasing functionality of user-friendly smartphones have encouraged photographers to capture picturesque image macros in the work environment or during travel. Signboards are placed with marketing objectives and are enriched with text to attract people. Extracting and recognizing text from natural images is an emerging research issue that needs consideration. Compared with conventional optical character recognition (OCR), the complex backgrounds, implicit noise, lighting, and varied orientations of these scene-text photos make the problem more difficult, and Arabic scene-text extraction and recognition adds further complications. The method described in this paper uses a two-phase methodology to extract Arabic text, with word-boundary awareness, from scene images with varying text orientations. The first phase uses a convolutional autoencoder; the second uses Arabic Character Segmentation (ACS), followed by a traditional two-layer neural network for recognition. The study also shows how Arabic training and synthetic datasets can be created to exemplify text superimposed on different scene images: a dataset of 10K cropped images in which Arabic text was found was created for the detection phase, and a 127K Arabic character dataset for the recognition phase. The phase-1 labels were generated from an Arabic corpus of 15K quotes and sentences. The Arabic Word Awareness Region Detection (AWARD) approach detects complex Arabic scene text with high flexibility, including text that is arbitrarily oriented, curved, or deformed. Our experiments show that the system achieves 91.8% word segmentation accuracy and 94.2% character recognition accuracy. We believe that future researchers will excel in processing text in scene images of any language, improving or reducing noise, by enhancing the functionality of the VGG-16 based model using neural networks.
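
To make the two-phase pipeline concrete, the sketch below shows a minimal convolutional autoencoder of the kind phase 1 describes, mapping a scene image to a per-pixel text-region map. The layer sizes, depth, and single-channel heat-map output are our assumptions for illustration, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Minimal sketch: the encoder compresses the scene image and the
    decoder reconstructs a per-pixel text-region probability map."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
dummy = torch.randn(1, 3, 128, 128)  # stands in for a cropped scene image
print(model(dummy).shape)            # torch.Size([1, 1, 128, 128])
```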

Analysis of Deep Learning-Based Pedestrian Environment Assessment Factors Using Urban Street View Images (도시 스트리트뷰 영상을 이용한 딥러닝 기반 보행환경 평가 요소 분석)

  • Ji-Yeon Hwang;Cheol-Ung Choi;Kwang-Woo Nam;Chang-Woo Lee
    • Journal of Korea Society of Industrial Information Systems / v.28 no.6 / pp.45-52 / 2023
  • Recently, as the importance of walking in daily life has been emphasized, projects to guarantee walking rights and improve pedestrian environments are being promoted across regions. In a previous study, a pedestrian environment assessment was conducted using road images of Jeonju-si, and a dataset of image comparison pairs was constructed. However, a dataset expressed only in numbers makes it difficult to generalize the judgment criteria of assessors or to identify visually which pedestrian environments pedestrians prefer. This study therefore proposes a method to interpret the results of the pedestrian environment assessment through data visualization by building a web application. According to the semantic segmentation analysis of the walking-environment components that influence assessors, pedestrians did not prefer environments with a lot of "earth" and "grass," and preferred environments with "signboards" and "sidewalks." The proposed approach is expected to help identify and analyze the choices participants make in future pedestrian environment evaluations, and we believe accuracy can be improved further by adding a data-cleaning pre-processing step.
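
The assessment hinges on relating segmentation output to pedestrian preference, which comes down to measuring how much of each image a class such as "grass" or "sidewalk" occupies. The sketch below computes those per-class pixel ratios from a semantic-segmentation label map; the class indices and names are placeholders, not the paper's actual label set.

```python
import numpy as np

# Hypothetical class indices; the paper's segmentation label set
# (classes such as "earth", "grass", "signboard", "sidewalk") is
# an assumption here.
CLASSES = {0: "road", 1: "sidewalk", 2: "grass", 3: "earth", 4: "signboard"}

def class_ratios(label_map: np.ndarray) -> dict:
    """Fraction of pixels each class occupies in a semantic-segmentation
    label map (an H x W array of class indices)."""
    return {name: float(np.mean(label_map == idx)) for idx, name in CLASSES.items()}

# Toy 4x4 label map standing in for a segmented street-view image.
demo = np.array([[0, 0, 1, 1],
                 [0, 2, 2, 1],
                 [3, 3, 2, 4],
                 [3, 0, 0, 4]])
print(class_ratios(demo))  # {'road': 0.3125, 'sidewalk': 0.1875, ...}
```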

Convolutional neural networks for automated tooth numbering on panoramic radiographs: A scoping review

  • Ramadhan Hardani Putra;Eha Renwi Astuti;Aga Satria Nurrachman;Dina Karimah Putri;Ahmad Badruddin Ghazali;Tjio Andrinanti Pradini;Dhinda Tiara Prabaningtyas
    • Imaging Science in Dentistry / v.53 no.4 / pp.271-281 / 2023
  • Purpose: The objective of this scoping review was to investigate the applicability and performance of various convolutional neural network (CNN) models in tooth numbering on panoramic radiographs, achieved through classification, detection, and segmentation tasks. Materials and Methods: An online search of the PubMed, Science Direct, and Scopus databases was performed. Based on the selection process, 12 studies were included in this review. Results: Eleven studies utilized a CNN model for detection tasks, five for classification tasks, and three for segmentation tasks in the context of tooth numbering on panoramic radiographs. Most of these studies revealed high performance of various CNN models in automating tooth numbering. However, several studies also highlighted limitations of CNNs, such as the presence of false positives and false negatives in identifying decayed teeth, teeth with crown prosthetics, teeth adjacent to edentulous areas, dental implants, root remnants, wisdom teeth, and root canal-treated teeth. These limitations can be overcome by ensuring both the quality and quantity of datasets, as well as optimizing the CNN architecture. Conclusion: CNNs have demonstrated high performance in automated tooth numbering on panoramic radiographs. Future development of CNN-based models for this purpose should also consider different stages of dentition, such as the primary and mixed dentition stages, as well as the presence of various tooth conditions. Ultimately, an optimized CNN architecture can serve as the foundation for an automated tooth numbering system and for further artificial intelligence research on panoramic radiographs for a variety of purposes.

Performance Evaluation of YOLOv5 Model according to Various Hyper-parameters in Nuclear Medicine Phantom Images (핵의학 팬텀 영상에서 초매개변수 변화에 따른 YOLOv5 모델의 성능평가)

  • Min-Gwan Lee;Chanrok Park
    • Journal of the Korean Society of Radiology / v.18 no.1 / pp.21-26 / 2024
  • One of the best-known deep learning models for object detection is the you-only-look-once version 5 (YOLOv5) framework, which is based on a one-stage architecture. YOLOv5 has also shown high performance for accurate lesion detection thanks to its bottleneck CSP layers and skip connections. The purpose of this study was to evaluate the performance of the YOLOv5 framework under various hyperparameters on positron emission tomography (PET) phantom images. The dataset consisted of 500 slices obtained from the QIN PET segmentation challenge. Ground-truth bounding boxes were generated using the labelImg software. For network training, the hyperparameters were varied across optimization functions (SGD, Adam, and AdamW), activation functions (SiLU, LeakyReLU, Mish, and Hardswish), and YOLOv5 model sizes (nano, small, large, and xlarge). Intersection over union (IoU) was used for performance evaluation. As a result, the best-performing configuration used AdamW as the optimization function, Hardswish as the activation function, and the nano model size. In conclusion, we confirmed the usefulness of the YOLOv5 network for object detection in nuclear medicine images.
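
The evaluation design reduces to a grid over three hyperparameter factors, scored by IoU. The sketch below shows that grid and a standard IoU computation for axis-aligned boxes; the training call itself is elided, since it depends on the YOLOv5 toolchain.

```python
import itertools

def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Hyperparameter grid mirroring the study's three factors.
optimizers  = ["SGD", "Adam", "AdamW"]
activations = ["SiLU", "LeakyReLU", "Mish", "Hardswish"]
sizes       = ["nano", "small", "large", "xlarge"]

for opt, act, size in itertools.product(optimizers, activations, sizes):
    pass  # train YOLOv5 with this combination, then score predictions by IoU

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # 0.1428...
```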

Real-time Segmentation of Black Ice Region in Infrared Road Images

  • Li, Yu-Jie;Kang, Sun-Kyoung;Jung, Sung-Tae
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.33-42 / 2022
  • In this paper, we propose a deep learning model based on multi-scale dilated convolution feature fusion for segmenting black ice regions in road images, so that black ice warnings can be sent to drivers in real time. In the proposed network, convolutions with different dilation rates are connected in parallel within the encoder blocks, different dilation rates are used at different feature-map resolutions, and multi-layer feature information is fused together. Multi-scale dilated convolution feature fusion improves performance by diversifying and expanding the network's receptive field while preserving detailed spatial information, enhancing the effectiveness of the dilated convolutions. The performance of the proposed model improved steadily as the number of dilated convolution branches increased. The mIoU of the proposed method is 96.46%, higher than existing networks such as U-Net, FCN, PSPNet, ENet, and LinkNet, with a parameter count of 1,858K, six times smaller than the existing LinkNet model. In experiments on a Jetson Nano, the proposed method ran at 3.63 FPS, enabling real-time segmentation of black ice regions.
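
The core building block is a set of parallel 3x3 convolutions with different dilation rates whose outputs are fused, which widens the receptive field without losing resolution. Below is a minimal PyTorch sketch of such a block; the specific rates, channel counts, and 1x1 fusion convolution are illustrative assumptions, not the paper's exact encoder.

```python
import torch
import torch.nn as nn

class DilatedFusionBlock(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, whose
    outputs are concatenated and fused by a 1x1 convolution."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r)  # padding=r keeps H x W
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)
        self.act = nn.ReLU()

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.act(self.fuse(feats))

block = DilatedFusionBlock(3, 16)
print(block(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 16, 64, 64])
```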

Classification of Leukemia Disease in Peripheral Blood Cell Images Using Convolutional Neural Network

  • Tran, Thanh;Park, Jin-Hyuk;Kwon, Oh-Heum;Moon, Kwang-Seok;Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society / v.21 no.10 / pp.1150-1161 / 2018
  • Classification is widely used in medical imaging to distinguish patients from non-patients. However, conventional classification requires a complex pipeline of rigid steps such as pre-processing, segmentation, feature extraction, detection, and classification. In this paper, we propose a novel convolutional neural network (CNN), called LeukemiaNet, to classify two different types of leukemia, acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), as well as non-cancerous cases. To extend the limited dataset, a PCA color augmentation process is applied before images are fed into LeukemiaNet. This augmentation raises the accuracy of the proposed CNN architecture from 96.9% to 97.2% for distinguishing ALL, AML, and normal cell images.
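
PCA color augmentation (popularized by AlexNet) perturbs each image along the principal components of its RGB channel covariance, so augmented copies vary plausibly in color. A minimal NumPy version is sketched below; the noise scale sigma is an assumed value, and the paper's exact variant may differ.

```python
import numpy as np

def pca_color_augment(image: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """AlexNet-style PCA color augmentation: shift every RGB pixel along
    the principal components of the image's own channel covariance."""
    pixels = image.reshape(-1, 3).astype(np.float64) / 255.0
    pixels -= pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)        # 3x3 channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    alphas = np.random.normal(0.0, sigma, 3)  # random strength per component
    shift = eigvecs @ (alphas * eigvals)      # single RGB offset for the image
    out = image.astype(np.float64) / 255.0 + shift
    return (np.clip(out, 0, 1) * 255).astype(np.uint8)

cell_img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)  # stand-in image
print(pca_color_augment(cell_img).shape)  # (32, 32, 3)
```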

Low-Quality Banknote Serial Number Recognition Based on Deep Neural Network

  • Jang, Unsoo;Suh, Kun Ha;Lee, Eui Chul
    • Journal of Information Processing Systems / v.16 no.1 / pp.224-237 / 2020
  • Recognition of banknote serial numbers is an important function for implementing intelligent banknote counters and can be used for various purposes. However, previous character recognition methods are of limited use because of the serial-number font types, the variation caused by soiling, and recognition speed issues. In this paper, we propose an aspect-ratio based character region segmentation and a convolutional neural network (CNN) based banknote serial number recognition method. To detect the character regions, after banknote area detection and de-skewing, each character area is determined from the aspect ratio of the characters within the serial-number candidate area. We then designed and compared four types of CNN models and selected the best model for serial number recognition. Experimental results showed a per-character recognition accuracy of 99.85%, and data augmentation was confirmed to further improve recognition performance. The banknotes used in the experiments were Indian rupees, which were badly soiled and printed in an unusual font, so the performance can be regarded as good. The recognition speed was also sufficient to run in real time on a device that counts 800 banknotes per minute.
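
The detection step amounts to splitting a de-skewed, binarized serial-number strip into per-character boxes and keeping only boxes with a plausible width-to-height ratio. The sketch below illustrates that idea with a simple column-projection pass; the aspect-ratio bounds and the projection heuristic are our assumptions, not the paper's exact procedure.

```python
import numpy as np

def segment_characters(binary: np.ndarray, min_ar=0.2, max_ar=1.2):
    """Split a binarized serial-number strip into (x, y, w, h) character
    boxes using column projections, keeping boxes whose width/height
    ratio is plausible for a single character."""
    cols = binary.sum(axis=0) > 0          # columns containing ink
    boxes, start = [], None
    for x, has_ink in enumerate(np.append(cols, False)):
        if has_ink and start is None:
            start = x                      # a character run begins
        elif not has_ink and start is not None:
            rows = np.where(binary[:, start:x].sum(axis=1) > 0)[0]
            w, h = x - start, rows[-1] - rows[0] + 1
            if min_ar <= w / h <= max_ar:  # aspect-ratio test per character
                boxes.append((start, rows[0], w, h))
            start = None
    return boxes

strip = np.zeros((10, 20), dtype=np.uint8)
strip[2:9, 2:6] = 1   # fake character 1
strip[2:9, 9:13] = 1  # fake character 2
print(segment_characters(strip))  # [(2, 2, 4, 7), (9, 2, 4, 7)]
```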

MSFM: Multi-view Semantic Feature Fusion Model for Chinese Named Entity Recognition

  • Liu, Jingxin;Cheng, Jieren;Peng, Xin;Zhao, Zeli;Tang, Xiangyan;Sheng, Victor S.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1833-1848 / 2022
  • Named entity recognition (NER) is an important basic task in Natural Language Processing (NLP). Recently, deep learning approaches that extract word-segmentation or character features have proved effective for Chinese Named Entity Recognition (CNER). However, because such feature extraction focuses on only some of the features, it misses textual information from other perspectives and dimensions, so the model cannot fully capture semantic features. To tackle this problem, we propose a novel Multi-view Semantic Feature Fusion Model (MSFM). The proposed model consists of two core components: the Multi-view Semantic Feature Fusion Embedding Module (MFEM) and the Multi-head Self-Attention Mechanism Module (MSAM). Specifically, the MFEM extracts character features, word-boundary features, radical features, and pinyin features of Chinese characters. The acquired character-shape, character-sound, and character-meaning features are fused to enhance the semantic information of Chinese characters at different granularities. The MSAM then captures the dependencies between characters in a multi-dimensional subspace to better understand the semantic features of the context. Extensive experimental results on four benchmark datasets show that our method improves the overall performance of the CNER model.
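
The two components map naturally onto embedding concatenation followed by multi-head self-attention. The sketch below fuses three feature views per character and runs them through PyTorch's nn.MultiheadAttention; the vocabulary sizes and dimensions are placeholders, and the word-boundary view is omitted for brevity.

```python
import torch
import torch.nn as nn

# Each Chinese character is represented by the concatenation of several
# feature views (character, radical, and pinyin embeddings here).
char_emb    = nn.Embedding(5000, 64)   # placeholder vocabulary sizes/dims
radical_emb = nn.Embedding(300, 32)
pinyin_emb  = nn.Embedding(400, 32)

attn = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)

chars    = torch.randint(0, 5000, (2, 10))  # batch of 2 sentences, 10 chars
radicals = torch.randint(0, 300, (2, 10))
pinyins  = torch.randint(0, 400, (2, 10))

# Multi-view fusion: concatenate the views into one 128-dim representation.
fused = torch.cat([char_emb(chars), radical_emb(radicals), pinyin_emb(pinyins)], dim=-1)
# Multi-head self-attention over the character sequence.
context, _ = attn(fused, fused, fused)
print(context.shape)  # torch.Size([2, 10, 128])
```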

Collision Avoidance Sensor System for Mobile Crane (전지형 크레인의 인양물 충돌방지를 위한 환경탐지 센서 시스템 개발)

  • Kim, Ji-Chul;Kim, Young Jea;Kim, Mingeuk;Lee, Hanmin
    • Journal of Drive and Control / v.19 no.4 / pp.62-69 / 2022
  • Construction machinery is exposed to accidents such as collision, entrapment, and overturning during operation. In particular, a mobile crane is operated using only the driver's vision and the limited information relayed by an assistant worker, so the risk of an accident is high. Recently, collision-avoidance devices using sensors such as cameras and LiDAR have been applied, but they are still insufficient to prevent collisions in the full omnidirectional 3D space. In this study, a rotating LiDAR device was developed and mounted on a 250-ton crane to obtain a full-space point cloud, and an algorithm was developed to provide distance information and safety status to the driver. A deep-learning segmentation algorithm was also used to classify human workers. The developed device could recognize obstacles within 100 m over a 360-degree range. In experiments, the safety distance was calculated with an error of 10.3 cm at 30 m, giving the operator an accurate distance readout and collision alarm.
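
Once a full-space point cloud is available, the driver alarm reduces to a nearest-obstacle distance check around the suspended load. The sketch below shows that logic in NumPy; the nearest-point rule and the warning thresholds are illustrative simplifications of the paper's algorithm.

```python
import numpy as np

def min_obstacle_distance(points: np.ndarray, load_xyz: np.ndarray) -> float:
    """Smallest Euclidean distance from the suspended load to any LiDAR
    return in the surrounding point cloud (points: N x 3, in metres)."""
    return float(np.min(np.linalg.norm(points - load_xyz, axis=1)))

def safety_status(distance_m: float, warn_at=5.0, stop_at=2.0) -> str:
    """Map a distance to a driver alarm level; thresholds are illustrative."""
    if distance_m < stop_at:
        return "STOP"
    return "WARNING" if distance_m < warn_at else "SAFE"

cloud = np.random.uniform(-50, 50, size=(10000, 3))  # stand-in for a LiDAR scan
load = np.array([0.0, 0.0, 10.0])                    # hypothetical load position
d = min_obstacle_distance(cloud, load)
print(f"{d:.2f} m -> {safety_status(d)}")
```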

Detecting Numeric and Character Areas of Low-quality License Plate Images using YOLOv4 Algorithm (YOLOv4 알고리즘을 이용한 저품질 자동차 번호판 영상의 숫자 및 문자영역 검출)

  • Lee, Jeonghwan
    • Journal of Korea Society of Digital Industry and Information Management / v.18 no.4 / pp.1-11 / 2022
  • Recently, research on license plate recognition, a core technology of intelligent transportation systems (ITS), has been actively conducted. In this paper, we propose a method to extract numbers and characters from low-quality license plate images by applying the YOLOv4 algorithm. YOLOv4 is a one-stage object detection method built on a convolutional neural network comprising backbone, neck, and head parts; unlike earlier two-stage detectors such as Faster R-CNN, it detects objects in real time. We studied a method to extract number and character regions directly from low-quality license plate images without additional edge detection or image segmentation steps. To evaluate the performance of the proposed method, we experimented with 500 license plate images, of which 350 were used for training and the remaining 150 for testing. Computer simulations show that the mean average precision for detecting number and character regions on vehicle license plates was about 93.8%.
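
A trained YOLOv4 network in Darknet format can be run for this kind of plate-region detection through OpenCV's DNN module, as sketched below. The config/weight file names, input size, and two-class label list are placeholder assumptions, not artifacts released with the paper.

```python
import cv2

# Placeholder paths: a YOLOv4 model trained on plate "number"/"character"
# regions would be loaded here.
net = cv2.dnn.readNetFromDarknet("plate_yolov4.cfg", "plate_yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

image = cv2.imread("low_quality_plate.jpg")  # placeholder test image
class_ids, scores, boxes = model.detect(image, confThreshold=0.25, nmsThreshold=0.4)

LABELS = ["number", "character"]  # assumed class order
for cid, score, (x, y, w, h) in zip(class_ids, scores, boxes):
    print(f"{LABELS[int(cid)]}: {score:.2f} at x={x}, y={y}, w={w}, h={h}")
```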