• Title/Summary/Keyword: Deep learning enhancement


Comparative Study of Fish Detection and Classification Performance Using the YOLOv8-Seg Model (YOLOv8-Seg 모델을 이용한 어류 탐지 및 분류 성능 비교연구)

  • Sang-Yeup Jin;Heung-Bae Choi;Myeong-Soo Han;Hyo-tae Lee;Young-Tae Son
    • Journal of the Korean Society of Marine Environment & Safety / v.30 no.2 / pp.147-156 / 2024
  • The sustainable management and enhancement of marine resources are becoming increasingly important issues worldwide. This study was conducted in response to these challenges, focusing on the development and performance comparison of fish detection and classification models as part of a deep learning-based technique for assessing the effectiveness of marine resource enhancement projects initiated by the Korea Fisheries Resources Agency. The aim was to select the optimal model by training various sizes of YOLOv8-Seg models on a fish image dataset and comparing each performance metric. The dataset used for model construction consisted of 36,749 images and label files of 12 different species of fish, with data diversity enhanced through the application of augmentation techniques during training. When training and validating five different YOLOv8-Seg models under identical conditions, the medium-sized YOLOv8m-Seg model showed high learning efficiency and excellent detection and classification performance, with the shortest training time of 13 h and 12 min, an of 0.933, and an inference speed of 9.6 ms. Considering the balance between each performance metric, this was deemed the most efficient model for meeting real-time processing requirements. The use of such real-time fish detection and classification models could enable effective surveys of marine resource enhancement projects, suggesting the need for ongoing performance improvements and further research.

Single Low-Light Ghost-Free Image Enhancement via Deep Retinex Model

  • Liu, Yan;Lv, Bingxue;Wang, Jingwen;Huang, Wei;Qiu, Tiantian;Chen, Yunzhong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1814-1828 / 2021
  • Low-light image enhancement is a key technique for overcoming the quality degradation of photos taken under scotopic illumination conditions. The degradation includes low brightness, low contrast, and pronounced noise, which seriously impair both human visual recognition and subsequent image processing. In this paper, we propose an approach based on deep learning and Retinex theory to enhance low-light images; it comprises image decomposition, illumination prediction, image reconstruction, and image optimization. The first three parts reconstruct an enhanced image, which nevertheless suffers from low resolution. To reduce the noise of the enhanced image and improve its quality, a super-resolution algorithm based on the Laplacian pyramid network is introduced to optimize the image. The Laplacian pyramid network improves the resolution of the enhanced image through multiple feature extraction and deconvolution operations. Furthermore, a combined loss function is explored in the network training stage to improve the efficiency of the algorithm. Extensive experiments and comprehensive evaluations demonstrate the strength of the proposed method: the result is closer to the real-world scene in lightness, color, and detail. The experiments also demonstrate that the proposed method, using a single low-light image, can achieve the same effect as multi-exposure image fusion algorithms without introducing ghosting.

Performance comparison evaluation of speech enhancement using various loss functions (다양한 손실 함수를 이용한 음성 향상 성능 비교 평가)

  • Hwang, Seo-Rim;Byun, Joon;Park, Young-Cheol
    • The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.176-182 / 2021
  • This paper evaluates and compares the performance of Deep Neural Network (DNN)-based speech enhancement models trained with various loss functions. We used a complex network that can take the phase information of speech into account as the baseline model. As loss functions, we consider two basic loss functions, the Mean Squared Error (MSE) and the Scale-Invariant Source-to-Noise Ratio (SI-SNR), and two perceptual-based loss functions, the Perceptual Metric for Speech Quality Evaluation (PMSQE) and the Log Mel Spectra (LMS). The performance comparison was carried out through objective evaluation and listening tests on outputs obtained using various combinations of the loss functions. Test results show that combining a perceptual-based loss function with MSE or SI-SNR improves overall performance, and that the perceptual-based loss functions, despite yielding lower objective scores, performed better in the listening test.
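
To make the two basic loss functions named in this abstract concrete, the following is a minimal NumPy sketch of MSE and SI-SNR on 1-D waveforms. It is not the authors' implementation: the function names, the `eps` stabilizer, and the `weight` parameter are assumptions for illustration, and the perceptual term (PMSQE or LMS) is passed in as a precomputed placeholder value rather than implemented.

```python
import numpy as np

def mse(estimate, target):
    """Mean squared error between two 1-D waveforms."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((estimate - target) ** 2))

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB (higher is better). eps is an assumed stabilizer."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled target component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    # Everything not explained by the scaled target is treated as noise.
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return float(10.0 * np.log10(ratio))

def combined_loss(estimate, target, perceptual_term=0.0, weight=0.5):
    """Hypothetical combination: negative SI-SNR plus a weighted perceptual term.

    perceptual_term stands in for a PMSQE or LMS value computed elsewhere.
    """
    return -si_snr(estimate, target) + weight * perceptual_term
```

Because SI-SNR projects the estimate onto the target before comparing energies, rescaling the estimate leaves the score essentially unchanged, whereas MSE penalizes any gain mismatch.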

Comparative Analysis of Deep Learning Researches for Compressed Video Quality Improvement (압축 영상 화질 개선을 위한 딥 러닝 연구에 대한 분석)

  • Lee, Young-Woon;Kim, Byung-Gyu
    • Journal of Broadcast Engineering / v.24 no.3 / pp.420-429 / 2019
  • Recently, research using Convolutional Neural Network (CNN)-based approaches has been actively conducted to improve the degraded quality of video compressed with block-based video coding standards such as H.265/HEVC. This paper aims to summarize and analyze the network models used in these quality enhancement studies. First, the detailed components of CNNs for quality enhancement are overviewed, and prior studies in the image domain are summarized. Next, related studies are summarized in terms of three aspects: network structure, dataset, and training methods. Finally, implementations of representative models and experimental results are presented for performance comparison.

Enhancement of Tongue Segmentation by Using Data Augmentation (데이터 증강을 이용한 혀 영역 분할 성능 개선)

  • Chen, Hong;Jung, Sung-Tae
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.5 / pp.313-322 / 2020
  • A large volume of data improves the robustness of deep learning models and helps avoid overfitting. In automatic tongue segmentation, the availability of annotated tongue images is often limited because tongue image datasets are difficult to collect and label in practice. Data augmentation can expand the training dataset and increase the diversity of training data by using label-preserving transformations without collecting new data. In this paper, augmented tongue image datasets were developed using seven augmentation techniques, such as image cropping, rotation, flipping, and color transformations. The performance of the data augmentation techniques was studied using state-of-the-art transfer learning models, for instance InceptionV3, EfficientNet, ResNet, and DenseNet. Our results show that geometric transformations lead to larger performance gains than color transformations, and that the segmentation accuracy can be increased by 5% to 20% compared with no augmentation. Furthermore, a dataset augmented with a random linear combination of geometric and color transformations gives better segmentation performance than all other datasets and achieves an accuracy of 94.98% with InceptionV3 models.
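
As an illustration of label-preserving augmentation for segmentation, the sketch below applies geometric transforms to the image and its mask together, and color transforms to the image only. It is a minimal NumPy sketch under stated assumptions, not the paper's pipeline: the function names, the brightness factor, and the way a geometric and a color transform are randomly chained in `random_augment` are all hypothetical.

```python
import numpy as np

def hflip(image, mask):
    """Geometric transform: flip image and mask together so labels stay aligned."""
    return image[:, ::-1].copy(), mask[:, ::-1].copy()

def rotate90(image, mask, k=1):
    """Geometric transform: rotate image and mask by k * 90 degrees."""
    return (np.rot90(image, k, axes=(0, 1)).copy(),
            np.rot90(mask, k, axes=(0, 1)).copy())

def brightness(image, mask, factor=1.2):
    """Color transform: changes pixel values only, so the mask is returned unchanged."""
    scaled = np.clip(image.astype(np.float32) * factor, 0, 255)
    return scaled.astype(np.uint8), mask

def random_augment(image, mask, rng):
    """Hypothetical policy: chain one randomly chosen geometric transform
    with a brightness change of random strength."""
    geometric = [hflip, rotate90]
    image, mask = geometric[rng.integers(len(geometric))](image, mask)
    return brightness(image, mask, factor=rng.uniform(0.8, 1.2))
```

A call such as `random_augment(image, mask, np.random.default_rng(0))` yields a new training pair; the key design point is that the mask only ever follows the geometric part of the transform.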

Rating wrinkled skin using deep learning (딥러닝 기반 주름 평가)

  • Kim, Jin-Sook;Kim, Yongnam;Kim, Duhong;Park, Lae-Jeong;Baek, Ji Hwoon;Kang, Sanggoo
    • Annual Conference of KIPS / 2018.10a / pp.637-640 / 2018
  • The paper proposes a new deep network-based model that rates periorbital wrinkles, in order to alleviate the shortcomings of evaluation by human experts and to facilitate automation. Periorbital wrinkles still need to be classified by human experts. Furthermore, the classification results from different experts disagree in many cases because of inter-interpreter variability and the absence of quantification criteria. Unlike existing classification methods, which classify the original images, the proposed model consists of a cascade of two deep networks: U-Net for enhancing the wrinkles in an input image and VGG16 for the final classification based on the wrinkle information. Experiments with the proposed model were conducted on a data set of 433 images rated by experts and showed promising performance.

Deep Learning-based Object Detection of Panels Door Open in Underground Utility Tunnel (딥러닝 기반 지하공동구 제어반 문열림 인식)

  • Gyunghwan Kim;Jieun Kim;Woosug Jung
    • Journal of the Society of Disaster Information / v.19 no.3 / pp.665-672 / 2023
  • Purpose: An underground utility tunnel is a facility that jointly houses urban infrastructure such as electricity, water, and gas, and it suffers from condensation problems due to a lack of airflow. This paper aims to prevent electrical leakage fires caused by condensation by using a deep learning model to detect whether the control panel door in the underground utility tunnel is open. Method: YOLO, a deep learning object recognition model, is trained to recognize the opening and closing of the control panel door using video data taken by a robot patrolling the underground utility tunnel. To improve the recognition rate, image augmentation is used. Result: Among the image enhancement techniques, we compared the performance of the YOLO model trained using mosaic with that of the YOLO model trained without mosaic, and found that the mosaic technique performed better. The mAP over all classes was 0.994, a high evaluation result. Conclusion: The model was able to detect the control panel even when the lights were off or other objects were present in the underground cavity. This enables effective management of the underground utility tunnel and helps prevent disasters.

Adaptation of Deep Learning Image Reconstruction for Pediatric Head CT: A Focus on the Image Quality (소아용 두부 컴퓨터단층촬영에서 딥러닝 영상 재구성 적용: 영상 품질에 대한 고찰)

  • Nim Lee;Hyun-Hae Cho;So Mi Lee;Sun Kyoung You
    • Journal of the Korean Society of Radiology / v.84 no.1 / pp.240-252 / 2023
  • Purpose: To assess the effect of deep learning image reconstruction (DLIR) for head CT in pediatric patients. Materials and Methods: We collected 126 pediatric head CT images, which were reconstructed using filtered back projection, iterative reconstruction using adaptive statistical iterative reconstruction (ASiR)-V, and all three levels of DLIR (TrueFidelity; GE Healthcare). Each image set group was divided into four subgroups according to the patients' ages. Clinical and dose-related data were reviewed. Quantitative parameters, including the signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR), and qualitative parameters, including noise, gray matter-white matter (GM-WM) differentiation, sharpness, artifact, acceptability, and unfamiliar texture change, were evaluated and compared. Results: The SNR and CNR in each age group increased with the strength level of DLIR. High-level DLIR showed a significantly improved SNR and CNR (p < 0.05). Sequential reduction of noise, improvement of GM-WM differentiation, and improvement of sharpness were noted with increasing strength levels of DLIR. Those of high-level DLIR showed values similar to those with ASiR-V. Artifact and acceptability did not differ significantly among the adapted levels of DLIR. Conclusion: Adaptation of high-level DLIR for pediatric head CT can significantly reduce image noise. Modification is needed while processing artifacts.

Enhancement of durability of tall buildings by using deep-learning-based predictions of wind-induced pressure

  • K.R. Sri Preethaa;N. Yuvaraj;Gitanjali Wadhwa;Sujeen Song;Se-Woon Choi;Bubryur Kim
    • Wind and Structures / v.36 no.4 / pp.237-247 / 2023
  • The emergence of high-rise buildings has necessitated frequent structural health monitoring and maintenance for safety reasons. Wind causes damage and structural changes on tall structures; thus, safe structures should be designed. The pressure developed on tall buildings has been utilized in previous research studies to assess the impacts of wind on structures. The wind tunnel test is a primary research method commonly used to quantify the aerodynamic characteristics of high-rise buildings. Wind pressure is measured by placing pressure sensor taps at different locations on tall buildings, and the collected data are used for analysis. However, sensors may malfunction and produce erroneous data; these data losses make it difficult to analyze aerodynamic properties. Therefore, it is essential to generate the missing data relative to the original data obtained from neighboring pressure sensor taps at various intervals. This study proposes a deep learning-based, deep convolutional generative adversarial network (DCGAN) to restore missing data associated with faulty pressure sensors installed on high-rise buildings. The performance of the proposed DCGAN is validated against a standard imputation model known as the generative adversarial imputation network (GAIN). The average mean-square error (AMSE) and average R-squared (ARSE) are used as performance metrics. The ARSE values calculated for DCGAN on the front, back, left, and right sides of the building model are 0.970, 0.972, 0.984, and 0.978, respectively. The AMSE values produced by DCGAN on the four sides of the building model are 0.008, 0.010, 0.015, and 0.014. The average standard deviations of the actual pressure sensor measurements on the four sides of the model were 0.1738, 0.1758, 0.2234, and 0.2278. The average standard deviations of the pressure values generated by the proposed DCGAN imputation model were closer to those of the actual measurements, with values of 0.1736, 0.1746, 0.2191, and 0.2239 on the four sides, respectively. In comparison, the standard deviations of the values predicted by GAIN are 0.1726, 0.1735, 0.2161, and 0.2209, which are farther from the actual values. The results demonstrate that the DCGAN model is better suited to data imputation than the GAIN model, with improved accuracy and lower error rates. Additionally, the DCGAN is utilized to estimate the wind pressure in regions of buildings where no pressure sensor taps are available; the model yielded greater prediction accuracy than GAIN.
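
The abstract does not spell out how AMSE and ARSE are computed, so the following is only a sketch under an assumed reading: the mean-square error and R-squared are evaluated per pressure tap and then averaged over the taps on one building side. The function names and the `(n_taps, n_samples)` array layout are hypothetical.

```python
import numpy as np

def mean_squared_error(measured, imputed):
    """MSE between measured and imputed pressure values for one tap."""
    measured = np.asarray(measured, dtype=np.float64)
    imputed = np.asarray(imputed, dtype=np.float64)
    return float(np.mean((measured - imputed) ** 2))

def r_squared(measured, imputed):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    measured = np.asarray(measured, dtype=np.float64)
    imputed = np.asarray(imputed, dtype=np.float64)
    ss_res = np.sum((measured - imputed) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def average_over_taps(metric, measured, imputed):
    """Assumed averaging scheme: apply the metric per tap (row), then average.

    measured and imputed are arrays of shape (n_taps, n_samples).
    """
    return float(np.mean([metric(m, p) for m, p in zip(measured, imputed)]))
```

For example, `average_over_taps(r_squared, measured, imputed)` would give one ARSE-style value per building side under this assumed definition.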

Image Enhancement based on Piece-wise Linear Enhancement Curves for Improved Visibility under Sunlight (햇빛 아래에서 향상된 시인성을 위한 Piece-wise Linear Enhancement Curves 기반 영상 개선)

  • Lee, Junmin;Song, Byung Cheol
    • Journal of Broadcast Engineering / v.27 no.5 / pp.812-815 / 2022
  • Images displayed on digital devices under sunlight are generally perceived to be darker than the original images, which reduces visibility. For better visibility, global luminance compensation or tone mapping adaptive to ambient lighting is required. However, existing methods have limitations in chrominance compensation and are difficult to use in the real world because of their heavy computational cost. To solve these problems, this paper proposes an image enhancement method based on piece-wise linear enhancement curves (PLECs) that improves both luminance and chrominance. The PLECs are regressed through deep learning and implemented in the form of a lookup table to enable real-time operation. Experimental results show that the proposed method achieves better visibility than the original image at low computational cost.
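
The lookup-table idea in this abstract can be sketched as follows: a piece-wise linear curve defined by a few knots is sampled once into a 256-entry table, after which enhancing an 8-bit image is a single indexing operation. This is a minimal NumPy sketch, not the authors' method: in the paper the curves are regressed by a network, whereas the knot values used here are hand-picked, hypothetical numbers, and only a single channel is handled.

```python
import numpy as np

def build_lut(knots_x, knots_y, levels=256):
    """Sample a piece-wise linear curve, given by its knots, into a lookup table."""
    xs = np.arange(levels)
    # np.interp linearly interpolates between consecutive (x, y) knots.
    curve = np.interp(xs, knots_x, knots_y)
    return np.clip(np.rint(curve), 0, levels - 1).astype(np.uint8)

def apply_lut(image, lut):
    """Enhance an 8-bit image by indexing the table with its pixel values."""
    return lut[image]

# Hypothetical curve that brightens dark and mid tones.
lut = build_lut([0, 128, 255], [0, 192, 255])
```

Building the table costs one interpolation over 256 values regardless of image size, which is why a table-based curve suits real-time use.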