• Title/Summary/Keywords: Image semantic segmentation

Search results: 145

MEDU-Net+: a novel improved U-Net based on multi-scale encoder-decoder for medical image segmentation

  • Zhenzhen Yang;Xue Sun;Yongpeng Yang;Xinyi Wu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 18, No. 7 / pp.1706-1725 / 2024
  • The unique U-shaped structure of the U-Net network enables it to achieve good performance in image segmentation. U-Net is a lightweight network with a small number of parameters, suited to small image segmentation datasets. However, when the medical image to be segmented contains a large amount of detailed information, its segmentation results cannot fully meet practical requirements. To achieve higher accuracy in medical image segmentation, a novel improved U-Net architecture called multi-scale encoder-decoder U-Net+ (MEDU-Net+) is proposed in this paper. We incorporate GoogLeNet into the encoder of the proposed MEDU-Net+ to capture richer information, and present multi-scale feature extraction to fuse semantic information of different scales in the encoder and decoder. We also introduce layer-by-layer skip connections to link the information of each layer, so that there is no need to encode down to the last layer and pass the information back. The proposed MEDU-Net+ distributes deconvolution layers across the network to replace the direct connection of the encoder and decoder in U-Net. In addition, a new combined loss function is proposed to extract more edge information by combining the advantages of the generalized Dice and focal loss functions. Finally, we validate our proposed MEDU-Net+ and other classic medical image segmentation networks on three medical image datasets. The experimental results show that the proposed MEDU-Net+ achieves clearly superior performance compared with other medical image segmentation networks.
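The abstract combines generalized Dice loss with focal loss. A minimal PyTorch sketch of such a combined objective follows; the weighting factor lam, focal gamma, and smoothing constants are illustrative assumptions, since the abstract does not give them:

```python
# Hedged sketch of a generalized Dice + focal combined loss of the kind
# the MEDU-Net+ abstract describes. lam, gamma, and eps are assumptions.
import torch
import torch.nn.functional as F

def generalized_dice_loss(probs, target_onehot, eps=1e-6):
    # Class weights inversely proportional to squared class volume.
    w = 1.0 / (target_onehot.sum(dim=(0, 2, 3)) ** 2 + eps)
    inter = (probs * target_onehot).sum(dim=(0, 2, 3))
    union = (probs + target_onehot).sum(dim=(0, 2, 3))
    return 1.0 - 2.0 * (w * inter).sum() / ((w * union).sum() + eps)

def focal_loss(logits, target, gamma=2.0):
    logp = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(logp, target, reduction="none")
    pt = torch.exp(-ce)  # probability assigned to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def combined_loss(logits, target, num_classes, lam=0.5):
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    return lam * generalized_dice_loss(probs, onehot) + (1 - lam) * focal_loss(logits, target)
```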

DP-LinkNet: A convolutional network for historical document image binarization

  • Xiong, Wei;Jia, Xiuhong;Yang, Dichun;Ai, Meihui;Li, Lirong;Wang, Song
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 5 / pp.1778-1797 / 2021
  • Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.
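A hybrid dilated convolution block of the kind named here typically stacks dilated convolutions whose rates share no common factor, so the receptive field grows without gridding artifacts. A minimal PyTorch sketch under that assumption; the rates (1, 2, 5) and channel sizes are illustrative, not taken from the DP-LinkNet paper:

```python
# Hedged sketch of a hybrid dilated convolution (HDC) block. Dilation
# rates 1, 2, 5 follow the common anti-gridding HDC recipe; the actual
# rates and widths in DP-LinkNet may differ.
import torch
import torch.nn as nn

class HDCBlock(nn.Module):
    def __init__(self, channels, rates=(1, 2, 5)):
        super().__init__()
        self.layers = nn.Sequential(*[
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])

    def forward(self, x):
        # Spatial size is preserved because padding equals dilation.
        return self.layers(x)

feats = torch.randn(1, 64, 256, 256)
print(HDCBlock(64)(feats).shape)  # torch.Size([1, 64, 256, 256])
```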

Deep Facade Parsing with Occlusions

  • Ma, Wenguang;Ma, Wei;Xu, Shibiao
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 2 / pp.524-543 / 2022
  • Correct facade image parsing is essential to the semantic understanding of outdoor scenes. Unfortunately, there are often various occlusions in front of buildings, which cause many existing methods to fail. In this paper, we propose an end-to-end deep network for facade parsing under occlusions. The network learns to decompose an input image into visible and invisible parts by occlusion reasoning. Then, a context aggregation module is proposed to collect nonlocal cues for semantic segmentation of the visible part. In addition, considering the regularity of man-made buildings, a repetitive pattern completion branch is designed to infer the contents of the invisible regions by referring to the visible part. Finally, the parsing map of the input facade image is generated by fusing the visible and invisible parsing results. Experiments on both synthetic and real datasets demonstrate that the proposed method outperforms state-of-the-art methods in parsing facades with occlusions. Moreover, we apply our method to image inpainting and 3D semantic modeling.
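The final parsing map here fuses predictions for visible and occluded regions. A minimal NumPy sketch of such occlusion-mask-driven fusion; the array names, synthetic inputs, and hard boolean mask are assumptions for illustration:

```python
# Hedged sketch of fusing visible and occlusion-completed label maps with
# a predicted occlusion mask, as the facade-parsing abstract describes.
# All inputs below are synthetic placeholders.
import numpy as np

h, w, num_classes = 4, 4, 3
visible_logits = np.random.rand(h, w, num_classes)    # from segmentation head
completed_logits = np.random.rand(h, w, num_classes)  # from completion branch
occluded = np.random.rand(h, w) > 0.7                 # True where the facade is hidden

fused = np.where(occluded[..., None], completed_logits, visible_logits)
parsing_map = fused.argmax(axis=-1)  # final per-pixel class labels
print(parsing_map)
```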

ETLi: Efficiently annotated traffic LiDAR dataset using incremental and suggestive annotation

  • Kang, Jungyu;Han, Seung-Jun;Kim, Nahyeon;Min, Kyoung-Wook
    • ETRI Journal / Vol. 43, No. 4 / pp.630-639 / 2021
  • Autonomous driving requires a computerized perception of the environment for safety and machine-learning evaluation. Recognizing semantic information is difficult, as the objective is to instantly recognize and distinguish objects in the environment. Training a model with real-time semantic capability and high reliability requires extensive and specialized datasets. However, such generalized datasets are unavailable and are typically difficult to construct for specific tasks. Hence, a light detection and ranging (LiDAR) semantic dataset suitable for semantic simultaneous localization and mapping and specialized for autonomous driving is proposed. The dataset is provided in a form that can be easily used by users familiar with existing two-dimensional image datasets, and it covers various weather and lighting conditions collected from a complex and diverse practical setting. An incremental and suggestive annotation routine is proposed to improve annotation efficiency: a model is trained to simultaneously predict segmentation labels and suggest class-representative frames. Experimental results demonstrate that the proposed algorithm yields a more efficient dataset than uniformly sampled datasets.
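Suggestive annotation schemes of the kind described usually rank unlabeled frames by how informative they would be to label. A minimal sketch that scores frames by mean prediction entropy, a common proxy; the entropy criterion and all inputs are assumptions, since the abstract does not specify ETLi's suggestion rule:

```python
# Hedged sketch of suggestive frame selection: rank frames by the mean
# entropy of per-point class probabilities and suggest the most uncertain
# frames for annotation. Inputs are synthetic placeholders.
import numpy as np

def mean_entropy(probs, eps=1e-12):
    # probs: (num_points, num_classes) softmax output for one frame
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

frames = [np.random.dirichlet(np.ones(5), size=1000) for _ in range(20)]
scores = [mean_entropy(p) for p in frames]
suggested = np.argsort(scores)[::-1][:5]  # five most uncertain frames
print("suggest annotating frames:", suggested)
```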

Comparison of Multi-Label U-Net and Mask R-CNN for panoramic radiograph segmentation to detect periodontitis

  • Rini, Widyaningrum;Ika, Candradewi;Nur Rahman Ahmad Seno, Aji;Rona, Aulianisa
    • Imaging Science in Dentistry / Vol. 52, No. 4 / pp.383-391 / 2022
  • Purpose: Periodontitis, the most prevalent chronic inflammatory condition affecting teeth-supporting tissues, is diagnosed and classified through clinical and radiographic examinations. The staging of periodontitis using panoramic radiographs provides information for designing computer-assisted diagnostic systems. Image segmentation is required for image processing in diagnostic applications of periodontitis. This study evaluated image segmentation for periodontitis staging based on deep learning approaches. Materials and Methods: Multi-Label U-Net and Mask R-CNN models were compared for image segmentation to detect periodontitis using 100 digital panoramic radiographs. Normal conditions and 4 stages of periodontitis were annotated on these panoramic radiographs. A total of 1100 original and augmented images were then randomly divided into a training (75%) dataset to produce segmentation models and a testing (25%) dataset to determine the evaluation metrics of the segmentation models. Results: The performance of the segmentation models against the radiographic diagnosis of periodontitis conducted by a dentist was described by evaluation metrics (i.e., the Dice coefficient and intersection-over-union [IoU] score). Multi-Label U-Net achieved a Dice coefficient of 0.96 and an IoU score of 0.97. Meanwhile, Mask R-CNN attained a Dice coefficient of 0.87 and an IoU score of 0.74. U-Net performed semantic segmentation, while Mask R-CNN performed instance segmentation with accuracy, precision, recall, and F1-score values of 95%, 85.6%, 88.2%, and 86.6%, respectively. Conclusion: Multi-Label U-Net produced superior image segmentation to that of Mask R-CNN. The authors recommend integrating it with other techniques to develop hybrid models for automatic periodontitis detection.
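The Dice coefficient and IoU are the two overlap metrics reported here. A minimal NumPy sketch of both on binary masks; note that for any pair of masks Dice = 2·IoU/(1 + IoU):

```python
# Minimal sketch of the Dice coefficient and IoU on binary masks.
import numpy as np

def dice(pred, gt, eps=1e-6):
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred, gt, eps=1e-6):
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

pred = np.zeros((64, 64), bool); pred[8:40, 8:40] = True
gt = np.zeros((64, 64), bool); gt[16:48, 16:48] = True
print(f"Dice={dice(pred, gt):.3f}, IoU={iou(pred, gt):.3f}")
```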

Semantic Segmentation of Urban Scenes Using Location Prior Information

  • 왕정현;김진환
    • The Journal of Korea Robotics Society / Vol. 12, No. 3 / pp.249-257 / 2017
  • This paper proposes a method to semantically segment urban scenes based on location prior information. Since major scene elements in urban environments such as roads, buildings, and vehicles are often found at specific locations, using the location prior information of these elements can improve segmentation performance. The location priors are defined in special 2D coordinates, referred to as road-normal coordinates, which are perpendicular to the orientation of the road. With the help of depth information for each element, all the possible pixels in the image are projected into these coordinates, and the learned prior information is applied to those pixels. The proposed location prior can be modeled by defining the unary potential of a conditional random field (CRF) as a sum of two sub-potentials: an appearance feature-based potential and a location potential. The proposed method was validated on the publicly available KITTI dataset, which contains urban images and corresponding 3D depth measurements.
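In notation, the unary potential described here is the sum of an appearance term and a location term; the symbols below are ours, since the abstract defines none:

```latex
% Hedged notation for the CRF unary potential in the abstract: an
% appearance-based and a location-based sub-potential combined additively.
% x_i is the label of pixel i, I the image, and (u, v) the projection of
% pixel i into road-normal coordinates.
\psi_u(x_i) = \psi_{\text{app}}(x_i \mid I) + \psi_{\text{loc}}(x_i \mid u, v)
```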

Application of CCTV Image and Semantic Segmentation Model for Water Level Estimation of Irrigation Channel

  • 김귀훈;김마가;윤푸른;방재홍;명우호;최진용;최규훈
    • Journal of the Korean Society of Agricultural Engineers / Vol. 64, No. 3 / pp.63-73 / 2022
  • A more accurate understanding of the irrigation water supply is necessary for efficient agricultural water management. Although water levels in irrigation canals are measured with ultrasonic water level gauges, errors occur due to malfunctions or the surrounding environment. This study applies CNN (Convolutional Neural Network) deep-learning-based image classification and segmentation models to CCTV (Closed-Circuit Television) images of an irrigation canal. The CCTV images were acquired from the irrigation canal of an agricultural reservoir in Cheorwon-gun, Gangwon-do. We used the ResNet-50 model for image classification and the U-Net model for image segmentation. Using the Natural Breaks algorithm, we divided the water level data into 2, 4, and 8 groups for the image classification models. The classification models for 2, 4, and 8 groups showed accuracies of 1.000, 0.987, and 0.634, respectively. The image segmentation model showed a Dice score of 0.998, and the predicted water levels showed an R² of 0.97 and an MAE (Mean Absolute Error) of 0.02 m. The image classification models can be applied to automatic gate controllers operating on four water-level classes, and the segmentation results can serve as an alternative measurement to ultrasonic water gauges. We expect that the results of this study can provide a more scientific and efficient approach to agricultural water management.
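Estimating a level from a U-Net water mask can be as simple as mapping the topmost water-pixel row through a calibration. A minimal sketch under that assumption; the linear pixel-to-meter calibration and its constants are placeholders, as the abstract does not give the paper's actual mapping:

```python
# Hedged sketch: derive a water level from a binary water mask by taking
# the topmost water row and applying a linear pixel-to-meter calibration.
# meters_per_px and row_at_zero are illustrative placeholders.
import numpy as np

def water_level_from_mask(mask, meters_per_px=0.005, row_at_zero=400):
    rows = np.where(mask.any(axis=1))[0]
    if rows.size == 0:
        return 0.0  # no water detected in the frame
    top_row = rows.min()  # image rows grow downward, so min = water surface
    return (row_at_zero - top_row) * meters_per_px

mask = np.zeros((480, 640), bool)
mask[300:480, :] = True  # synthetic water region
print(f"estimated level: {water_level_from_mask(mask):.2f} m")
```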

Comparison of Performance of Medical Image Semantic Segmentation Models on ATLAS V2.0 Data

  • 우소연;구영현;유성준
    • Journal of Broadcast Engineering / Vol. 28, No. 3 / pp.267-274 / 2023
  • Publicly available medical image data are limited in quantity because their collection faces practical constraints, so previous studies may have overfitted to the public datasets. This paper re-validates the performance of existing models by experimentally comparing eight medical image segmentation models (Unet, X-Net, HarDNet, SegNet, PSPNet, SwinUnet, 3D-ResU-Net, UNETR). Performance comparisons are conducted on Anatomical Tracings of Lesions After Stroke (ATLAS) V1.2 and ATLAS V2.0, public datasets for stroke diagnosis. The results show that most models performed similarly on V1.2 and V2.0; however, X-Net and 3D-ResU-Net recorded higher performance on the V1.2 dataset. These results can be interpreted as those models having been overfitted to V1.2.
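One way to surface the overfitting pattern the authors report is to compare each model's mean Dice across the two dataset versions. A minimal sketch with explicitly random placeholder scores; the numbers are not results from the paper, and the 0.05 margin is an arbitrary illustrative threshold:

```python
# Hedged sketch of the cross-version comparison the abstract describes:
# flag models whose Dice on ATLAS V1.2 exceeds their Dice on V2.0 by a
# margin, a possible sign of overfitting to V1.2. Scores are random
# placeholders, not results from the paper.
import numpy as np

rng = np.random.default_rng(0)
models = ["Unet", "X-Net", "HarDNet", "SegNet",
          "PSPNet", "SwinUnet", "3D-ResU-Net", "UNETR"]
dice_v12 = {m: rng.uniform(0.4, 0.7) for m in models}
dice_v20 = {m: rng.uniform(0.4, 0.7) for m in models}

def flag_overfit(v12, v20, margin=0.05):
    return [m for m in v12 if v12[m] - v20[m] > margin]

print("possible V1.2 overfit:", flag_overfit(dice_v12, dice_v20))
```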

Semantic crack-image identification framework for steel structures using atrous convolution-based Deeplabv3+ Network

  • Ta, Quoc-Bao;Dang, Ngoc-Loi;Kim, Yoon-Chul;Kam, Hyeon-Dong;Kim, Jeong-Tae
    • Smart Structures and Systems / Vol. 30, No. 1 / pp.17-34 / 2022
  • For steel structures, fatigue cracks are critical damage induced by long-term cyclic loading and distortion effects. Vision-based crack detection can be a solution for ensuring structural integrity and performance through continuous monitoring and non-destructive assessment. A critical issue is distinguishing cracks from other features in captured images, which may contain complex backgrounds such as handwriting and marks made to record crack patterns and lengths during periodic visual inspections. This study presents a parametric study on image-based crack identification for orthotropic steel bridge decks using captured images with complicated backgrounds. Firstly, a framework for vision-based crack segmentation using the atrous convolution-based Deeplabv3+ network (ACDN) is designed. Secondly, features in crack images are labeled to build three databanks according to the objects in the backgrounds. Thirdly, evaluation metrics computed from the trained ACDN models are used to evaluate the effects of obstacles on crack detection results. Finally, various training parameters, including image sizes, hyper-parameters, and the number of training images, are optimized for the ACDN crack-detection model. The results demonstrate that fatigue cracks can be identified by the trained ACDN models and that crack-detection accuracy improves when the training parameters are optimized. This enables the applicability of the vision-based technique for the early detection of tiny fatigue cracks in steel structures.
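Atrous (dilated) convolution enlarges the receptive field without extra parameters or downsampling. A minimal sketch showing a single atrous convolution and a binary-crack segmentation model built from torchvision's DeepLabV3; note torchvision ships DeepLabV3 rather than v3+, so this only approximates the ACDN setup, and weights=None / num_classes=2 are assumptions:

```python
# Hedged sketch: an atrous convolution and a DeepLabV3 crack-segmentation
# model. torchvision provides DeepLabV3 (not v3+), so this approximates
# the ACDN described in the abstract; num_classes=2 (crack/background)
# is an assumption.
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

# An atrous (dilated) 3x3 convolution: rate 2 covers a 5x5 footprint
# while keeping 3x3-many weights and the same output resolution.
atrous = nn.Conv2d(3, 8, kernel_size=3, padding=2, dilation=2)
print(atrous(torch.randn(1, 3, 128, 128)).shape)  # (1, 8, 128, 128)

model = deeplabv3_resnet50(weights=None, num_classes=2)
model.eval()
with torch.no_grad():
    out = model(torch.randn(1, 3, 256, 256))["out"]
print(out.shape)  # (1, 2, 256, 256) per-pixel crack/background logits
```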

Deep learning approach to generate 3D civil infrastructure models using drone images

  • Kwon, Ji-Hye;Khudoyarov, Shekhroz;Kim, Namgyu;Heo, Jun-Haeng
    • Smart Structures and Systems / Vol. 30, No. 5 / pp.501-511 / 2022
  • Three-dimensional (3D) models have become crucial for improving civil infrastructure analysis, and they can be used for various purposes such as damage detection, risk estimation, resolving potential safety issues, alarm detection, and structural health monitoring. 3D point cloud data are used not only to build visual models but also to analyze the state of structures and to monitor them using semantic data. This study proposes automating the generation of high-quality 3D point cloud data and removing noise using deep learning algorithms. Large-format aerial images of civil infrastructure, such as cut slopes and dams, captured by drones were used to develop a workflow for automatically generating a 3D point cloud model. The generation of the point cloud was automated through image cropping, downscaling/upscaling, semantic segmentation, generation of segmentation masks, and region extraction algorithms. Compared with generating the point cloud model from raw images, our method effectively improved the quality of the model, removed noise, and reduced the processing time. The results showed that the size of the 3D point cloud model created using the proposed method was significantly reduced: the number of points decreased by 20-50%, and distant points were recognized as noise. This method can be applied to the automatic generation of high-quality 3D point cloud models of civil infrastructure from aerial imagery.
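One cleanup step the abstract mentions is treating distant points as noise. A minimal NumPy sketch of that kind of distance filter; the centroid reference and the 95th-percentile threshold are illustrative assumptions, not the paper's criterion:

```python
# Hedged sketch of removing distant points from a 3D point cloud as
# noise. The centroid-distance criterion and keep_quantile are
# illustrative assumptions.
import numpy as np

def drop_distant_points(points, keep_quantile=0.95):
    # points: (N, 3) array of XYZ coordinates
    d = np.linalg.norm(points - points.mean(axis=0), axis=1)
    return points[d <= np.quantile(d, keep_quantile)]

cloud = np.random.randn(10000, 3)
cloud[:100] *= 20.0  # synthetic far-away outliers
clean = drop_distant_points(cloud)
print(f"kept {len(clean)} of {len(cloud)} points")
```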