• Title/Summary/Keyword: VGG-16

Search Result 117, Processing Time 0.028 seconds

A Comparison of Image Classification System for Building Waste Data based on Deep Learning (딥러닝기반 건축폐기물 이미지 분류 시스템 비교)

  • Jae-Kyung Sung;Mincheol Yang;Kyungnam Moon;Yong-Guk Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.199-206
    • /
    • 2023
  • This study utilizes deep learning algorithms to automatically classify construction waste into three categories: wood waste, plastic waste, and concrete waste. Two models, VGG-16 and ViT (Vision Transformer), which are convolutional neural network image classification algorithms and NLP-based models that sequence images, respectively, were compared for their performance in classifying construction waste. Image data for construction waste was collected by crawling images from search engines worldwide, and 3,000 images, with 1,000 images for each category, were obtained by excluding images that were difficult to distinguish with the naked eye or that were duplicated and would interfere with the experiment. In addition, to improve the accuracy of the models, data augmentation was performed during training with a total of 30,000 images. Despite the unstructured nature of the collected image data, the experimental results showed that VGG-16 achieved an accuracy of 91.5%, and ViT achieved an accuracy of 92.7%. This seems to suggest the possibility of practical application in actual construction waste data management work. If object detection techniques or semantic segmentation techniques are utilized based on this study, more precise classification will be possible even within a single image, resulting in more accurate waste classification

Fight Detection in Hockey Videos using Deep Network

  • Mukherjee, Subham;Saini, Rajkumar;Kumar, Pradeep;Roy, Partha Pratim;Dogra, Debi Prosad;Kim, Byung-Gyu
    • Journal of Multimedia Information System
    • /
    • v.4 no.4
    • /
    • pp.225-232
    • /
    • 2017
  • Understanding actions in videos is an important task. It helps in finding the anomalies present in videos such as fights. Detection of fights becomes more crucial when it comes to sports. This paper focuses on finding fight scenes in Hockey sport videos using blur & radon transform and convolutional neural networks (CNNs). First, the local motion within the video frames has been extracted using blur information. Next, fast fourier and radon transform have been applied on the local motion. The video frames with fight scene have been identified using transfer learning with the help of pre-trained deep learning model VGG-Net. Finally, a comparison of the methodology has been performed using feed forward neural networks. Accuracies of 56.00% and 75.00% have been achieved using feed forward neural network and VGG16-Net, respectively.

A Deep Learning-Based Rate Control for HEVC Intra Coding

  • Marzuki, Ismail;Sim, Donggyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2019.11a
    • /
    • pp.180-181
    • /
    • 2019
  • This paper proposes a rate control algorithm for intra coding frame in HEVC encoder using a deep learning approach. The proposed algorithm is designed for CTU level bit allocation in intra frame by considering visual features spatially and temporally. Our features are generated using visual geometry group (VGG-16) with deep convolutional layers, then it is used for bit allocation per each CTU within an intra frame. According to our experiments, the proposed algorithm can achieve -2.04% Luma component BD-rate gain with minimal bit accuracy loss against the HM-16.20 rate control model.

  • PDF

Early Detection of Rice Leaf Blast Disease using Deep-Learning Techniques

  • Syed Rehan Shah;Syed Muhammad Waqas Shah;Hadia Bibi;Mirza Murad Baig
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.4
    • /
    • pp.211-221
    • /
    • 2024
  • Pakistan is a top producer and exporter of high-quality rice, but traditional methods are still being used for detecting rice diseases. This research project developed an automated rice blast disease diagnosis technique based on deep learning, image processing, and transfer learning with pre-trained models such as Inception V3, VGG16, VGG19, and ResNet50. The modified connection skipping ResNet 50 had the highest accuracy of 99.16%, while the other models achieved 98.16%, 98.47%, and 98.56%, respectively. In addition, CNN and an ensemble model K-nearest neighbor were explored for disease prediction, and the study demonstrated superior performance and disease prediction using recommended web-app approaches.

Performance comparison of wake-up-word detection on mobile devices using various convolutional neural networks (다양한 합성곱 신경망 방식을 이용한 모바일 기기를 위한 시작 단어 검출의 성능 비교)

  • Kim, Sanghong;Lee, Bowon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.454-460
    • /
    • 2020
  • Artificial intelligence assistants that provide speech recognition operate through cloud-based voice recognition with high accuracy. In cloud-based speech recognition, Wake-Up-Word (WUW) detection plays an important role in activating devices on standby. In this paper, we compare the performance of Convolutional Neural Network (CNN)-based WUW detection models for mobile devices by using Google's speech commands dataset, using the spectrogram and mel-frequency cepstral coefficient features as inputs. The CNN models used in this paper are multi-layer perceptron, general convolutional neural network, VGG16, VGG19, ResNet50, ResNet101, ResNet152, MobileNet. We also propose network that reduces the model size to 1/25 while maintaining the performance of MobileNet is also proposed.

Comparison of Deep Learning-based CNN Models for Crack Detection (콘크리트 균열 탐지를 위한 딥 러닝 기반 CNN 모델 비교)

  • Seol, Dong-Hyeon;Oh, Ji-Hoon;Kim, Hong-Jin
    • Journal of the Architectural Institute of Korea Structure & Construction
    • /
    • v.36 no.3
    • /
    • pp.113-120
    • /
    • 2020
  • The purpose of this study is to compare the models of Deep Learning-based Convolution Neural Network(CNN) for concrete crack detection. The comparison models are AlexNet, GoogLeNet, VGG16, VGG19, ResNet-18, ResNet-50, ResNet-101, and SqueezeNet which won ImageNet Large Scale Visual Recognition Challenge(ILSVRC). To train, validate and test these models, we constructed 3000 training data and 12000 validation data with 256×256 pixel resolution consisting of cracked and non-cracked images, and constructed 5 test data with 4160×3120 pixel resolution consisting of concrete images with crack. In order to increase the efficiency of the training, transfer learning was performed by taking the weight from the pre-trained network supported by MATLAB. From the trained network, the validation data is classified into crack image and non-crack image, yielding True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN), and 6 performance indicators, False Negative Rate (FNR), False Positive Rate (FPR), Error Rate, Recall, Precision, Accuracy were calculated. The test image was scanned twice with a sliding window of 256×256 pixel resolution to classify the cracks, resulting in a crack map. From the comparison of the performance indicators and the crack map, it was concluded that VGG16 and VGG19 were the most suitable for detecting concrete cracks.

Detection of Number and Character Area of License Plate Using Deep Learning and Semantic Image Segmentation (딥러닝과 의미론적 영상분할을 이용한 자동차 번호판의 숫자 및 문자영역 검출)

  • Lee, Jeong-Hwan
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.1
    • /
    • pp.29-35
    • /
    • 2021
  • License plate recognition plays a key role in intelligent transportation systems. Therefore, it is a very important process to efficiently detect the number and character areas. In this paper, we propose a method to effectively detect license plate number area by applying deep learning and semantic image segmentation algorithm. The proposed method is an algorithm that detects number and text areas directly from the license plate without preprocessing such as pixel projection. The license plate image was acquired from a fixed camera installed on the road, and was used in various real situations taking into account both weather and lighting changes. The input images was normalized to reduce the color change, and the deep learning neural networks used in the experiment were Vgg16, Vgg19, ResNet18, and ResNet50. To examine the performance of the proposed method, we experimented with 500 license plate images. 300 sheets were used for learning and 200 sheets were used for testing. As a result of computer simulation, it was the best when using ResNet50, and 95.77% accuracy was obtained.

Iceberg-Ship Classification in SAR Images Using Convolutional Neural Network with Transfer Learning

  • Choi, Jeongwhan
    • Journal of Internet Computing and Services
    • /
    • v.19 no.4
    • /
    • pp.35-44
    • /
    • 2018
  • Monitoring through Synthesis Aperture Radar (SAR) is responsible for marine safety from floating icebergs. However, there are limits to distinguishing between icebergs and ships in SAR images. Convolutional Neural Network (CNN) is used to distinguish the iceberg from the ship. The goal of this paper is to increase the accuracy of identifying icebergs from SAR images. The metrics for performance evaluation uses the log loss. The two-layer CNN model proposed in research of C.Bentes et al.[1] is used as a benchmark model and compared with the four-layer CNN model using data augmentation. Finally, the performance of the final CNN model using the VGG-16 pre-trained model is compared with the previous model. This paper shows how to improve the benchmark model and propose the final CNN model.

A Deep Learning-Based Image Semantic Segmentation Algorithm

  • Chaoqun, Shen;Zhongliang, Sun
    • Journal of Information Processing Systems
    • /
    • v.19 no.1
    • /
    • pp.98-108
    • /
    • 2023
  • This paper is an attempt to design segmentation method based on fully convolutional networks (FCN) and attention mechanism. The first five layers of the Visual Geometry Group (VGG) 16 network serve as the coding part in the semantic segmentation network structure with the convolutional layer used to replace pooling to reduce loss of image feature extraction information. The up-sampling and deconvolution unit of the FCN is then used as the decoding part in the semantic segmentation network. In the deconvolution process, the skip structure is used to fuse different levels of information and the attention mechanism is incorporated to reduce accuracy loss. Finally, the segmentation results are obtained through pixel layer classification. The results show that our method outperforms the comparison methods in mean pixel accuracy (MPA) and mean intersection over union (MIOU).

Image Recognition System for Early Detection of Oral Cancer (구강암 조기발견을 위한 영상인식 시스템)

  • Cahyadi, Edward Dwijayanto;Song, Mi-Hwa
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.05a
    • /
    • pp.309-311
    • /
    • 2022
  • Oral cancer is a type of cancer that has a high possibility to be cured if it is threatened earlier. The convolutional neural network is very popular for being a good algorithm for image recognition. In this research, we try to compare 4 different architectures of the CNN algorithm: Convnet, VGG16, Inception V3, and Resnet. As we compared those 4 architectures we found that VGG16 and Resnet model has better performance with an 85.35% accuracy rate compared to the other 3 architectures. In the future, we are sure that image recognition can be more developed to identify oral cancer earlier.