• Title/Summary/Keyword: 컨볼루션 신경망

Search Result 157, Processing Time 0.023 seconds

Semantic Object Segmentation Using Conditional Generative Adversarial Network with Residual Connections (잔차 연결의 조건부 생성적 적대 신경망을 사용한 시맨틱 객체 분할)

  • Ibrahem, Hatem;Salem, Ahmed;Yagoub, Bilel;Kang, Hyun Su;Suh, Jae-Won
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.12
    • /
    • pp.1919-1925
    • /
    • 2022
  • In this paper, we propose an image-to-image translation approach based on the conditional generative adversarial network for semantic segmentation. Semantic segmentation is the task of clustering parts of an image together which belong to the same object class. Unlike the traditional pixel-wise classification approach, the proposed method parses an input RGB image to its corresponding semantic segmentation mask using a pixel regression approach. The proposed method is based on the Pix2Pix image synthesis method. We employ residual connections-based convolutional neural network architectures for both the generator and discriminator architectures, as the residual connections speed up the training process and generate more accurate results. The proposed method has been trained and tested on the NYU-depthV2 dataset and could achieve a good mIOU value (49.5%). We also compare the proposed approach to the current methods in semantic segmentation showing that the proposed method outperforms most of those methods.

Efficient 2D Smoke Synthesis with Cartesian Coordinates System Based Node Compression (데카르트 좌표계 기반 노드 압축을 이용한 효율적인 2차원 연기 합성)

  • Kim, Donghui;Kim, Jong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.659-660
    • /
    • 2021
  • 본 논문에서는 데카르트 좌표계 기반으로 노드를 압축함으로써 SR(Super-resolution) 기반 연기 합성을 효율적으로 처리할 수 있는 방법을 제안한다. 제안하는 방법은 다운 스케일링과 이진화를 통하여 연기 시뮬레이션의 계산 공간을 효율적으로 줄이고, 데카르트 좌표계 축을 기준으로 쿼드트리의 말단 노드를 압축함으로써 네트워크의 입력으로 전달하는 데이터 개수를 줄인다. 학습에 사용된 데이터는 COCO 2017 데이터셋이며, 인공신경망은 VGG19 기반 네트워크를 사용한다. 컨볼루션 계층을 거칠 때 데이터의 손실을 막기 위해 잔차(Residual)방식과 유사하게 이전 계층의 출력 값을 더해주며 학습한다. 결과적으로 제안하는 방법은 이전 결과에 비해 네트워크로 전달해야 하는 데이터가 압축되어 개수가 줄어드는 결과를 얻었으며, 그로 인해 네트워크 단계에서 필요한 I/O 과정을 효율적으로 처리할 수 있게 되었다.

  • PDF

Efficient Super-Resolution of 2D Smoke Data with Optimized Quadtree (최적화된 쿼드트리를 이용한 2차원 연기 데이터의 효율적인 슈퍼 해상도 기법)

  • Choe, YooYeon;Kim, Donghui;Kim, Jong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.01a
    • /
    • pp.261-264
    • /
    • 2021
  • 본 논문에서는 SR(Super-Resolution)을 계산하는데 필요한 데이터를 효율적으로 분류하고 분할하여 빠르게 SR연산을 가능하게 하는 쿼드트리 기반 최적화 기법을 제안한다. 제안하는 방법은 입력 데이터로 사용하는 연기 데이터를 다운스케일링(Downscaling)하여 쿼드트리 연산 소요 시간을 감소시키며, 이때 연기의 밀도를 이진화함으로써, 다운스케일링 과정에서 밀도가 손실되는 문제를 피한다. 학습에 사용된 데이터는 COCO 2017 Dataset이며, 인공신경망은 VGG19 기반 네트워크를 사용한다. 컨볼루션 계층을 거칠 때 데이터의 손실을 막기 위해 잔차(Residual)방식과 유사하게 이전 계층의 출력 값을 더해주며 학습한다. 결과적으로 제안하는 방법은 이전 결과 기법에 비해 약15~18배 정도의 속도향상을 얻었다.

  • PDF

A New Residual Attention Network based on Attention Models for Human Action Recognition in Video

  • Kim, Jee-Hyun;Cho, Young-Im
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.1
    • /
    • pp.55-61
    • /
    • 2020
  • With the development of deep learning technology and advances in computing power, video-based research is now gaining more and more attention. Video data contains a large amount of temporal and spatial information, which is the biggest difference compared with image data. It has a larger amount of data. It has attracted intense attention in computer vision. Among them, motion recognition is one of the research focuses. However, the action recognition of human in the video is extremely complex and challenging subject. Based on many research in human beings, we have found that artificial intelligence-like attention mechanisms are an efficient model for cognition. This efficient model is ideal for processing image information and complex continuous video information. We introduce this attention mechanism into video action recognition, paying attention to human actions in video and effectively improving recognition efficiency. In this paper, we propose a new 3D residual attention network using convolutional neural network based on two attention models to identify human action behavior in the video. An evaluation result of our model showed up to 90.7% accuracy.

Individual Pig Detection Using Kinect Depth Information and Convolutional Neural Network (키넥트 깊이 정보와 컨볼루션 신경망을 이용한 개별 돼지의 탐지)

  • Lee, Junhee;Lee, Jonguk;Park, Daihee;Chung, Yongwha
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.2
    • /
    • pp.1-10
    • /
    • 2018
  • Aggression among pigs adversely affects economic returns and animal welfare in intensive pigsties. Recently, some studies have applied information technology to a livestock management system to minimize the damage resulting from such anomalies. Nonetheless, detecting each pig in a crowed pigsty is still challenging problem. In this paper, we propose a new Kinect camera and deep learning-based monitoring system for the detection of the individual pigs. The proposed system is characterized as follows. 1) The background subtraction method and depth-threshold are used to detect only standing-pigs in the Kinect-depth image. 2) The standing-pigs are detected by using YOLO (You Only Look Once) which is the fastest and most accurate model in deep learning algorithms. Our experimental results show that this method is effective for detecting individual pigs in real time in terms of both cost-effectiveness (using a low-cost Kinect depth sensor) and accuracy (average 99.40% detection accuracies).

A Safety Score Prediction Model in Urban Environment Using Convolutional Neural Network (컨볼루션 신경망을 이용한 도시 환경에서의 안전도 점수 예측 모델 연구)

  • Kang, Hyeon-Woo;Kang, Hang-Bong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.8
    • /
    • pp.393-400
    • /
    • 2016
  • Recently, there have been various researches on efficient and automatic analysis on urban environment methods that utilize the computer vision and machine learning technology. Among many new analyses, urban safety analysis has received a major attention. In order to predict more accurately on safety score and reflect the human visual perception, it is necessary to consider the generic and local information that are most important to human perception. In this paper, we use Double-column Convolutional Neural network consisting of generic and local columns for the prediction of urban safety. The input of generic and local column used re-sized and random cropped images from original images, respectively. In addition, a new learning method is proposed to solve the problem of over-fitting in a particular column in the learning process. For the performance comparison of our Double-column Convolutional Neural Network, we compare two Support Vector Regression and three Convolutional Neural Network models using Root Mean Square Error and correlation analysis. Our experimental results demonstrate that our Double-column Convolutional Neural Network model show the best performance with Root Mean Square Error of 0.7432 and Pearson/Spearman correlation coefficient of 0.853/0.840.

Deep Learning Based Tree Recognition rate improving Method for Elementary and Middle School Learning

  • Choi, Jung-Eun;Yong, Hwan-Seung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.12
    • /
    • pp.9-16
    • /
    • 2019
  • The goal of this study is to propose an efficient model for recognizing and classifying tree images to measure the accuracy that can be applied to smart devices during class. From the 2009 revised textbook to the 2015 revised textbook, the learning objective to the fourth-grade science textbook of elementary schools was added to the plant recognition utilizing smart devices. In this study, we compared the recognition rates of trees before and after retraining using a pre-trained inception V3 model, which is the support of the Google Inception V3. In terms of tree recognition, it can distinguish several features, including shapes, bark, leaves, flowers, and fruits that may lead to the recognition rate. Furthermore, if all the leaves of trees may fall during winter, it may challenge to identify the type of tree, as only the bark of the tree will remain some leaves. Therefore, the effective tree classification model is presented through the combination of the images by tree type and the method of combining the model for the accuracy of each tree type. I hope that this model will apply to smart devices used in educational settings.

Application of the artificial intelligence for automatic detection of shipping noise in shallow-water (천해역 선박 소음 자동 탐지를 위한 인공지능 기법 적용)

  • Kim, Sunhyo;Jung, Seom-Kyu;Kang, Donhyug;Kim, Mira;Cho, Sungho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.4
    • /
    • pp.279-285
    • /
    • 2020
  • The study on the temporal and spatial monitoring of passing vessels is important in terms of protection and management the marine ecosystem in the coastal area. In this paper, we propose the automatic detection technique of passing vessel by utilizing an artificial intelligence technology and broadband striation patterns which are characteristic of broadband noise radiated by passing vessel. Acoustic measurements to collect underwater noise spectrum images and ship navigation information were conducted in the southern region of Jeju Island in South Korea for 12 days (2016.07.15-07.26). And the convolution neural network model is optimized through learning and validation processes based on the collected images. The automatic detection performance of passing vessel is evaluated by precision (0.936), recall (0.830), average precision (0.824), and accuracy (0.949). In conclusion, the possibility of the automatic detection technique of passing vessel is confirmed by using an artificial intelligence technology, and a future study is proposed from the results of this study.

Depth Map Estimation Model Using 3D Feature Volume (3차원 특징볼륨을 이용한 깊이영상 생성 모델)

  • Shin, Soo-Yeon;Kim, Dong-Myung;Suh, Jae-Won
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.11
    • /
    • pp.447-454
    • /
    • 2018
  • This paper proposes a depth image generation algorithm of stereo images using a deep learning model composed of a CNN (convolutional neural network). The proposed algorithm consists of a feature extraction unit which extracts the main features of each parallax image and a depth learning unit which learns the parallax information using extracted features. First, the feature extraction unit extracts a feature map for each parallax image through the Xception module and the ASPP(Atrous spatial pyramid pooling) module, which are composed of 2D CNN layers. Then, the feature map for each parallax is accumulated in 3D form according to the time difference and the depth image is estimated after passing through the depth learning unit for learning the depth estimation weight through 3D CNN. The proposed algorithm estimates the depth of object region more accurately than other algorithms.

Low Resolution Infrared Image Deep Convolution Neural Network for Embedded System

  • Hong, Yong-hee;Jin, Sang-hun;Kim, Dae-hyeon;Jhee, Ho-Jin
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.6
    • /
    • pp.1-8
    • /
    • 2021
  • In this paper, we propose reinforced VGG style network structure for low performance embedded system to classify low resolution infrared image. The combination of reinforced VGG style network structure and global average pooling makes lower computational complexity and higher accuracy. The proposed method classify the synthesize image which have 9 class 3,723,328ea images made from OKTAL-SE tool. The reinforced VGG style network structure composed of 4 filters on input and 16 filters on output from max pooling layer shows about 34% lower computational complexity and about 2.4% higher accuracy then the first parameter minimized network structure made for embedded system composed of 8 filters on input and 8 filters on output from max pooling layer. Finally we get 96.1% accuracy model. Additionally we confirmed the about 31% lower inference lead time in ported C code.