• Title/Summary/Keyword: ART2 Neural Network

Search Result 136, Processing Time 0.029 seconds

Semantic Segmentation of Clouds Using Multi-Branch Neural Architecture Search (멀티 브랜치 네트워크 구조 탐색을 사용한 구름 영역 분할)

  • Chi Yoon Jeong;Kyeong Deok Moon;Mooseop Kim
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.2
    • /
    • pp.143-156
    • /
    • 2023
  • To precisely and reliably analyze the contents of the satellite imagery, recognizing the clouds which are the obstacle to gathering the useful information is essential. In recent times, deep learning yielded satisfactory results in various tasks, so many studies using deep neural networks have been conducted to improve the performance of cloud detection. However, existing methods for cloud detection have the limitation on increasing the performance due to the adopting the network models for semantic image segmentation without modification. To tackle this problem, we introduced the multi-branch neural architecture search to find optimal network structure for cloud detection. Additionally, the proposed method adopts the soft intersection over union (IoU) as loss function to mitigate the disagreement between the loss function and the evaluation metric and uses the various data augmentation methods. The experiments are conducted using the cloud detection dataset acquired by Arirang-3/3A satellite imagery. The experimental results showed that the proposed network which are searched network architecture using cloud dataset is 4% higher than the existing network model which are searched network structure using urban street scenes with regard to the IoU. Also, the experimental results showed that the soft IoU exhibits the best performance on cloud detection among the various loss functions. When comparing the proposed method with the state-of-the-art (SOTA) models in the field of semantic segmentation, the proposed method showed better performance than the SOTA models with regard to the mean IoU and overall accuracy.

ORMN: A Deep Neural Network Model for Referring Expression Comprehension (ORMN: 참조 표현 이해를 위한 심층 신경망 모델)

  • Shin, Donghyeop;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.2
    • /
    • pp.69-76
    • /
    • 2018
  • Referring expressions are natural language constructions used to identify particular objects within a scene. In this paper, we propose a new deep neural network model for referring expression comprehension. The proposed model finds out the region of the referred object in the given image by making use of the rich information about the referred object itself, the context object, and the relationship with the context object mentioned in the referring expression. In the proposed model, the object matching score and the relationship matching score are combined to compute the fitness score of each candidate region according to the structure of the referring expression sentence. Therefore, the proposed model consists of four different sub-networks: Language Representation Network(LRN), Object Matching Network (OMN), Relationship Matching Network(RMN), and Weighted Composition Network(WCN). We demonstrate that our model achieves state-of-the-art results for comprehension on three referring expression datasets.

Deep neural networks for speaker verification with short speech utterances (짧은 음성을 대상으로 하는 화자 확인을 위한 심층 신경망)

  • Yang, IL-Ho;Heo, Hee-Soo;Yoon, Sung-Hyun;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.35 no.6
    • /
    • pp.501-509
    • /
    • 2016
  • We propose a method to improve the robustness of speaker verification on short test utterances. The accuracy of the state-of-the-art i-vector/probabilistic linear discriminant analysis systems can be degraded when testing utterance durations are short. The proposed method compensates for utterance variations of short test feature vectors using deep neural networks. We design three different types of DNN (Deep Neural Network) structures which are trained with different target output vectors. Each DNN is trained to minimize the discrepancy between the feed-forwarded output of a given short utterance feature and its original long utterance feature. We use short 2-10 s condition of the NIST (National Institute of Standards Technology, U.S.) 2008 SRE (Speaker Recognition Evaluation) corpus to evaluate the method. The experimental results show that the proposed method reduces the minimum detection cost relative to the baseline system.

A Car License Plate Recognition Using Colors Information, Morphological Characteristic and Neural Network (컬러 정보 및 형태학적 특징과 신경망을 이용한 차량 번호판 인식)

  • Cho, Jae-Hyun;Yang, Hwang-Kyu
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.5 no.3
    • /
    • pp.304-308
    • /
    • 2010
  • In this paper, we propose a new method of recognizing the vehicle license plate using color space, morphological characteristics and ART2 algorithm. Morphological characteristics of old and/or new style vehicle license plate among the candidate regions are applied to remove noise areas using 8-directional contour tracking algorithm, then follow by the extraction of vehicle plate. From the extracted license plate area, plate morphological characteristics of each region are removed. After that, labeling algorithm to extract the individual characters are then combined. The classified individual character and numeric codes are applied to the ART2 algorithm for the learning and recognition. In order to evaluate the performance of our proposed extraction and recognition of vehicle license method, we have run experiments on 100 green plates and white plates. Experimental results shown that the proposed license plate extraction and recognition method was effective.

DP-LinkNet: A convolutional network for historical document image binarization

  • Xiong, Wei;Jia, Xiuhong;Yang, Dichun;Ai, Meihui;Li, Lirong;Wang, Song
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1778-1797
    • /
    • 2021
  • Document image binarization is an important pre-processing step in document analysis and archiving. The state-of-the-art models for document image binarization are variants of encoder-decoder architectures, such as FCN (fully convolutional network) and U-Net. Despite their success, they still suffer from three limitations: (1) reduced feature map resolution due to consecutive strided pooling or convolutions, (2) multiple scales of target objects, and (3) reduced localization accuracy due to the built-in invariance of deep convolutional neural networks (DCNNs). To overcome these three challenges, we propose an improved semantic segmentation model, referred to as DP-LinkNet, which adopts the D-LinkNet architecture as its backbone, with the proposed hybrid dilated convolution (HDC) and spatial pyramid pooling (SPP) modules between the encoder and the decoder. Extensive experiments are conducted on recent document image binarization competition (DIBCO) and handwritten document image binarization competition (H-DIBCO) benchmark datasets. Results show that our proposed DP-LinkNet outperforms other state-of-the-art techniques by a large margin. Our implementation and the pre-trained models are available at https://github.com/beargolden/DP-LinkNet.

Attention Deep Neural Networks Learning based on Multiple Loss functions for Video Face Recognition (비디오 얼굴인식을 위한 다중 손실 함수 기반 어텐션 심층신경망 학습 제안)

  • Kim, Kyeong Tae;You, Wonsang;Choi, Jae Young
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.10
    • /
    • pp.1380-1390
    • /
    • 2021
  • The video face recognition (FR) is one of the most popular researches in the field of computer vision due to a variety of applications. In particular, research using the attention mechanism is being actively conducted. In video face recognition, attention represents where to focus on by using the input value of the whole or a specific region, or which frame to focus on when there are many frames. In this paper, we propose a novel attention based deep learning method. Main novelties of our method are (1) the use of combining two loss functions, namely weighted Softmax loss function and a Triplet loss function and (2) the feasibility of end-to-end learning which includes the feature embedding network and attention weight computation. The feature embedding network has a positive effect on the attention weight computation by using combined loss function and end-to-end learning. To demonstrate the effectiveness of our proposed method, extensive and comparative experiments have been carried out to evaluate our method on IJB-A dataset with their standard evaluation protocols. Our proposed method represented better or comparable recognition rate compared to other state-of-the-art video FR methods.

Reducing latency of neural automatic piano transcription models (인공신경망 기반 저지연 피아노 채보 모델)

  • Dasol Lee;Dasaem Jeong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.2
    • /
    • pp.102-111
    • /
    • 2023
  • Automatic Music Transcription (AMT) is a task that detects and recognizes musical note events from a given audio recording. In this paper, we focus on reducing the latency of real-time AMT systems on piano music. Although neural AMT models have been adapted for real-time piano transcription, they suffer from high latency, which hinders their usefulness in interactive scenarios. To tackle this issue, we explore several techniques for reducing the intrinsic latency of a neural network for piano transcription, including reducing window and hop sizes of Fast Fourier Transformation (FFT), modifying convolutional layer's kernel size, and shifting the label in the time-axis to train the model to predict onset earlier. Our experiments demonstrate that combining these approaches can lower latency while maintaining high transcription accuracy. Specifically, our modified model achieved note F1 scores of 92.67 % and 90.51 % with latencies of 96 ms and 64 ms, respectively, compared to the baseline model's note F1 score of 93.43 % with a latency of 160 ms. This methodology has potential for training AMT models for various interactive scenarios, including providing real-time feedback for piano education.

Real-time Multiple Pedestrians Tracking for Embedded Smart Visual Systems

  • Nguyen, Van Ngoc Nghia;Nguyen, Thanh Binh;Chung, Sun-Tae
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.2
    • /
    • pp.167-177
    • /
    • 2019
  • Even though so much progresses have been achieved in Multiple Object Tracking (MOT), most of reported MOT methods are not still satisfactory for commercial embedded products like Pan-Tilt-Zoom (PTZ) camera. In this paper, we propose a real-time multiple pedestrians tracking method for embedded environments. First, we design a new light weight convolutional neural network(CNN)-based pedestrian detector, which is constructed to detect even small size pedestrians, as well. For further saving of processing time, the designed detector is applied for every other frame, and Kalman filter is employed to predict pedestrians' positions in frames where the designed CNN-based detector is not applied. The pose orientation information is incorporated to enhance object association for tracking pedestrians without further computational cost. Through experiments on Nvidia's embedded computing board, Jetson TX2, it is verified that the designed pedestrian detector detects even small size pedestrians fast and well, compared to many state-of-the-art detectors, and that the proposed tracking method can track pedestrians in real-time and show accuracy performance comparably to performances of many state-of-the-art tracking methods, which do not target for operation in embedded systems.

Pavement Crack Detection and Segmentation Based on Deep Neural Network

  • Nguyen, Huy Toan;Yu, Gwang Hyun;Na, Seung You;Kim, Jin Young;Seo, Kyung Sik
    • The Journal of Korean Institute of Information Technology
    • /
    • v.17 no.9
    • /
    • pp.99-112
    • /
    • 2019
  • Cracks on pavement surfaces are critical signs and symptoms of the degradation of pavement structures. Image-based pavement crack detection is a challenging problem due to the intensity inhomogeneity, topology complexity, low contrast, and noisy texture background. In this paper, we address the problem of pavement crack detection and segmentation at pixel-level based on a Deep Neural Network (DNN) using gray-scale images. We propose a novel DNN architecture which contains a modified U-net network and a high-level features network. An important contribution of this work is the combination of these networks afforded through the fusion layer. To the best of our knowledge, this is the first paper introducing this combination for pavement crack segmentation and detection problem. The system performance of crack detection and segmentation is enhanced dramatically by using our novel architecture. We thoroughly implement and evaluate our proposed system on two open data sets: the Crack Forest Dataset (CFD) and the AigleRN dataset. Experimental results demonstrate that our system outperforms eight state-of-the-art methods on the same data sets.

Remote Sensing Image Classification for Land Cover Mapping in Developing Countries: A Novel Deep Learning Approach

  • Lynda, Nzurumike Obianuju;Nnanna, Nwojo Agwu;Boukar, Moussa Mahamat
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.214-222
    • /
    • 2022
  • Convolutional Neural networks (CNNs) are a category of deep learning networks that have proven very effective in computer vision tasks such as image classification. Notwithstanding, not much has been seen in its use for remote sensing image classification in developing countries. This is majorly due to the scarcity of training data. Recently, transfer learning technique has successfully been used to develop state-of-the art models for remote sensing (RS) image classification tasks using training and testing data from well-known RS data repositories. However, the ability of such model to classify RS test data from a different dataset has not been sufficiently investigated. In this paper, we propose a deep CNN model that can classify RS test data from a dataset different from the training dataset. To achieve our objective, we first, re-trained a ResNet-50 model using EuroSAT, a large-scale RS dataset to develop a base model then we integrated Augmentation and Ensemble learning to improve its generalization ability. We further experimented on the ability of this model to classify a novel dataset (Nig_Images). The final classification results shows that our model achieves a 96% and 80% accuracy on EuroSAT and Nig_Images test data respectively. Adequate knowledge and usage of this framework is expected to encourage research and the usage of deep CNNs for land cover mapping in cases of lack of training data as obtainable in developing countries.