• Title/Summary/Keyword: pooling layer

Search Result 52, Processing Time 0.024 seconds

Revisiting Deep Learning Model for Image Quality Assessment: Is Strided Convolution Better than Pooling? (영상 화질 평가 딥러닝 모델 재검토: 스트라이드 컨볼루션이 풀링보다 좋은가?)

  • Uddin, AFM Shahab;Chung, TaeChoong;Bae, Sung-Ho
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2020.11a
    • /
    • pp.29-32
    • /
    • 2020
  • Due to the lack of improper image acquisition process, noise induction is an inevitable step. As a result, objective image quality assessment (IQA) plays an important role in estimating the visual quality of noisy image. Plenty of IQA methods have been proposed including traditional signal processing based methods as well as current deep learning based methods where the later one shows promising performance due to their complex representation ability. The deep learning based methods consists of several convolution layers and down sampling layers for feature extraction and fully connected layers for regression. Usually, the down sampling is performed by using max-pooling layer after each convolutional block. We reveal that this max-pooling causes information loss despite of knowing their importance. Consequently, we propose a better IQA method that replaces the max-pooling layers with strided convolutions to down sample the feature space and since the strided convolution layers have learnable parameters, they preserve optimal features and discard redundant information, thereby improve the prediction accuracy. The experimental results verify the effectiveness of the proposed method.

  • PDF

Robust Deep Learning-Based Profiling Side-Channel Analysis for Jitter (지터에 강건한 딥러닝 기반 프로파일링 부채널 분석 방안)

  • Kim, Ju-Hwan;Woo, Ji-Eun;Park, So-Yeon;Kim, Soo-Jin;Han, Dong-Guk
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.6
    • /
    • pp.1271-1278
    • /
    • 2020
  • Deep learning-based profiling side-channel analysis is a powerful analysis method that utilizes the neural network to profile the relationship between the side-channel information and the intermediate value. Since the neural network interprets each point of the signal in a different dimension, jitter makes it much hard that the neural network with dimension-wise weights learns the relationship. This paper shows that replacing the fully-connected layer of the traditional CNN (Convolutional Neural Network) with global average pooling (GAP) allows us to design the inherently robust neural network inherently for jitter. We experimented with the ChipWhisperer-Lite board to demonstrate the proposed method: as a result, the validation accuracy of the CNN with a fully-connected layer was only up to 1.4%; contrastively, the validation accuracy of the CNN with GAP was very high at up to 41.7%.

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.10
    • /
    • pp.935-943
    • /
    • 2017
  • CNN (Convolution neural network), which is used for image classification and speech recognition among neural networks learning based on positive data, has been continuously developed to have a high performance structure to date. There are many difficulties to utilize in an embedded system with limited resources. Therefore, we use GPU (General-Purpose Computing on Graphics Processing Units), which is used for general-purpose operation of GPU to solve the problem because we use pre-learned weights but there are still limitations. Since CNN performs simple and iterative operations, the computation speed varies greatly depending on the thread allocation and utilization method in the Single Instruction Multiple Thread (SIMT) based GPGPU. To solve this problem, there is a thread that needs to be relaxed when performing Convolution and Pooling operations with threads. The remaining threads have increased the operation speed by using the method used in the following feature maps and kernel calculations.

Development of ResNet based Crop Growth Stage Estimation Model (ResNet 기반 작물 생육단계 추정 모델 개발)

  • Park, Jun;Kim, June-Yeong;Park, Sung-Wook;Jung, Se-Hoon;Sim, Chun-Bo
    • Smart Media Journal
    • /
    • v.11 no.2
    • /
    • pp.53-62
    • /
    • 2022
  • Due to the accelerated global warming phenomenon after industrialization, the frequency of changes in the existing environment and abnormal climate is increasing. Agriculture is an industry that is very sensitive to climate change, and global warming causes problems such as reducing crop yields and changing growing regions. In addition, environmental changes make the growth period of crops irregular, making it difficult for even experienced farmers to easily estimate the growth stage of crops, thereby causing various problems. Therefore, in this paper, we propose a CNN model for estimating the growth stage of crops. The proposed model was a model that modified the pooling layer of ResNet, and confirmed the accuracy of higher performance than the growth stage estimation of the ResNet and DenseNet models.

Skin Lesion Segmentation with Codec Structure Based Upper and Lower Layer Feature Fusion Mechanism

  • Yang, Cheng;Lu, GuanMing
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.1
    • /
    • pp.60-79
    • /
    • 2022
  • The U-Net architecture-based segmentation models attained remarkable performance in numerous medical image segmentation missions like skin lesion segmentation. Nevertheless, the resolution gradually decreases and the loss of spatial information increases with deeper network. The fusion of adjacent layers is not enough to make up for the lost spatial information, thus resulting in errors of segmentation boundary so as to decline the accuracy of segmentation. To tackle the issue, we propose a new deep learning-based segmentation model. In the decoding stage, the feature channels of each decoding unit are concatenated with all the feature channels of the upper coding unit. Which is done in order to ensure the segmentation effect by integrating spatial and semantic information, and promotes the robustness and generalization of our model by combining the atrous spatial pyramid pooling (ASPP) module and channel attention module (CAM). Extensive experiments on ISIC2016 and ISIC2017 common datasets proved that our model implements well and outperforms compared segmentation models for skin lesion segmentation.

Classification Algorithms for Human and Dog Movement Based on Micro-Doppler Signals

  • Lee, Jeehyun;Kwon, Jihoon;Bae, Jin-Ho;Lee, Chong Hyun
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.6 no.1
    • /
    • pp.10-17
    • /
    • 2017
  • We propose classification algorithms for human and dog movement. The proposed algorithms use micro-Doppler signals obtained from humans and dogs moving in four different directions. A two-stage classifier based on a support vector machine (SVM) is proposed, which uses a radial-based function (RBF) kernel and $16^{th}$-order linear predictive code (LPC) coefficients as feature vectors. With the proposed algorithms, we obtain the best classification results when a first-level SVM classifies the type of movement, and then, a second-level SVM classifies the moving object. We obtain the correct classification probability 95.54% of the time, on average. Next, to deal with the difficult classification problem of human and dog running, we propose a two-layer convolutional neural network (CNN). The proposed CNN is composed of six ($6{\times}6$) convolution filters at the first and second layers, with ($5{\times}5$) max pooling for the first layer and ($2{\times}2$) max pooling for the second layer. The proposed CNN-based classifier adopts an auto regressive spectrogram as the feature image obtained from the $16^{th}$-order LPC vectors for a specific time duration. The proposed CNN exhibits 100% classification accuracy and outperforms the SVM-based classifier. These results show that the proposed classifiers can be used for human and dog classification systems and also for classification problems using data obtained from an ultra-wideband (UWB) sensor.

Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.127-142
    • /
    • 2016
  • Deep learning model is a kind of neural networks that allows multiple hidden layers. There are various deep learning architectures such as convolutional neural networks, deep belief networks and recurrent neural networks. Those have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics where they have been shown to produce state-of-the-art results on various tasks. Among those architectures, convolutional neural networks and recurrent neural networks are classified as the supervised learning model. And in recent years, those supervised learning models have gained more popularity than unsupervised learning models such as deep belief networks, because supervised learning models have shown fashionable applications in such fields mentioned above. Deep learning models can be trained with backpropagation algorithm. Backpropagation is an abbreviation for "backward propagation of errors" and a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network. The gradient is fed to the optimization method which in turn uses it to update the weights, in an attempt to minimize the error function. Convolutional neural networks use a special architecture which is particularly well-adapted to classify images. Using this architecture makes convolutional networks fast to train. This, in turn, helps us train deep, muti-layer networks, which are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks use three basic ideas: local receptive fields, shared weights, and pooling. By local receptive fields, we mean that each neuron in the first(or any) hidden layer will be connected to a small region of the input(or previous layer's) neurons. Shared weights mean that we're going to use the same weights and bias for each of the local receptive field. This means that all the neurons in the hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers. Pooling layers are usually used immediately after convolutional layers. What the pooling layers do is to simplify the information in the output from the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Training deep learning networks has taken weeks several years ago, but thanks to progress in GPU and algorithm enhancement, training time has reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks or RNNs. A recurrent neural network is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks. The reason is the unstable gradient problem such as vanishing gradient and exploding gradient. The gradient can get smaller and smaller as it is propagated back through layers. This makes learning in early layers extremely slow. The problem actually gets worse in RNNs, since gradients aren't just propagated backward through layers, they're propagated backward through time. If the network runs for a long time, that can make the gradient extremely unstable and hard to learn from. It has been possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.

Evaluation of Classification and Accuracy in Chest X-ray Images using Deep Learning with Convolution Neural Network (컨볼루션 뉴럴 네트워크 기반의 딥러닝을 이용한 흉부 X-ray 영상의 분류 및 정확도 평가)

  • Song, Ho-Jun;Lee, Eun-Byeol;Jo, Heung-Joon;Park, Se-Young;Kim, So-Young;Kim, Hyeon-Jeong;Hong, Joo-Wan
    • Journal of the Korean Society of Radiology
    • /
    • v.14 no.1
    • /
    • pp.39-44
    • /
    • 2020
  • The purpose of this study was learning about chest X-ray image classification and accuracy research through Deep Learning using big data technology with Convolution Neural Network. Normal 1,583 and Pneumonia 4,289 were used in chest X-ray images. The data were classified as train (88.8%), validation (0.2%) and test (11%). Constructed as Convolution Layer, Max pooling layer size 2×2, Flatten layer, and Image Data Generator. The number of filters, filter size, drop out, epoch, batch size, and loss function values were set when the Convolution layer were 3 and 4 respectively. The test data verification results showed that the predicted accuracy was 94.67% when the number of filters was 64-128-128-128, filter size 3×3, drop out 0.25, epoch 5, batch size 15, and loss function RMSprop was 4. In this study, the classification of chest X-ray Normal and Pneumonia was predictable with high accuracy, and it is believed to be of great help not only to chest X-ray images but also to other medical images.

3D CNN-Based Segmentation of Prostate MR images (3D CNN 기반 전립선 MRI 영상 분할 기술)

  • Mun, Juhyeok;Choi, Hwan;Lee, Se-Ho;Jang, Won-Dong;Kim, Chang-Su
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2017.06a
    • /
    • pp.145-146
    • /
    • 2017
  • 본 논문에서는 남성의 하반신을 촬영한 MRI 영상으로부터 전립선을 분할하는 알고리즘을 제안한다. 우선 3 차원 입체 영상을 학습하기 위해 3D 컨볼루션 계층(convolutional layer) 및 3D 풀링 계층(pooling layer)에 기반한 네트워크를 제안한다. 다음으로 네트워크의 최후단에 해당하는 전연결 계층(fully connected layer)의 강인한 학습을 돕는 잡음 계층을 제안한다. 잡음 계층은 네트워크의 학습 파라미터 혹은 출력 영상에 가우시안 잡음를 더함으로써 드롭 아웃과 같이 훈련 영상에 대한 과적합(overfitting)을 막고 테스트 영상에 강인한 네트워크의 학습을 돕는다. 마지막으로 실험을 통해 제안하는 기법이 기존 기법에 비해 우수한 분할 성능을 보임을 확인한다.

  • PDF

Iceberg-Ship Classification in SAR Images Using Convolutional Neural Network with Transfer Learning

  • Choi, Jeongwhan
    • Journal of Internet Computing and Services
    • /
    • v.19 no.4
    • /
    • pp.35-44
    • /
    • 2018
  • Monitoring through Synthesis Aperture Radar (SAR) is responsible for marine safety from floating icebergs. However, there are limits to distinguishing between icebergs and ships in SAR images. Convolutional Neural Network (CNN) is used to distinguish the iceberg from the ship. The goal of this paper is to increase the accuracy of identifying icebergs from SAR images. The metrics for performance evaluation uses the log loss. The two-layer CNN model proposed in research of C.Bentes et al.[1] is used as a benchmark model and compared with the four-layer CNN model using data augmentation. Finally, the performance of the final CNN model using the VGG-16 pre-trained model is compared with the previous model. This paper shows how to improve the benchmark model and propose the final CNN model.