• Title/Summary/Keyword: Gradient descent

Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.127-142 / 2016
  • A deep learning model is a kind of neural network that allows multiple hidden layers. There are various deep learning architectures, such as convolutional neural networks, deep belief networks, and recurrent neural networks. They have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, where they have been shown to produce state-of-the-art results on various tasks. Among these architectures, convolutional neural networks and recurrent neural networks are classified as supervised learning models. In recent years, these supervised learning models have gained more popularity than unsupervised learning models such as deep belief networks, because supervised models have produced successful applications in the fields mentioned above. Deep learning models can be trained with the backpropagation algorithm. Backpropagation, an abbreviation for "backward propagation of errors," is a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the error function. Convolutional neural networks use a special architecture that is particularly well-adapted to classifying images. Using this architecture makes convolutional networks fast to train, which in turn helps us train deep, multi-layer networks that are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks rest on three basic ideas: local receptive fields, shared weights, and pooling. By local receptive fields, we mean that each neuron in the first (or any) hidden layer is connected to a small region of the input (or previous layer's) neurons. Shared weights mean that we use the same weights and bias for each of the local receptive fields, so all the neurons in a hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers, usually placed immediately after convolutional layers; pooling layers simplify the information in the output from the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Training deep networks took weeks several years ago, but thanks to progress in GPUs and algorithmic enhancements, training time has been reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks, or RNNs. A recurrent neural network is a class of artificial neural network in which connections between units form a directed cycle. This creates an internal state that allows the network to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks. The reason is the unstable gradient problem, i.e., vanishing and exploding gradients. The gradient can get smaller and smaller as it is propagated back through the layers, which makes learning in early layers extremely slow. The problem gets worse in RNNs, since gradients are propagated backward not just through layers but also through time; if the network runs for a long time, the gradient can become extremely unstable and hard to learn from. It has become possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.
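The backpropagation-plus-gradient-descent training loop described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code; the toy data, network size, and learning rate are all assumptions:

```python
import numpy as np

# Toy data: 100 samples, 3 features, binary targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

# One hidden layer of 8 sigmoid units.
W1, b1 = 0.1 * rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(8, 1)), np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(1000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error gradient through the layers.
    d_out = (out - y) * out * (1 - out)    # dE/d(pre-activation), output layer
    d_h = (d_out @ W2.T) * h * (1 - h)     # dE/d(pre-activation), hidden layer

    # Gradient-descent step: move each weight against its gradient.
    W2 -= lr * (h.T @ d_out) / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * (X.T @ d_h) / len(X)
    b1 -= lr * d_h.mean(axis=0)
```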

Depth Scaling Strategy Using a Flexible Damping Factor for Frequency-Domain Elastic Full Waveform Inversion

  • Oh, Ju-Won; Kim, Shin-Woong; Min, Dong-Joo; Moon, Seok-Joon; Hwang, Jong-Ha
    • Journal of the Korean Earth Science Society / v.37 no.5 / pp.277-285 / 2016
  • We introduce a depth scaling strategy to improve the accuracy of frequency-domain elastic full waveform inversion (FWI) using the new pseudo-Hessian matrix for seismic data without low-frequency components. The depth scaling strategy is based on the fact that the damping factor in the Levenberg-Marquardt method controls the energy concentration in the gradient. In other words, a large damping factor makes the Levenberg-Marquardt method similar to the steepest-descent method, by which mainly shallow structures are recovered. With a small damping factor, the Levenberg-Marquardt method becomes similar to the Gauss-Newton method, by which we can resolve deep structures as well as shallow ones. In our depth scaling strategy, a large damping factor is used in the early stage and then decreases automatically with the trend of the error as the iteration proceeds. With this strategy, we can gradually move the parameter-searching region from shallow to deep parts. The flexible damping factor retards the model parameter update for shallow parts and mainly inverts deeper parts in the later stage of inversion, thereby improving the deep parts of the inversion results. The depth scaling strategy is applied to synthetic data without low-frequency components for a modified version of the SEG/EAGE overthrust model. Numerical examples show that the flexible damping factor yields better results than a constant damping factor when reliable low-frequency components are missing.
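The role of the damping factor can be made concrete with a small sketch. Below, a large `damping` shrinks the update toward a scaled steepest-descent step, while `damping` near zero recovers the Gauss-Newton step; the halving rule stands in for the paper's error-trend criterion and is our assumption:

```python
import numpy as np

def lm_step(J, residual, damping):
    """One Levenberg-Marquardt model update for least squares.

    Solves (J^T J + damping * I) dm = J^T r. Large damping approximates a
    (scaled) steepest-descent step; damping -> 0 gives the Gauss-Newton step.
    """
    H = J.T @ J                      # Gauss-Newton approximation of the Hessian
    g = J.T @ residual               # gradient of 0.5 * ||r||^2
    return np.linalg.solve(H + damping * np.eye(H.shape[0]), g)

def update_damping(damping, err_prev, err_curr, tol=1e-3):
    """Flexible damping sketch: shrink the factor when the error stagnates,
    shifting the inversion focus from shallow to deep structures."""
    if err_prev > 0 and (err_prev - err_curr) / err_prev < tol:
        return 0.5 * damping
    return damping
```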

Optimizing Feature Extraction for Multiclass Problems Based on Classification Error (다중 클래스 데이터를 위한 분류오차 최소화기반 특징추출 기법)

  • Choi, Eui-Sun; Lee, Chul-Hee
    • Journal of the Institute of Electronics Engineers of Korea SP / v.37 no.2 / pp.39-49 / 2000
  • In this paper, we propose an optimizing feature extraction method for multiclass problems, assuming normal distributions. Initially, we start with an arbitrary feature vector. Assuming that the feature vector is used for classification, we compute the classification error. Then we move the feature vector slightly in the direction in which the classification error decreases most rapidly; this can be done by taking the gradient. We propose two search methods, sequential search and global search. In the sequential search, an additional feature vector is selected so that it provides the best accuracy along with the already chosen feature vectors. In the global search, we are not constrained to use the previously chosen feature vectors. Experimental results show that the proposed algorithm provides favorable performance.
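The gradient move on the feature vector can be sketched as follows; `classification_error` is a hypothetical stand-in for the paper's error estimate under the normality assumption, and the gradient is taken numerically:

```python
import numpy as np

def refine_feature_vector(v, classification_error, lr=0.01, eps=1e-5, steps=100):
    """Move a feature vector in the direction that decreases the
    classification error most rapidly (numerical steepest descent)."""
    v = v / np.linalg.norm(v)
    for _ in range(steps):
        grad = np.zeros_like(v)
        base = classification_error(v)
        for i in range(len(v)):              # forward-difference gradient
            dv = np.zeros_like(v)
            dv[i] = eps
            grad[i] = (classification_error(v + dv) - base) / eps
        v = v - lr * grad                    # step against the gradient
        v = v / np.linalg.norm(v)            # keep the direction unit-norm
    return v
```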

A Robust Backpropagation Algorithm and Its Application (문자인식을 위한 로버스트 역전파 알고리즘)

  • Oh, Kwang-Sik; Kim, Sang-Min; Lee, Dong-No
    • Journal of the Korean Data and Information Science Society / v.8 no.2 / pp.163-171 / 1997
  • Function approximation from a set of input-output pairs has numerous applications in scientific and engineering areas. Multilayer feedforward neural networks have been proposed as good approximators of nonlinear functions. The backpropagation (BP) algorithm allows multilayer feedforward neural networks to learn input-output mappings from training samples. It iteratively adjusts the network parameters (weights) to minimize the sum of squared approximation errors using a gradient descent technique. However, the mapping acquired through the BP algorithm may be corrupted when erroneous training data are employed; in that case, the learned mapping can oscillate badly between data points. In this paper we propose a robust BP learning algorithm that is resistant to erroneous data and capable of rejecting gross errors during the approximation process, i.e., one that is stable under small noise perturbations and robust against gross errors.
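One common way to obtain the robustness described here is to replace the squared-error gradient with a bounded-influence one, e.g. a Huber-type gradient; this is a sketch of the general idea, not the paper's exact estimator:

```python
import numpy as np

def huber_grad(err, delta=1.0):
    """Gradient of the Huber loss with respect to the prediction error.

    Quadratic (like squared error) for |err| <= delta and linear beyond,
    so gross outliers cannot dominate the weight update.
    """
    return np.where(np.abs(err) <= delta, err, delta * np.sign(err))

# In a backpropagation loop, feed huber_grad(out - y) backward in place of
# the raw squared-error gradient (out - y) to down-weight gross errors.
```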

Analysis of Microwave Inverse Scattering Using the Broadband Electromagnetic Waves (광대역 전자파를 이용한 역산란 해석 연구)

  • Lee Jung-Hoon; Chung Young-Seek; So Joon-Ho; Kim Junyeon; Jang Won
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.17 no.2 s.105 / pp.158-164 / 2006
  • In this paper, we propose a new inverse-scattering algorithm for the reconstruction of unknown dielectric scatterers using the finite-difference time-domain (FDTD) method and design sensitivity analysis. We introduce design sensitivity analysis, based on gradient information, for fast convergence of the reconstruction. By introducing the adjoint variable method for efficient calculation, we derive the adjoint variable equation. As the optimization algorithm, we use the steepest-descent method and reconstruct the dielectric targets by iterative estimation. To verify our algorithm, we show numerical examples for two-dimensional $TM^z$ cases.
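The overall optimization loop (forward FDTD solve, adjoint gradient, steepest-descent update) can be sketched generically; `forward_solve` and `adjoint_gradient` are hypothetical placeholders for the paper's FDTD and adjoint-variable computations:

```python
import numpy as np

def reconstruct(eps0, measured, forward_solve, adjoint_gradient,
                step=0.1, iters=50):
    """Steepest-descent reconstruction of a permittivity map (sketch)."""
    eps = np.array(eps0, dtype=float)
    for _ in range(iters):
        residual = forward_solve(eps) - measured   # misfit at the receivers
        grad = adjoint_gradient(eps, residual)     # one adjoint solve instead
                                                   # of one solve per parameter
        eps -= step * grad                         # steepest-descent update
    return eps
```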

A Novel Road Segmentation Technique from Orthophotos Using Deep Convolutional Autoencoders

  • Sameen, Maher Ibrahim; Pradhan, Biswajeet
    • Korean Journal of Remote Sensing / v.33 no.4 / pp.423-436 / 2017
  • This paper presents a deep learning-based road segmentation framework for very high-resolution orthophotos. The proposed method uses deep convolutional autoencoders for end-to-end mapping of orthophotos to road segmentations. In addition, a set of post-processing steps is applied to make the model outputs GIS-ready data useful for various applications. The optimization of the model's parameters, conducted via a grid search, is also explained. The model was trained and implemented in Keras, a high-level deep learning framework running on top of TensorFlow. The results show that the proposed model with the best-obtained hyperparameters can segment road objects from orthophotos with an average accuracy of 88.5%. The optimization revealed that the best optimization algorithm and activation function for the studied task are stochastic gradient descent (SGD) and the exponential linear unit (ELU), respectively. In addition, the best numbers of convolutional filters were found to be 8 for the first and second layers and 128 for the third and fourth layers of the proposed network architecture. An analysis of the model's time complexity showed that it could be trained in 4 hours and 50 minutes on 1,024 high-resolution images of size $106{\times}106$ pixels, and could segment road objects from images of similar size and resolution in around 14 minutes. The results show that deep learning models such as convolutional autoencoders can be a strong alternative to traditional machine learning models for road segmentation from aerial photographs.
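A minimal Keras sketch consistent with the reported best hyperparameters (SGD, ELU, filters of 8/8/128/128) is given below; the layer arrangement, strides, kernel sizes, and loss are our assumptions, not the paper's exact network:

```python
from tensorflow.keras import layers, models, optimizers

def build_road_autoencoder(input_shape=(106, 106, 3)):
    inp = layers.Input(shape=input_shape)
    # Encoder: 8 filters in the first and second layers.
    x = layers.Conv2D(8, 3, strides=2, padding="same", activation="elu")(inp)
    x = layers.Conv2D(8, 3, strides=2, padding="same", activation="elu")(x)
    # Decoder: 128 filters in the third and fourth layers.
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="elu")(x)
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="elu")(x)
    x = layers.Cropping2D(((1, 1), (1, 1)))(x)          # 108x108 back to 106x106
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel road mask
    model = models.Model(inp, out)
    model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
                  loss="binary_crossentropy")
    return model
```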

An On-line Construction of Generalized RBF Networks for System Modeling (시스템 모델링을 위한 일반화된 RBF 신경회로망의 온라인 구성)

  • Kwon, Oh-Shin; Kim, Hyong-Suk; Choi, Jong-Soo
    • Journal of the Institute of Electronics Engineers of Korea CI / v.37 no.1 / pp.32-42 / 2000
  • This paper presents an on-line learning algorithm for sequential construction of generalized radial basis function networks (GRBFNs) to model nonlinear systems from empirical data. The GRBFN, an extended form of the standard radial basis function (RBF) network with constant weights, is an architecture capable of representing nonlinear systems by smoothly integrating local linear models. The proposed learning algorithm has a two-stage scheme that performs both structure learning and parameter learning. The structure-learning stage constructs the GRBFN model using two construction criteria, a training-error criterion and a Mahalanobis-distance criterion, to assign new hidden units and linear local models for the given empirical training data. In the parameter-learning stage, the network parameters are updated using the gradient descent rule. To evaluate the modeling performance of the proposed algorithm, simulation results on two well-known benchmarks are discussed.
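The parameter-learning stage can be illustrated with a single gradient-descent update of the local linear models; the normalized Gaussian blend below is our reading of "smoothly integrating local linear models," and structure learning is omitted:

```python
import numpy as np

def grbfn_predict(x, centers, widths, A, b, eps=1e-12):
    """GRBFN output: Gaussian units gating local linear models a_i.x + b_i."""
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2 * widths ** 2))
    w = phi / (phi.sum() + eps)          # normalized unit activations
    return w @ (A @ x + b)               # blended local linear models

def grbfn_sgd_step(x, y, centers, widths, A, b, lr=0.01, eps=1e-12):
    """One gradient-descent update of the local linear models (squared error)."""
    phi = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2 * widths ** 2))
    w = phi / (phi.sum() + eps)
    err = w @ (A @ x + b) - y
    A -= lr * err * np.outer(w, x)       # dE/dA
    b -= lr * err * w                    # dE/db
```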

Adaptive Intra Frame Encoding for H.264/AVC (H.264/AVC를 위한 적응적 인트라 프레임 압축)

  • Park, Sang-Hyun
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.9 no.12 / pp.1447-1454 / 2014
  • In the H.264 standard, an intra frame is the first frame of a GOP (Group of Pictures), and all macroblocks of an intra frame are encoded using the same quantization parameter. In addition, an intra frame is used for encoding the following frames of the same GOP, so the encoding results of an intra frame affect the encoding results of the entire GOP. Thus, it is important to find the optimal quantization parameter of an intra frame to improve the quality of a GOP. In this paper, we propose a method for finding the optimal quantization parameter of an intra frame in real time. The proposed method uses a gradient descent method to find the optimal value based on characteristics of the optimal quantization parameters. Experimental results show that the proposed method captures the characteristics of the optimal quantization parameter and accurately estimates the optimal value.
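The search can be sketched as one-dimensional gradient descent over the quantization parameter, treated as continuous and rounded at the end; `gop_cost` is a hypothetical stand-in for the encoder's measured rate-distortion cost of the GOP:

```python
def find_intra_qp(gop_cost, qp0=30.0, lr=2.0, eps=0.5, steps=20):
    """Gradient-descent search for the intra-frame QP (sketch).

    H.264 QPs are integers in [0, 51], so the continuous estimate is
    clamped and rounded.
    """
    qp = qp0
    for _ in range(steps):
        grad = (gop_cost(qp + eps) - gop_cost(qp - eps)) / (2 * eps)
        qp = min(51.0, max(0.0, qp - lr * grad))    # descend and clamp
    return round(qp)
```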

Fast Block-Matching Motion Estimation Using Constrained Diamond Search Algorithm (구속조건을 적용한 다이아몬드 탐색 알고리즘에 의한 고속블록정합움직임추정)

  • 홍성용
    • Journal of the Korea Society of Computer and Information / v.8 no.4 / pp.13-20 / 2003
  • Based on studies of the motion vector distributions estimated in image sequences, we propose a constrained diamond search (DS) algorithm for fast block-matching motion estimation. Considering the fact that motion vectors are, on average, found within a distance of 2 pixels vertically and horizontally, we confirmed that the DS algorithm achieves comparable error performance while requiring less computation than the new three-step search (NTSS) algorithm. Also, by applying the displaced frame difference (DFD) to the DS algorithm, we reduced the computational load needed to estimate motion vectors in stable blocks that have no motion, and we reduced the possibility of falling into local minima during motion-vector estimation. The proposed constrained DS algorithm thus achieves better results, in terms of error ratio and the number of required search points, than the conventional DS algorithm, the four-step search (FSS) algorithm, and the block-based gradient-descent search algorithm.
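A compact sketch of diamond search with the stationary-block early exit follows; the block size, DFD threshold, and exact form of the constraint are our assumptions:

```python
import numpy as np

LDSP = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2), (1, 1), (1, -1), (-1, 1), (-1, -1)]
SDSP = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def sad(cur, ref, bx, by, dx, dy, B=16):
    """Sum of absolute differences for a BxB block displaced by (dx, dy)."""
    h, w = ref.shape
    x, y = bx + dx, by + dy
    if x < 0 or y < 0 or x + B > w or y + B > h:
        return np.inf                    # candidate falls outside the frame
    return np.abs(cur[by:by+B, bx:bx+B].astype(int)
                  - ref[y:y+B, x:x+B].astype(int)).sum()

def diamond_search(cur, ref, bx, by, B=16, dfd_thresh=512):
    if sad(cur, ref, bx, by, 0, 0, B) < dfd_thresh:
        return (0, 0)                    # stable block: skip the search
    mv = (0, 0)
    while True:                          # large diamond until the center wins
        best = min(LDSP, key=lambda d: sad(cur, ref, bx, by,
                                           mv[0] + d[0], mv[1] + d[1], B))
        if best == (0, 0):
            break
        mv = (mv[0] + best[0], mv[1] + best[1])
    best = min(SDSP, key=lambda d: sad(cur, ref, bx, by,
                                       mv[0] + d[0], mv[1] + d[1], B))
    return (mv[0] + best[0], mv[1] + best[1])
```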

Target Recognition Method of DTV-Based Passive Radar Using Multi-Channel Combining Method (다중 채널 융합 기법을 이용한 DTV 기반 수동형 레이다의 표적 인식 방법)

  • Seol, Seung-Hwan; Choi, Young-Jae; Choi, In-Sik
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.28 no.10 / pp.794-801 / 2017
  • In this paper, we propose airborne target recognition using a multi-channel combining method in DTV-based passive radar. By combining multi-channel signals, we obtain a high-resolution range profile (HRRP) with sufficient range resolution. The HRRP is obtained by the AR method or by zero-padding. From the obtained HRRP, we extract scattering centers with the CLEAN algorithm using gradient descent. We then extract feature vectors and perform target recognition after training a neural network on the extracted feature vectors. To verify the performance of the proposed method, we assumed the frequency bands of three broadcasting transmitters operating in Korea (Mt. Gwan-ak, Mt. Yong-moon, Kyeon-wol-ak) and used full-scale 3D CAD models of four targets. We also compared the target recognition performance of the proposed method with that obtained using only a single channel from the three broadcasting transmitters. As a result, the proposed method showed better performance than using only a single channel.
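The CLEAN-style extraction of scattering centers from an HRRP can be sketched as iterative peak subtraction; the paper refines each center by gradient descent, which this sketch omits, keeping the grid peak for simplicity:

```python
import numpy as np

def clean_scattering_centers(hrrp, psf, n_centers=5, loop_gain=0.8):
    """Peel off the strongest scatterer each pass (CLEAN sketch).

    `psf` is the point response of a single scatterer, same length as
    `hrrp`; circular shifting via np.roll is an edge-handling shortcut.
    """
    residual = hrrp.astype(float).copy()
    mid = int(np.argmax(np.abs(psf)))
    centers = []
    for _ in range(n_centers):
        k = int(np.argmax(np.abs(residual)))     # strongest remaining peak
        amp = loop_gain * residual[k]
        centers.append((k, amp))
        residual -= np.roll(psf, k - mid) * (amp / psf[mid])
    return centers, residual
```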