• Title/Summary/Keyword: Stochastic Learning

Reinforcement Learning Using State Space Compression (상태 공간 압축을 이용한 강화학습)

  • Kim, Byeong-Cheon;Yun, Byeong-Ju
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.3
    • /
    • pp.633-640
    • /
    • 1999
  • Reinforcement learning learns through trial-and-error interaction with a dynamic environment. In such environments, reinforcement learning methods such as Q-learning and TD (Temporal Difference) learning therefore learn faster than conventional stochastic learning methods. However, because many of the proposed reinforcement learning algorithms receive a reinforcement value only when the learning agent reaches its goal state, most of them converge to the optimal solution very slowly. In this paper, we present the COMREL (COMpressed REinforcement Learning) algorithm for quickly finding the shortest path in a maze environment: candidate states that can guide the shortest path are selected in a compressed maze environment, and only those candidate states are learned. Compared with the existing Q-learning and Prioritized Sweeping algorithms, COMREL shortens the learning time considerably. (A minimal sketch of restricting Q-learning updates to candidate states is given below.)

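The abstract does not give COMREL's update rule, so the following is only a rough illustration of the idea it describes: tabular Q-learning in which updates are restricted to a set of candidate states, loosely mirroring learning on a compressed maze. The environment interface (`reset`, `step`, `actions`) and all names are assumptions, not the authors' code.

```python
import random
from collections import defaultdict

def q_learning_on_candidates(env, candidate_states, episodes=500,
                             alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning that only updates Q-values for 'candidate' states.

    `env` is assumed to expose reset() -> state, step(state, action) ->
    (next_state, reward, done), and a list env.actions. Illustrative
    sketch only, not the authors' COMREL implementation.
    """
    Q = defaultdict(float)                      # (state, action) -> value
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(s, a)
            # update only the states kept after compression
            if s in candidate_states:
                best_next = max(Q[(s_next, act)] for act in env.actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```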

Pan evaporation modeling using deep learning theory (Deep learning 이론을 이용한 증발접시 증발량 모형화)

  • Seo, Youngmin;Kim, Sungwon
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2017.05a
    • /
    • pp.392-395
    • /
    • 2017
  • This study evaluates the applicability of a deep learning model for estimating daily pan evaporation. The deep learning model applied here is a deep belief network (DBN)-based deep neural network (DNN) model (DBN-DNN). Meteorological data measured at the Busan station were used for the evaluation, and separate models were built for each input variable set (Set 1, Set 2, Set 3), formed from combinations of meteorological variables highly correlated with evaporation (solar radiation, sunshine duration, mean ground-surface temperature, maximum air temperature). The performance of the DBN-DNN model was evaluated with statistical performance indices (coefficient of efficiency, CE; coefficient of determination, $r^2$; root mean square error, RMSE; mean absolute error, MAE) and compared with two conventional ANN (artificial neural network) models, ANN-SGD and ANN-GD, trained with SGD (stochastic gradient descent) and GD (gradient descent), respectively. For effective training, the hyperparameters of each model were optimized using a GA (genetic algorithm). As a result, the ANN-GD1 model for Set 1, the DBN-DNN2 model for Set 2, and the DBN-DNN3 model for Set 3 showed the best performance. Although the differences in performance among the compared models were not large, model efficiency for all input sets was best in the order DBN-DNN3, DBN-DNN2, and ANN-SGD3. (A hedged stand-in for the SGD-trained baseline is sketched below.)

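The abstract names the model families but gives no code; purely as an illustration, the sketch below fits a small feed-forward regression network to four meteorological inputs with scikit-learn's MLPRegressor and the "sgd" solver, standing in for the ANN-SGD baseline (it is not the authors' DBN-DNN). The synthetic data and all settings are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Hypothetical daily inputs: [solar radiation, sunshine duration,
# mean ground temperature, maximum air temperature] -> pan evaporation.
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
y = 0.5 * X[:, 0] + 0.3 * X[:, 3] + 0.1 * rng.standard_normal(1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Feed-forward network trained with stochastic gradient descent,
# as a rough stand-in for the ANN-SGD comparison model.
ann_sgd = MLPRegressor(hidden_layer_sizes=(20, 10), solver="sgd",
                       learning_rate_init=0.01, max_iter=2000,
                       random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out data:", ann_sgd.score(X_te, y_te))
```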

Robust Lane Detection Algorithm for Autonomous Trucks in Container Terminal

  • Ngo Quang Vinh;Sam-Sang You;Le Ngoc Bao Long;Hwan-Seong Kim
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2023.05a
    • /
    • pp.252-253
    • /
    • 2023
  • Container terminal automation might offer many potential benefits, such as increased productivity, reduced cost, and improved safety. Autonomous trucks can lead to more efficient container transport. A robust lane detection method is proposed using score-based generative modeling through stochastic differential equations for image-to-image translation. Image processing techniques are combined with Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and a Genetic Algorithm (GA) to ensure robust lane positioning. The proposed method is validated on a dataset collected from port terminals under different environmental conditions, and the robustness of the lane detection method is tested against stochastic noise. (The clustering step is sketched illustratively below.)

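The paper combines score-based generative modeling with DBSCAN and a GA; none of that code is given in the abstract. Purely to illustrate the clustering step, the sketch below groups hypothetical lane-pixel coordinates with scikit-learn's DBSCAN and fits one straight line per cluster. The parameters (eps, min_samples) and the line parameterization are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def fit_lane_lines(points, eps=5.0, min_samples=20):
    """Cluster candidate lane pixels (N x 2 array of x, y image
    coordinates) and fit one line per cluster.

    Illustrative only: the paper additionally uses score-based
    generative modeling for image translation and a GA to refine the
    lane position, neither of which is reproduced here.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    lines = {}
    for label in set(labels) - {-1}:            # -1 marks noise points
        cluster = points[labels == label]
        slope, intercept = np.polyfit(cluster[:, 1], cluster[:, 0], deg=1)
        lines[label] = (slope, intercept)       # x = slope * y + intercept
    return lines
```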

Reinforcement learning Speedup method using Q-value Initialization (Q-value Initialization을 이용한 Reinforcement Learning Speedup Method)

  • 최정환
    • Proceedings of the IEEK Conference
    • /
    • 2001.06c
    • /
    • pp.13-16
    • /
    • 2001
  • In reinforcement learning, Q-learning converges quite slowly to a good policy, because searching for the goal state takes a very long time in a large stochastic domain. I therefore propose a speedup method using Q-value initialization for model-free reinforcement learning. The speedup method learns a naive model of the domain and builds boundaries around the goal state. Using these boundaries, it assigns initial Q-values to the state-action pairs and then performs Q-learning starting from those values. The initial Q-values guide the agent toward the goal state in the early stages of learning, so Q-learning updates Q-values efficiently; this saves the exploration time needed to find the goal state and gives better performance than plain Q-learning. I present the Speedup Q-learning algorithm to implement the speedup method. The algorithm is evaluated in a grid-world domain and compared to Q-learning. (One possible instantiation of such an initialization is sketched below.)

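The abstract describes the idea (learn a naive model, draw boundaries around the goal, seed Q-values from them) without formulas. The sketch below is one guessed instantiation for a grid world, seeding each state's initial Q-value from its Manhattan distance to the goal; the decay constant and all names are assumptions rather than the paper's method.

```python
def initialize_q_values(states, actions, goal, gamma=0.95):
    """Seed Q-values so states near the goal start with higher values.

    Illustrative stand-in for the paper's boundary-based initialization:
    states are (row, col) tuples and the initial value decays with the
    Manhattan distance to the goal state.
    """
    q = {}
    for s in states:
        dist = abs(s[0] - goal[0]) + abs(s[1] - goal[1])
        for a in actions:
            q[(s, a)] = gamma ** dist      # optimistic near the goal
    return q

# Ordinary Q-learning would then start from this table instead of zeros,
# so early exploration is already biased toward the goal region.
```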

On-line Vector Quantizer Design Using Stochastic Relaxation (Stochastic Relaxation 방법을 이용한 온라인 벡터 양자화기 설계)

  • Song, Geun-Bae;Lee, Haing-Sei
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.38 no.5
    • /
    • pp.27-36
    • /
    • 2001
  • This paper proposes new design algorithms based on stochastic relaxation (SR) for on-line vector quantizer (VQ) design. The proposed SR methods solve the local-entrapment problems of the conventional Kohonen learning algorithm (KLA). They come in two variants depending on whether simulated annealing (SA) is used: the one that uses SA is called OLVQ-SA and the other OLVQ-SR. These methods are combined with the KLA and therefore preserve its convergence properties. Experimental results for Gauss-Markov sources, real speech, and images demonstrate that the proposed algorithms consistently provide better codebooks than the KLA. (A rough stochastic-relaxation-style update is sketched below.)

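The OLVQ-SA/OLVQ-SR update rules are not reproduced in the abstract. As a rough sketch of the general idea only, the code below performs an on-line Kohonen-style codebook update and perturbs the winning codevector with annealed Gaussian noise, which is one simple way to realize stochastic relaxation; the schedule and parameters are assumptions.

```python
import numpy as np

def online_vq_sr(samples, codebook_size=16, lr=0.05, noise0=0.5, decay=0.999):
    """On-line vector quantizer with a stochastic-relaxation-style
    perturbation: the winning codevector is nudged toward each sample
    (Kohonen update) and perturbed by annealed Gaussian noise so the
    codebook can escape poor local configurations. Illustrative sketch,
    not the paper's OLVQ-SA/OLVQ-SR algorithms.
    """
    rng = np.random.default_rng(0)
    dim = samples.shape[1]
    codebook = samples[rng.choice(len(samples), codebook_size,
                                  replace=False)].copy()
    noise = noise0
    for x in samples:
        winner = np.argmin(np.linalg.norm(codebook - x, axis=1))
        codebook[winner] += lr * (x - codebook[winner])        # Kohonen update
        codebook[winner] += noise * rng.standard_normal(dim)   # annealed noise
        noise *= decay                                         # cooling schedule
    return codebook
```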

Modern Probabilistic Machine Learning and Control Methods for Portfolio Optimization

  • Park, Jooyoung;Lim, Jungdong;Lee, Wonbu;Ji, Seunghyun;Sung, Keehoon;Park, Kyungwook
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.14 no.2
    • /
    • pp.73-83
    • /
    • 2014
  • Many recent theoretical developments in the field of machine learning and control have rapidly expanded its relevance to a wide variety of applications. In particular, a variety of portfolio optimization problems have recently been considered as a promising application domain for machine learning and control methods. In highly uncertain and stochastic environments, portfolio optimization can be formulated as optimal decision-making problems, and for these types of problems, approaches based on probabilistic machine learning and control methods are particularly pertinent. In this paper, we consider probabilistic machine learning and control based solutions to a couple of portfolio optimization problems. Simulation results show that these solutions work well when applied to real financial market data.

Learning of Differential Neural Networks Based on Kalman-Bucy Filter Theory (칼만-버쉬 필터 이론 기반 미분 신경회로망 학습)

  • Cho, Hyun-Cheol;Kim, Gwan-Hyung
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.17 no.8
    • /
    • pp.777-782
    • /
    • 2011
  • Neural network techniques are widely employed in the fields of signal processing, control systems, pattern recognition, etc. Learning of neural networks is an important procedure for accomplishing dynamic system modeling. This paper presents a novel learning approach for differential neural network models based on Kalman-Bucy filter theory. We construct an augmented state vector containing the original neural state and parameter vectors and derive a state-estimation rule that avoids the gradient terms involved in conventional neural learning methods such as back-propagation. We carry out numerical simulations to evaluate the proposed learning approach in nonlinear system modeling. Comparison with the well-known back-propagation approach and standard Kalman-Bucy filtering additionally demonstrates its superiority under stochastic system environments. (A generic augmented-state formulation of this idea is sketched below.)
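
The abstract describes the construction (augment the neural state with the parameters and estimate both with a Kalman-Bucy filter, with no gradient terms) but gives no equations. In generic notation of my own, not necessarily the paper's exact rule, an augmented-state Kalman-Bucy estimator has the form

$$\dot{z}=\begin{bmatrix}\dot{x}\\\dot{\theta}\end{bmatrix}=\begin{bmatrix}f(x,\theta,u)\\0\end{bmatrix}+w,\qquad y=Hz+v,$$

$$\dot{\hat{z}}=\begin{bmatrix}f(\hat{x},\hat{\theta},u)\\0\end{bmatrix}+K(t)\,\big(y-H\hat{z}\big),\qquad K(t)=P(t)H^{\top}R^{-1},$$

where $P(t)$ evolves according to the matrix Riccati differential equation; the parameter estimate $\hat{\theta}$ is driven purely by the innovation term, with no gradient of a loss function, which is the property the paper emphasizes.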

Comparison of Gradient Descent for Deep Learning (딥러닝을 위한 경사하강법 비교)

  • Kang, Min-Jae
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.2
    • /
    • pp.189-194
    • /
    • 2020
  • This paper analyzes the gradient descent method, the method most used for learning neural networks. Learning means updating parameters so that the loss function reaches its minimum; the loss function quantifies the difference between actual and predicted values. The gradient descent method uses the slope of the loss function to update the parameters to minimize error, and it is used in the libraries that provide the best deep learning algorithms. However, these algorithms are provided as a black box, making it difficult to identify the advantages and disadvantages of the various gradient descent methods. This paper analyzes the characteristics of the currently used gradient descent methods: stochastic gradient descent, momentum, AdaGrad, and Adadelta. The experiments use the Modified National Institute of Standards and Technology (MNIST) data set, which is widely used to verify neural networks. The network has two hidden layers, the first with 500 neurons and the second with 300. The activation function of the output layer is the softmax function, and the rectified linear unit is used for the input and hidden layers. The loss function is the cross-entropy error. (The four update rules are sketched in numpy below.)
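
The four update rules compared in the paper are standard; to make the "black box" behavior concrete, the sketch below states them in plain numpy for a single parameter vector. Hyperparameter values are typical defaults, not the paper's settings.

```python
import numpy as np

def sgd(w, g, lr=0.01):
    """Plain stochastic gradient descent step."""
    return w - lr * g

def momentum(w, g, v, lr=0.01, beta=0.9):
    """Momentum: accumulate a velocity term that smooths updates."""
    v = beta * v - lr * g
    return w + v, v

def adagrad(w, g, h, lr=0.01, eps=1e-8):
    """AdaGrad: scale each parameter's step by its gradient history."""
    h = h + g * g                         # sum of squared gradients
    return w - lr * g / (np.sqrt(h) + eps), h

def adadelta(w, g, Eg2, Edx2, rho=0.95, eps=1e-6):
    """Adadelta: like AdaGrad but with decaying averages, no global lr."""
    Eg2 = rho * Eg2 + (1 - rho) * g * g
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g
    Edx2 = rho * Edx2 + (1 - rho) * dx * dx
    return w + dx, Eg2, Edx2
```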

Evolutionary Learning-Rate Selection for BPNN with Window Control Scheme

  • Hoon, Jung-Sung
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 1997.10a
    • /
    • pp.301-308
    • /
    • 1997
  • The learning speed of a neural network, the most important factor in applying it to real problems, greatly depends on the network's learning rate. Three approaches have been proposed to date: empirical, deterministic, and stochastic. We previously proposed a learning-rate selection algorithm using an evolutionary programming search scheme. Although its performance was better than that of the other methods, the time spent selecting evolutionary learning rates degraded overall performance. This was caused by using static intervals (called static windows) to update the learning rates: with static windows, the algorithm would update learning rates that were already performing well, or fail to update them even when previously updated learning rates performed badly. This paper introduces a window control scheme to avoid such problems. With the window control scheme, the algorithm tries to update the learning rates only when learning performance remains bad over a specified interval; if the previously selected learning rates perform well, the algorithm does not update them. This greatly reduces the time spent updating learning rates. As a result, the algorithm with the window control scheme performs better than the one with static windows. We describe the previous and new algorithms and give experimental results. (The window-control trigger is sketched below.)

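The abstract describes when learning rates are re-selected but not how; as an illustration of the window-control trigger only, the sketch below re-runs a (stubbed) evolutionary selection step only after the loss fails to improve for `window` consecutive epochs. The function names and interfaces are hypothetical.

```python
def train_with_window_control(train_epoch, evolve_learning_rate,
                              lr, epochs=100, window=5):
    """Re-select the learning rate evolutionarily only after `window`
    consecutive epochs without improvement.

    `train_epoch(lr)` is assumed to run one epoch and return a loss;
    `evolve_learning_rate(lr)` is assumed to run the evolutionary-
    programming search and return a new rate. Illustrative sketch of
    the window-control idea only.
    """
    best_loss = float("inf")
    bad_epochs = 0
    for _ in range(epochs):
        loss = train_epoch(lr)
        if loss < best_loss:
            best_loss, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs >= window:           # sustained bad performance...
            lr = evolve_learning_rate(lr)  # ...triggers re-selection
            bad_epochs = 0
    return lr
```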

Second-order nonstationary source separation; Natural gradient learning (2차 Nonstationary 신호 분리: 자연기울기 학습)

  • 최희열;최승진
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.04b
    • /
    • pp.289-291
    • /
    • 2002
  • Most source separation methods focus on stationary sources, so higher-order statistics are necessary. In this paper we consider the problem of source separation when the sources are second-order nonstationary stochastic processes. We employ the natural gradient method and develop learning algorithms for both linear feedback and feedforward neural networks, so our algorithms possess the equivariance property. Local stability analysis shows that separating solutions are always locally stable stationary points of the proposed algorithms, regardless of the probability distributions of the sources. (A common natural-gradient update of this type is sketched below.)

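The abstract stops before stating the learning rule. Purely as a hedged illustration of a natural-gradient, second-order (decorrelation-based) separation update for a feedforward demixing matrix, and not necessarily the authors' exact algorithm, one common form is sketched below.

```python
import numpy as np

def natural_gradient_step(W, x, lr=0.01):
    """One on-line natural-gradient update of a demixing matrix W for
    second-order source separation:

        y = W x,    W <- W + lr * (diag(y y^T) - y y^T) W

    This equivariant update decorrelates the outputs and is commonly
    used for nonstationary sources; it is given only as an illustration.
    """
    y = W @ x
    C = np.outer(y, y)
    W = W + lr * (np.diag(np.diag(C)) - C) @ W
    return W
```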