Search | Korea Science

Kernel-based actor-critic approach with applications

Chu, Baek-Suk;Jung, Keun-Woo;Park, Joo-Young
- International Journal of Fuzzy Logic and Intelligent Systems, v.11 no.4, pp.267-274, 2011
Recently, actor-critic methods have drawn significant interest in the area of reinforcement learning, and several algorithms have been studied along the lines of the actor-critic strategy. In this paper, we consider a new type of actor-critic algorithm employing kernel methods, which have recently been shown to be very effective tools in various fields of machine learning, and investigate combining the actor-critic strategy with kernel methods. More specifically, this paper studies actor-critic algorithms that use kernel-based least-squares estimation and policy gradient; the critic uses a sliding-window-based kernel least-squares method, which leads to fast and efficient value-function estimation in a nonparametric setting. The applicability of the considered algorithms is illustrated via a robot locomotion problem and a tunnel ventilation control problem.
https://doi.org/10.5391/IJFIS.2011.11.4.267

Geothermal properties for Database (지열자료 정보 D/B 구축 요소)

Kim, Hyoung-Chan;Park, Jeong-Min
- Proceedings of the Korean Society for New and Renewable Energy Conference (한국신재생에너지학회:학술대회논문집), 2006.11a, pp.28-31, 2006
A geothermal database must be constructed in order to develop geothermal energy as part of renewable energy policy. The database must consist of geologic data, borehole data, and geophysical data. The geologic data include the distribution of geology, structural geology, geological time, rock name, rock density, porosity, thermal diffusivity, specific capacity, and thermal conductivity. To calculate heat generation, the radioactive elements of the rock, such as U, Th, and K, need to be analyzed. The borehole data include temperature at depth, surface temperature, and geothermal gradient. There is also geothermometry using chemical components of groundwater such as Na, Ca, K, and $SiO_2$. The geophysical data include thematic maps such as Bouguer gravity anomaly data and magnetic survey data. In addition, it is important to describe the distribution of hot springs and their water temperatures.

Automatic Generation of Korean Poetry using Sequence Generative Adversarial Networks (SeqGAN 모델을 이용한 한국어 시 자동 생성)

Park, Yo-Han;Jeong, Hye-Ji;Kang, Il-Min;Park, Cheon-Young;Choi, Yong-Seok;Lee, Kong Joo
- Annual Conference on Human and Language Technology, 2018.10a, pp.580-583, 2018
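In this paper, we automatically generate Korean poetry using the SeqGAN model. For sentence generation, the SeqGAN model applies to its generator a recurrent neural network together with policy gradient, a reinforcement learning algorithm, and Monte Carlo (MC) search. Poems written on the theme of love were used as training data for generating lines of poetry. The poems generated with SeqGAN tended to repeat the same phrase several times, but the results confirm that the SeqGAN model is applicable to Korean text generation.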

Controller Learning Method of Self-driving Bicycle Using State-of-the-art Deep Reinforcement Learning Algorithms

Choi, Seung-Yoon;Le, Tuyen Pham;Chung, Tae-Choong
- Journal of the Korea Society of Computer and Information, v.23 no.10, pp.23-31, 2018
Recently, there have been many studies on machine learning, and reinforcement learning in particular is being actively studied. In this study, we propose a controller for a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, a recent deep reinforcement learning method. We redefine the reward function for the bicycle dynamics and the neural network used to train the agent. When the proposed method is used for learning and control, the bicycle does not fall over and can reach a more distant given destination, unlike with the existing method. For the performance evaluation, we ran experiments to check that the proposed algorithm works in various environments such as fixed speed, random, target point, and not determined. As a result, it is confirmed that the proposed algorithm performs better than the conventional neural network algorithms NAF and PPO.
https://doi.org/10.9708/jksci.2018.23.10.023

A Study on Portfolio Asset Allocation Using Actor-Critic Model (Actor-Critic 모델을 이용한 포트폴리오 자산 배분에 관한 연구)

Kalina, Bayartsetseg;Lee, Ju-Hong;Song, Jae-Won
- Proceedings of the Korea Information Processing Society Conference, 2020.05a, pp.439-441, 2020
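Existing methods such as equal weighting, Markowitz, and Recurrent Reinforcement Learning maximize returns or minimize risk, while the Risk Budgeting method finds an optimal portfolio by allocating a target risk to each asset. However, these methods have difficulty finding a portfolio that will be optimal in the future. This paper develops an Actor-Critic model based on Deterministic Policy Gradient for asset allocation and verifies that it outperforms the existing methods.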
https://doi.org/10.3745/PKIPS.y2020m05a.439

Performance Comparison of Deep Reinforcement Learning based Computation Offloading in MEC (MEC 환경에서 심층 강화학습을 이용한 오프로딩 기법의 성능비교)

Moon, Sungwon;Lim, Yujin
- Proceedings of the Korea Information Processing Society Conference, 2022.05a, pp.52-55, 2022
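With the exponential growth of smart mobile devices in the 5G era, multi-access edge computing (MEC) has emerged as a promising technology. Offloading to MEC servers in order to provide computation-intensive services with low latency is drawing attention, particularly in MEC systems where the task arrival rate and the wireless channel state are stochastic. In this paper, we propose a deep reinforcement learning based offloading scheme that allocates computation resources for local execution and transmission power for offloading in order to minimize the power consumption and delay of vehicles. We compare and analyze a Deep Deterministic Policy Gradient (DDPG) based scheme and a Deep Q-network (DQN) based scheme in terms of vehicle power consumption and queueing delay.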
https://doi.org/10.3745/PKIPS.y2022m05a.52

Computation Offloading with Resource Allocation Based on DDPG in MEC

Sungwon Moon;Yujin Lim
- Journal of Information Processing Systems, v.20 no.2, pp.226-238, 2024
Recently, multi-access edge computing (MEC) has emerged as a promising technology to alleviate the computing burden of vehicular terminals and efficiently facilitate vehicular applications. Vehicles can improve the quality of experience of applications by offloading their tasks to MEC servers. However, channel conditions are time-varying due to channel interference among vehicles, and path loss is time-varying due to the mobility of vehicles. The task arrival of vehicles is also stochastic. Therefore, it is difficult to determine an optimal offloading and resource allocation decision in the dynamic MEC system, because offloading is affected by wireless data transmission. In this paper, we study computation offloading with resource allocation in the dynamic MEC system. The objective is to minimize power consumption and maximize throughput while meeting the delay constraints of tasks. To this end, the method allocates resources for local execution and transmission power for offloading. We define the problem as a Markov decision process and propose an offloading method using a deep reinforcement learning algorithm, deep deterministic policy gradient. Simulation shows that the proposed method outperforms existing methods in terms of throughput and satisfaction of delay constraints.
https://doi.org/10.3745/JIPS.03.0194

Research on Optimal Deployment of Sonobuoy for Autonomous Aerial Vehicles Using Virtual Environment and DDPG Algorithm (가상환경과 DDPG 알고리즘을 이용한 자율 비행체의 소노부이 최적 배치 연구)

Kim, Jong-In;Han, Min-Seok
- The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.15 no.2, pp.152-163, 2022
In this paper, we present a method that enables an unmanned aerial vehicle to drop sonobuoys, an essential element of anti-submarine warfare, in an optimal deployment. To this end, an environment simulating the distribution of sound detection performance was built with the Unity game engine; this environment, configured directly with Unity ML-Agents, communicated with an external reinforcement learning algorithm written in Python for training. In particular, reinforcement learning is introduced to prevent wrong actions from accumulating and affecting learning, and to secure the maximum detection area for the sonobuoys while the vehicle flies to the target point in the shortest time. The optimal placement of the sonobuoys was achieved by applying the Deep Deterministic Policy Gradient (DDPG) algorithm. As a result of the learning, the agent flew through the sea area and passed only the points that achieve the optimal placement among the 70 target candidates. This means that an autonomous aerial vehicle that deploys sonobuoys in the shortest time and with the maximum detection area, which are the requirements for optimal placement, has been implemented.
https://doi.org/10.17661/jkiiect.2022.15.2.152

Unlicensed Band Traffic and Fairness Maximization Approach Based on Rate-Splitting Multiple Access (전송률 분할 다중 접속 기술을 활용한 비면허 대역의 트래픽과 공정성 최대화 기법)

Jeon Zang Woo;Kim Sung Wook
- KIPS Transactions on Computer and Communication Systems, v.12 no.10, pp.299-308, 2023
As the spectrum shortage problem has been accelerated by the emergence of various services, New Radio-Unlicensed (NR-U) has appeared, allowing users who communicated in licensed bands to communicate in unlicensed bands. However, NR-U network users reduce the performance of Wi-Fi network users who communicate in the same unlicensed band. In this paper, we aim to simultaneously maximize the fairness and throughput of the unlicensed band, where NR-U network users and Wi-Fi network users coexist. First, we propose an optimal power allocation scheme based on the Monte Carlo Policy Gradient method of reinforcement learning to maximize the sum rate of NR-U networks that use rate-splitting multiple access in unlicensed bands. Then, we propose a channel occupancy time division algorithm, based on the sequential Raiffa bargaining solution of game theory, that can simultaneously maximize system throughput and fairness for the coexistence of NR-U and Wi-Fi networks in the same unlicensed band. Simulation results, which compare the converged sum rate under the same transmission power, show that rate-splitting multiple access performs better than conventional multiple access technology. In addition, we compare the data transfer amount and fairness of NR-U network users, Wi-Fi network users, and the total system, and show that the proposed channel occupancy time division algorithm achieves throughput and fairness simultaneously better than other algorithms.
https://doi.org/10.3745/KTCCS.2023.12.10.299

A Diagnosis of Shrinking City Using Population Gradient Curve: A Case Study on the City of Yeong-ju (인구밀도경사함수를 이용한 도시축소현상 진단 - 영주시를 사례로 -)

Kim, Min-Seok;Byun, Tae-Geun;Lee, Sang-Ho
- Journal of the Korean Regional Science Association, v.35 no.4, pp.33-45, 2019
Due to the global low-growth trend, urban shrinkage is a major issue of urban policy in major industrialized countries. According to research by KRIHS (Korea Research Institute for Human Settlements, 2016), 23 out of 77 cities in Korea were diagnosed as continuously or temporarily shrinking cities. However, the criteria for diagnosing shrinking cities remain purely demographic, and the spatial shrinkage pattern of the city is not considered. Therefore, this study diagnosed the urban shrinkage phenomenon, considering the characteristics of Yeong-ju, a polycentric city, by using the population gradient curve, which is one of the methods for analyzing urban spatial structure. As a result of the diagnosis, Yeong-ju turned out to be a shrinking city with the population density and the slope of population density increasing. The Dong area showed a sprawl phenomenon, in which both the population density of the CBD and the slope of the population density decreased. Punggi-eup showed a simple shrinkage phenomenon, in which only the population density of the CBD decreased. The results show that, even within a city, the pattern of spatial change can differ for each center. For a city with a polycentric structure, this implies that not only the entire city but also its sub-areas should be diagnosed individually.
https://doi.org/10.22669/krsa.2019.35.4.033
