• Title/Summary/Keyword: Monte Carlo learning (몬테카를로 학습)

Search Result 7, Processing Time 0.024 seconds

Max-Mean N-step Temporal-Difference Learning Using Multi-Step Return (멀티-스텝 누적 보상을 활용한 Max-Mean N-Step 시간차 학습)

  • Hwang, Gyu-Young;Kim, Ju-Bong;Heo, Joo-Seong;Han, Youn-Hee
    • KIPS Transactions on Computer and Communication Systems / v.10 no.5 / pp.155-162 / 2021
  • n-step TD learning combines the Monte Carlo method with one-step TD learning. With an appropriate n, n-step TD learning is known to outperform both the Monte Carlo method and one-step TD learning, but the best value of n is difficult to select. To address this difficulty, this paper exploits two observations: overestimating Q can improve performance early in learning, and all n-step returns take similar values when Q ≈ Q*. Based on these, we propose a new learning target composed of the maximum and the mean of all k-step returns for 1 ≤ k ≤ n. Finally, in OpenAI Gym's Atari game environments, we compare the proposed algorithm with n-step TD learning and show that the proposed algorithm outperforms it.
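As a rough illustration of the quantities named in this abstract, the sketch below computes every k-step return for 1 ≤ k ≤ n and then takes their maximum and mean. The function names and the `bootstrap_values` argument (an estimate of max_a Q at each later state) are assumptions made for this example, and the abstract does not say how the paper combines the maximum and the mean into a single target, so the two are returned separately.

```python
from typing import List, Sequence, Tuple


def k_step_returns(
    rewards: Sequence[float],
    bootstrap_values: Sequence[float],
    gamma: float,
) -> List[float]:
    """Return [G(1), ..., G(n)] for n = len(rewards).

    G(k) = sum_{i=0}^{k-1} gamma^i * rewards[i] + gamma^k * bootstrap_values[k-1],
    where bootstrap_values[k-1] stands in for max_a Q(s_{t+k}, a) (assumed input).
    """
    returns = []
    discounted_sum = 0.0
    for k in range(1, len(rewards) + 1):
        # Add the k-th reward, discounted by gamma^(k-1).
        discounted_sum += gamma ** (k - 1) * rewards[k - 1]
        # Bootstrap from the value estimate k steps ahead.
        returns.append(discounted_sum + gamma ** k * bootstrap_values[k - 1])
    return returns


def max_and_mean(returns: Sequence[float]) -> Tuple[float, float]:
    """Maximum and mean of the k-step returns, the two ingredients the abstract names."""
    return max(returns), sum(returns) / len(returns)


if __name__ == "__main__":
    gs = k_step_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5)
    print(gs, max_and_mean(gs))
```

With n = 1 the list holds only the one-step TD target, so the maximum and the mean coincide with ordinary one-step TD learning.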

A Study on a Probabilistic Technology Valuation Model Using the Monte Carlo Least-Squares Method (몬테카를로 최소자승법을 이용한 확률론적 기술가치평가 모형 연구)

  • Seong, Tae-Eung;Lee, Jong-Taek;Kim, Byeong-Hun;Park, Hyeon-U
    • Proceedings of the Korea Technology Innovation Society Conference / 2017.11a / pp.715-721 / 2017
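  • Those working in R&D services are increasingly interested in revitalizing the technology-trading market. In particular, transferring dormant technologies (patents) held by the public and private sectors can cut unnecessary patent-maintenance costs and generate additional royalty income. This study examines the limitations of the technology valuation models that have been used in practice for purposes such as technology transfer (trading), in-kind investment, and technology finance (loans, collateralized lending), and proposes a probabilistic valuation model based on the Monte Carlo least-squares method as an improvement. Existing valuation models compute a value from deterministic input variables, yet representative techniques such as the discounted cash flow method and the relief-from-royalty method carry inherent uncertainty in inputs such as the expected revenue period and projected sales. We therefore assume probability distributions for these inputs and develop a valuation module that determines the variables and estimates the value through a Monte Carlo least-squares relationship derived from mathematical optimization. In future work, we expect that it will also be possible to develop a system that learns from previously evaluated cases using deep learning, derives the likely range of each variable, and infers a range for the technology value from them.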
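To make the contrast between deterministic and probabilistic valuation concrete, here is a minimal sketch of a plain Monte Carlo relief-from-royalty calculation. It is not the least-squares model the paper proposes; it only shows how uncertain inputs turn a single value into a distribution of values. Every number and distribution below (royalty rate, discount rate, the 5 to 10 year revenue period, the triangular sales distribution) is an assumption invented for illustration.

```python
import random
from typing import List, Sequence


def simulate_royalty_values(
    n_trials: int,
    discount_rate: float = 0.10,
    royalty_rate: float = 0.05,
    seed: int = 0,
) -> List[float]:
    """Sample present values of a royalty stream under uncertain inputs."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        # Assumed uncertainty: revenue period is uniform over 5..10 years.
        years = rng.randint(5, 10)
        # Assumed uncertainty: annual sales are triangular (low, high, mode).
        annual_sales = rng.triangular(80.0, 150.0, 100.0)
        # Discount each year's royalty income back to the present.
        value = sum(
            royalty_rate * annual_sales / (1.0 + discount_rate) ** t
            for t in range(1, years + 1)
        )
        values.append(value)
    return values


def percentile(values: Sequence[float], q: float) -> float:
    """Simple empirical percentile (nearest rank), with q in [0, 1]."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


if __name__ == "__main__":
    vals = simulate_royalty_values(10_000)
    print("5%:", percentile(vals, 0.05), "median:", percentile(vals, 0.5),
          "95%:", percentile(vals, 0.95))
```

Reporting the 5th and 95th percentiles alongside the median gives a value range rather than a single figure, which is the kind of output the abstract describes as the goal.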

Automatic Generation of Music Accompaniment Using Reinforcement Learning (강화 학습을 통한 자동 반주 생성)

  • Kim, Na-Ri;Kwon, Ji-Yong;Yoo, Min-Joon;Lee, In-Kwon
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2008.02a / pp.739-743 / 2008
  • In this paper, we introduce a method for automatically generating accompaniment music from a user's input melody. The initial accompaniment chord is generated by analyzing the user's input melody. Subsequent chords are then generated continuously from a Markov chain probability table that defines the transition probabilities between chords. The probability table is learned through a reinforcement learning mechanism using sample data from existing music. During accompaniment playback, the table is further learned and refined in real time using the reward values obtained in each state, improving chord selection. The similarity between the user's input melody and each chord is calculated using a pitch class histogram. With our method, accompaniment chords that harmonize with the user's melody can be generated automatically in real time.
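The chord-generation loop described here can be sketched as a Markov chain table that is sampled for the next chord and nudged by a reward. The chord set, the additive update rule, and the learning rate are all assumptions made for this example; the abstract does not specify the actual update or how the reward is computed from the pitch class histogram.

```python
import random
from typing import Dict

CHORDS = ["C", "F", "G", "Am"]  # hypothetical chord vocabulary

Table = Dict[str, Dict[str, float]]


def uniform_table() -> Table:
    """Start with equal transition probabilities between all chords."""
    p = 1.0 / len(CHORDS)
    return {src: {dst: p for dst in CHORDS} for src in CHORDS}


def next_chord(table: Table, current: str, rng: random.Random) -> str:
    """Sample the next chord from the current chord's transition row."""
    row = table[current]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]


def reinforce(table: Table, current: str, chosen: str,
              reward: float, lr: float = 0.1) -> None:
    """Shift probability toward (or away from) a transition, then renormalize.

    Assumed rule: add lr * reward to the chosen entry, floor it at a small
    positive value, and rescale the row so it sums to 1.
    """
    row = dict(table[current])
    row[chosen] = max(row[chosen] + lr * reward, 1e-6)
    total = sum(row.values())
    table[current] = {chord: p / total for chord, p in row.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    table = uniform_table()
    reinforce(table, "C", "G", reward=1.0)  # pretend C -> G matched the melody
    print(table["C"], next_chord(table, "C", rng))
```

A positive reward raises the probability of the rewarded transition and lowers the others in the same row, so chords that match the melody become more likely over time.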

Automatic Generation of Korean Poetry using Sequence Generative Adversarial Networks (SeqGAN 모델을 이용한 한국어 시 자동 생성)

  • Park, Yo-Han;Jeong, Hye-Ji;Kang, Il-Min;Park, Cheon-Young;Choi, Yong-Seok;Lee, Kong Joo
    • Annual Conference on Human and Language Technology / 2018.10a / pp.580-583 / 2018
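  • In this paper, we used the SeqGAN model to automatically generate Korean poetry. For sentence generation, the SeqGAN model applies a recurrent neural network to the generator together with policy gradient, a reinforcement learning algorithm, and Monte Carlo (MC) search. Poems written on the theme of love were used as the training data for generating lines of poetry. The poems generated by the SeqGAN model tended to repeat the same phrase several times, but the results confirm that the SeqGAN model is applicable to Korean text generation.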

DeepPurple : Chess Engine using Deep Learning (딥퍼플 : 딥러닝을 이용한 체스 엔진)

  • Yun, Sung-Hwan;Kim, Young-Ung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.5 / pp.119-124 / 2017
  • In 1997, IBM's DeepBlue defeated the world chess champion, Garry Kasparov, and more recently Google's AlphaGo won all three games against Ke Jie, then ranked first among all human Baduk players worldwide; as a result, interest in deep learning has increased rapidly. DeepPurple, proposed in this paper, is an AI chess engine based on deep learning. The DeepPurple chess engine consists largely of Monte Carlo Tree Search, a policy network, and a value network, with the two networks implemented as convolutional neural networks. The policy network predicts the next move, and the value network evaluates the given position. Monte Carlo Tree Search is used to select the most beneficial next move. The results show that the policy network achieves an accuracy of 43% with a loss of 1.9, and the value network achieves an accuracy of 50% with a loss of 1.
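One common way to combine a policy prior and a value estimate inside Monte Carlo Tree Search is the PUCT selection rule popularized by AlphaGo. The sketch below shows only that selection step, under the assumption that DeepPurple uses something similar; the abstract does not give the engine's actual formula. The `Child` class, the `c_puct` constant, and the sample moves are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Child:
    move: str
    prior: float            # probability from the policy network (assumed input)
    visits: int = 0
    value_sum: float = 0.0  # sum of value-network evaluations backed up so far

    @property
    def mean_value(self) -> float:
        """Average backed-up value; 0 for an unvisited child."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(children: List[Child], c_puct: float = 1.5) -> Child:
    """Pick the child maximizing mean value plus a prior-weighted exploration bonus."""
    parent_visits = sum(c.visits for c in children)

    def score(c: Child) -> float:
        exploration = c_puct * c.prior * math.sqrt(parent_visits + 1) / (1 + c.visits)
        return c.mean_value + exploration

    return max(children, key=score)


if __name__ == "__main__":
    children = [
        Child("e4", prior=0.6, visits=10, value_sum=5.0),
        Child("d4", prior=0.3),
        Child("Nf3", prior=0.1),
    ]
    print(select_child(children).move)
```

The exploration bonus shrinks as a move accumulates visits, so a well-explored move with a high prior eventually yields to less-visited alternatives unless its mean value stays high.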

Investigation of Detectable Crack Length in a Bolt Hole Using Eddy Current Inspection (와전류탐상검사를 이용하여 탐지 가능한 볼트홀 내부 균열 길이 연구)

  • Lee, Dooyoul;Yang, Seongun;Park, Jongun;Baek, Seil;Kim, Soonkil
    • Transactions of the Korean Society of Mechanical Engineers A / v.41 no.8 / pp.729-736 / 2017
  • In this study, a physics-based model and a machine learning technique were used to conduct model-assisted probability of detection (MAPOD) experiments. The possibility of using in-service cracked parts was also investigated. Bolt-hole-shaped specimens with fatigue cracks on the hole surface were inspected using eddy current inspection. Owing to MAPOD, the number of experimental factors decreased significantly. The uncertainty in the crack length measurement for in-service cracked parts was accounted for using Monte Carlo simulation.

Korean speech recognition using deep learning (딥러닝 모형을 사용한 한국어 음성인식)

  • Lee, Suji;Han, Seokjin;Park, Sewon;Lee, Kyeongwon;Lee, Jaeyong
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.213-227 / 2019
  • In this paper, we propose an end-to-end deep learning model that combines a Bayesian neural network with Korean speech recognition. In the past, Korean speech recognition was a complicated task because of the large number of parameters in its many intermediate steps and the need for expert knowledge of Korean. Recent breakthroughs in end-to-end models have made Korean speech recognition more manageable. An end-to-end model decodes mel-frequency cepstral coefficients directly into text without any intermediate processes. In particular, Connectionist Temporal Classification loss and attention-based models are examples of end-to-end approaches. In addition, we use a Bayesian neural network to implement the end-to-end model and obtain Monte Carlo estimates. Finally, we carry out our experiments on the "WorimalSam" online dictionary dataset. We obtain a word error rate of 4.58%, an improvement over the Google and Naver APIs.
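A Monte Carlo estimate from a Bayesian neural network usually means averaging predictions over weights sampled from the posterior. The sketch below shows that idea on a toy linear classifier with an assumed independent Gaussian posterior over each weight; the model, the posterior form, and all names are assumptions for illustration and are unrelated to the paper's actual speech recognition architecture.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_predictive(
    features: Sequence[float],
    weight_means: Sequence[Sequence[float]],
    weight_std: float,
    n_samples: int,
    seed: int = 0,
) -> List[float]:
    """Average class probabilities over n_samples draws of the weights.

    Each weight is drawn from N(mean, weight_std^2), an assumed posterior.
    """
    rng = random.Random(seed)
    avg = [0.0] * len(weight_means)
    for _ in range(n_samples):
        logits = []
        for class_means in weight_means:
            # One posterior sample of this class's weight vector.
            sampled = [rng.gauss(mu, weight_std) for mu in class_means]
            logits.append(sum(w * x for w, x in zip(sampled, features)))
        probs = softmax(logits)
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg


if __name__ == "__main__":
    print(mc_predictive([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]],
                        weight_std=0.1, n_samples=100))
```

Increasing `weight_std` spreads the sampled predictions further apart, which pulls the averaged probabilities toward uniform and reflects greater model uncertainty.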