• Title/Summary/Keyword: Q-learning (Q러닝)

Search Results: 60

A Study on Machine Learning and Basic Algorithms (기계학습 및 기본 알고리즘 연구)

  • Kim, Dong-Hyun;Lee, Tae-ho;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.35-36 / 2018
  • This paper examines machine learning and, among machine learning techniques, reinforcement learning based on the Markov Decision Process (MDP). Reinforcement learning is a kind of machine learning in which an agent in a given environment perceives its current state and selects, from the set of available actions, the action that maximizes its reward. Unlike typical machine learning, reinforcement learning requires no prior knowledge for training, so iterative learning is possible even in uncertain environments. This study briefly describes reinforcement learning in general and Q-learning, the most widely used reinforcement learning algorithm.

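The Q-learning rule summarized in the abstract above updates a table of state-action values toward the reward plus the discounted best next value. A minimal sketch (the states, actions, and reward here are illustrative, not from the paper):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)       # unseen (state, action) pairs start at 0
actions = [0, 1]
# one update from state 0, action 1, reward 1.0, next state 1
new_q = q_update(Q, 0, 1, 1.0, 1, actions)
print(new_q)  # 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```

Repeating this update as the agent interacts with the environment is what lets Q-learning work without prior knowledge of the environment's dynamics.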

Proactive Operational Method for the Transfer Robot of FMC (FMC 반송용 로봇의 선견형 운영방법)

  • Yoon, Jung-Ik;Um, In-Sup;Lee, Hong-Chul
    • Journal of the Korea Society for Simulation / v.17 no.4 / pp.249-257 / 2008
  • This paper presents an applied Q-learning algorithm for selecting the waiting position of a robot and the next part to be serviced in a Flexible Manufacturing Cell (FMC) consisting of one robot and various types of facilities. To verify the performance of the suggested algorithm, we simulate a general FMC made up of a single transfer robot and multiple machines and compare the output with that of other control methods. The analysis shows that the algorithm improves average processing time and total throughput by increasing robot utilization and, conversely, decreasing robot waiting time. Furthermore, because it is easier to use than more complex methods and readily adoptable in the real world, we expect this method to improve overall FMC efficiency.

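Choosing a waiting position from learned Q values, as in the abstract above, typically uses an ε-greedy rule: exploit the highest-valued option most of the time, but occasionally explore. A sketch with hypothetical position values (the paper does not give its exact selection rule):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a random index (explore),
    otherwise pick the index of the largest value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# q_values[i] = learned value of sending the robot to waiting position i
q_values = [0.2, 0.8, 0.5]
choice = epsilon_greedy(q_values, epsilon=0.0)  # pure exploitation
print(choice)  # 1
```

During training ε is kept above zero so the robot still tries apparently worse waiting positions and can correct early value estimates.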

Improvement of the Gonu game using progressive deepening in reinforcement learning (강화학습에서 점진적인 심화를 이용한 고누게임의 개선)

  • Shin, YongWoo
    • Journal of Korea Game Society / v.20 no.6 / pp.23-30 / 2020
  • A game has many possible cases, so a game-playing agent has much to learn. This paper uses reinforcement learning and aims to improve its learning speed. Because reinforcement learning must cover many cases, it is slow early in learning, so the minimax algorithm was used to speed up learning. To evaluate the improvement, a Gonu game was implemented and tested. In the experiments the win rate was high, but ties still occurred. The game tree was then explored further using progressive deepening to reduce tie cases, and the win rate improved by about 75%.
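The progressive-deepening idea mentioned above, searching the game tree one depth at a time with depth-limited minimax, can be sketched over a toy explicit tree. Node names and values here are invented for illustration, not taken from the Gonu game:

```python
def minimax(node, depth, maximizing, tree, values):
    """Depth-limited minimax over an explicit game tree (dict of children).
    At depth 0 or at a leaf, fall back to the node's heuristic value."""
    children = tree.get(node, [])
    if depth == 0 or not children:
        return values.get(node, 0)
    child_vals = [minimax(c, depth - 1, not maximizing, tree, values)
                  for c in children]
    return max(child_vals) if maximizing else min(child_vals)

def progressive_deepening(root, max_depth, tree, values):
    """Search depth 1, 2, ..., max_depth; return the deepest result."""
    best = 0
    for d in range(1, max_depth + 1):
        best = minimax(root, d, True, tree, values)
    return best

tree = {"R": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
values = {"A": 1, "B": 8, "A1": 3, "A2": 5, "B1": 2, "B2": 9}
result = progressive_deepening("R", 2, tree, values)
print(result)  # max(min(3, 5), min(2, 9)) = 3
```

The shallow passes are cheap and give a usable move immediately, which is what makes the scheme attractive when search time is limited.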

Study on Q-value prediction ahead of tunnel excavation face using recurrent neural network (순환인공신경망을 활용한 터널굴착면 전방 Q값 예측에 관한 연구)

  • Hong, Chang-Ho;Kim, Jin;Ryu, Hee-Hwan;Cho, Gye-Chun
    • Journal of Korean Tunnelling and Underground Space Association / v.22 no.3 / pp.239-248 / 2020
  • Exact rock classification helps suitable support patterns to be installed. Face mapping is usually conducted to classify the rock mass using RMR (Rock Mass Rating) or Q values. There have been several attempts to predict the rock mass grade with deep learning from the mechanical data of jumbo or probe drills and photographs of excavation faces, but these took a long time or could not assess the rock grade ahead of the tunnel face. In this study, a method to predict the Q value ahead of the excavation face is developed using a recurrent neural network (RNN) and is verified against the Q values obtained from face mapping. Of the Q values from over 4,600 tunnel faces, 70% of the data was used for training and the rest for verification. Training was repeated while varying the number of training iterations and the number of previous excavation faces used as input. Agreement between predicted and actual Q values was measured with the root mean square error (RMSE); the lowest RMSE was obtained with 600 training repetitions and 2 prior excavation faces. Although the results may vary with the input data sets, they help explain how past ground conditions affect future ground conditions and support predicting the Q value ahead of the tunnel excavation face.
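The setup above frames the sequence of face-mapped Q values as sliding windows (here 2 prior faces predicting the next) scored by RMSE. A sketch of that framing with made-up Q values and a naive mean-of-window baseline in place of the paper's RNN:

```python
import math

def make_windows(q_values, n_prev=2):
    """Inputs: the Q values of the n_prev preceding faces;
    target: the next face's Q value."""
    X, y = [], []
    for i in range(n_prev, len(q_values)):
        X.append(q_values[i - n_prev:i])
        y.append(q_values[i])
    return X, y

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

q_seq = [4.0, 5.0, 6.0, 5.5, 7.0]          # hypothetical Q values along the tunnel
X, y = make_windows(q_seq, n_prev=2)
pred = [sum(w) / len(w) for w in X]         # baseline: mean of the two prior faces
err = rmse(pred, y)
print(round(err, 3))
```

In the study this RMSE would be computed for the RNN's predictions over the held-out 30% of faces, and the window length and training repetitions tuned to minimize it.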

A Case Study of Spatial CAD Education in Blended Learning Environment (혼합형 학습(Blended Learning) 환경에서의 공간디자인 CAD 수업 사례연구)

  • Hwang, Ji Hyoun;Lim, Haewon
    • The Journal of the Korea Contents Association / v.21 no.10 / pp.115-126 / 2021
  • The purpose of this study is to closely analyze a case of blended learning in order to provide a diverse and flexible learning environment while maintaining the nature of face-to-face classes, and to identify the learning environment that supports blended learning at each step of the class together with the educational experience of students. The experience of and satisfaction with blended learning were investigated in several ways: course evaluation, LMS activity evaluation, and questionnaires before and after the class. The results show that blended learning is better than traditional face-to-face classes at providing real-time feedback, opportunities for varied interaction, and text-based conversations anytime and anywhere. In addition, to address the preliminary-survey opinion that network problems reduced concentration and made communication uncomfortable, the theory and lecture portions of the design practice class were conducted online, while individual Q&A and feedback were conducted both face-to-face and online. The follow-up survey found that concentration and efficiency could be improved this way, which opens up possibilities for active use of the online environment in design practice classes.

Variable Selection of Feature Pattern using SVM-based Criterion with Q-Learning in Reinforcement Learning (SVM-기반 제약 조건과 강화학습의 Q-learning을 이용한 변별력이 확실한 특징 패턴 선택)

  • Kim, Chayoung
    • Journal of Internet Computing and Services / v.20 no.4 / pp.21-27 / 2019
  • Feature patterns gathered from RNA sequencing (RNA-seq) data are not all equally informative for identifying differential expression: some may be noisy, correlated, or irrelevant because of redundancy in big data sets. Variable selection of feature patterns aims at a differentially expressed gene set that is significantly relevant to a given task, a problem that is complex and important in many domains. In machine learning, feature selection has been studied with methods such as Random Forest, K-Nearest Neighbors, and the Support Vector Machine (SVM). SVM is one of the best-known classical machine learning algorithms, and one SVM-based criterion is Support Vector Machine-Recursive Feature Elimination (SVM-RFE), which is used in this work. We propose a novel algorithm combining SVM-RFE with Q-learning from reinforcement learning for better variable selection of feature patterns. Comparing our algorithm with the well-known SVM-RFE combined with Welch's t-test on published data, we show that the criterion derived from the SVM-RFE weight vector is improved by the more exploratory, off-policy scheme of Q-learning.
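The RFE half of SVM-RFE described above is an elimination loop: retrain, then drop the feature with the smallest weight magnitude. A sketch with a fixed toy weight function standing in for a retrained SVM's weight vector (the gene names and weights are invented):

```python
def rfe_rank(features, weight_fn):
    """Recursive feature elimination: repeatedly drop the remaining feature
    with the smallest |weight| reported by weight_fn.
    Returns features ordered from least to most informative."""
    remaining = list(features)
    eliminated = []
    while remaining:
        weights = weight_fn(remaining)               # retrain on survivors
        worst = min(remaining, key=lambda f: abs(weights[f]))
        remaining.remove(worst)
        eliminated.append(worst)
    return eliminated

# stand-in for SVM weights; a real SVM-RFE would refit on each call
toy_weights = {"geneA": 0.9, "geneB": 0.1, "geneC": -0.5}
ranking = rfe_rank(toy_weights, lambda feats: toy_weights)
print(ranking)  # ['geneB', 'geneC', 'geneA']
```

The paper's contribution, as the abstract describes it, is replacing the purely greedy "drop the smallest weight" step with a Q-learning-guided, more exploratory elimination policy.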

Apparel Pattern CAD Education Based on Blended Learning for I-Generation (I-세대의 어패럴캐드 교육을 위한 블렌디드 러닝 활용 제안)

  • Choi, Young Lim
    • Fashion & Textile Research Journal / v.18 no.6 / pp.766-775 / 2016
  • In the era of globalization and unlimited competition, Korean universities need a breakthrough in their education system to match the changing education landscape and to cultivate more multi-talented convergence leaders. While each student has different learning capabilities, resulting in different performance and achievement in the same class, the uniform education most universities currently offer fails to accommodate such differences. Blended learning, which synergistically combines offline and online classes, enlarges the learning space and enriches learning experiences through diversified tools and materials, including multimedia. Universities are increasingly adopting video content and on-offline convergence learning strategies. This study therefore suggests a teaching method based on blended learning to more effectively teach existing pattern CAD and virtual CAD in an Apparel Pattern CAD class. To this end, the researcher developed a teaching-learning method and curriculum according to the blended learning phases and video-based contents. The curriculum consisted of 15 weeks of 2D CAD (SuperAlpha: Plus) and 3D CAD (CLO) software learning, which was loaded into the Learning Management System (LMS) and operated for 15 weeks both online and offline. Analysis of LMS usage found that class materials were the most viewed online postings. The discussion menu most accurately reflected students' participation, and students who did not participate in discussions were estimated to check postings less than participating students. A survey on blended learning found that students prefer digital or more digitized classes, while preferring face-to-face interaction for Q&As.

Random Balance between Monte Carlo and Temporal Difference in off-policy Reinforcement Learning for Less Sample-Complexity (오프 폴리시 강화학습에서 몬테 칼로와 시간차 학습의 균형을 사용한 적은 샘플 복잡도)

  • Kim, Chayoung;Park, Seohee;Lee, Woosik
    • Journal of Internet Computing and Services / v.21 no.5 / pp.1-7 / 2020
  • Deep neural networks (DNN), used as approximation functions in reinforcement learning (RL), can in theory yield realistic results. In empirical benchmark work, temporal difference learning (TD) shows better results than Monte Carlo learning (MC). However, some previous work shows that MC is better than TD when rewards are very rare or delayed, and recent research shows that when the information the agent observes from the environment is partial, as in complex control tasks, MC prediction is superior to TD-based methods. Most of these environments can be treated as 5-step or 20-step Q-learning, where the experiment proceeds without long roll-outs to alleviate performance degradation. In other words, for networks with noisy rewards, regardless of controlled roll-outs, it is better to learn with MC, which is robust to noisy rewards, or with something almost identical to MC. These studies break with the assumption that TD is better than MC and suggest that combining MC and TD outperforms either alone. Therefore, building on these results, we attempt a random balance mixing TD and MC in RL, without the complicated reward formulas used in those studies. Comparing a DQN using the random MC/TD mixture with the well-known DQN using only TD-based learning, we demonstrate through experiments in OpenAI Gym that the mixture of TD and MC performs on par with well-tuned TD learning.
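The two learning targets being balanced above differ in horizon: TD bootstraps from the next state's value, MC uses the full discounted return. A sketch of randomly mixing them, where the uniform mixing weight is an assumption for illustration, not the paper's exact formula:

```python
import random

def td_target(reward, gamma, next_value):
    """One-step TD target: r + gamma * V(s')."""
    return reward + gamma * next_value

def mc_target(rewards, gamma):
    """Monte Carlo target: full discounted return of the rollout."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def random_mixture_target(rewards, gamma, next_value, rng=random):
    """Blend the two targets with a random weight beta ~ U(0, 1)."""
    beta = rng.random()
    return beta * mc_target(rewards, gamma) + \
           (1 - beta) * td_target(rewards[0], gamma, next_value)

rewards = [1.0, 0.0, 1.0]                 # hypothetical rollout rewards
mc = mc_target(rewards, 0.9)              # 1 + 0.9*0 + 0.81*1 = 1.81
td = td_target(rewards[0], 0.9, 0.5)      # 1 + 0.9*0.5 = 1.45
mix = random_mixture_target(rewards, 0.9, 0.5)
print(mc, td)
```

In a DQN this mixed target would replace the usual TD-only target in the loss, trading some of TD's bias for MC's robustness to noisy or delayed rewards.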

The Effect of Input Variables Clustering on the Characteristics of Ensemble Machine Learning Model for Water Quality Prediction (입력자료 군집화에 따른 앙상블 머신러닝 모형의 수질예측 특성 연구)

  • Park, Jungsu
    • Journal of Korean Society on Water Environment / v.37 no.5 / pp.335-343 / 2021
  • Water quality prediction is essential for the proper management of water supply systems. Increased suspended sediment concentration (SSC) has various effects on water supply systems, such as increased treatment cost, so there have been various efforts to develop a model for predicting SSC. However, SSC is affected by both the natural and the anthropogenic environment, making it challenging to predict. Recently, advanced machine learning models have increasingly been used for water quality prediction. This study developed an ensemble machine learning model to predict SSC using the XGBoost (XGB) algorithm. The observed discharge (Q) and SSC at two field monitoring stations were used to develop the model. The input variables were clustered into two groups with low and high ranges of Q using the k-means clustering algorithm, and each group of data was separately used to optimize XGB (Model 1). The model performance was compared with that of an XGB model using the entire data set (Model 2). The models were evaluated by the root mean squared error-observation standard deviation ratio (RSR) and the root mean squared error. The RSR values were 0.51 and 0.57 at the two monitoring stations for Model 2, while performance improved to RSR 0.46 and 0.55, respectively, for Model 1.
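The RSR metric used above normalizes RMSE by the standard deviation of the observations, so values well below 1 indicate predictions much tighter than the data's own spread. A sketch with made-up observed and predicted SSC values:

```python
import math

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def rsr(obs, pred):
    """RMSE-observations standard deviation ratio: RMSE / stdev(obs).
    Lower is better; the study reports values such as 0.46-0.57."""
    mean = sum(obs) / len(obs)
    std = math.sqrt(sum((o - mean) ** 2 for o in obs) / len(obs))
    return rmse(obs, pred) / std

obs = [10.0, 20.0, 30.0, 40.0]     # hypothetical observed SSC
pred = [12.0, 18.0, 33.0, 39.0]    # hypothetical model predictions
value = rsr(obs, pred)
print(round(value, 3))
```

Because RSR is dimensionless, it lets the two monitoring stations be compared directly even if their SSC ranges differ.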

Smart Target Detection System Using Artificial Intelligence (인공지능을 이용한 스마트 표적탐지 시스템)

  • Lee, Sung-nam
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.538-540 / 2021
  • In this paper, we propose a smart target detection system that detects and recognizes a designated target and provides relative motion information when a drone performs a target detection mission. The work focuses on developing an algorithm that secures adequate accuracy (i.e., mAP, IoU) and high real-time performance at the same time. The proposed system showed accuracy close to 1.0 after 100k training steps of the Google Inception V2 deep learning model, and the inference speed was about 60-80 Hz on a high-performance laptop with an Nvidia GTX 2070 Max-Q. Mounted on a drone, the proposed smart target detection system should help surveillance and reconnaissance missions succeed by automatically recognizing the target through computer image processing and following it.
