• Title/Summary/Keyword: Game Optimal

Reinforcement Learning Approach to Agents Dynamic Positioning in Robot Soccer Simulation Games

  • Kwon, Ki-Duk;Kim, In-Cheol
    • Proceedings of the Korea Society for Simulation Conference / 2001.10a / pp.321-324 / 2001
  • The robot soccer simulation game is a dynamic multi-agent environment. In this paper we suggest a new reinforcement learning approach to each agent's dynamic positioning in such a dynamic environment. Reinforcement learning is a form of machine learning in which an agent learns, from indirect and delayed rewards, an optimal policy for choosing sequences of actions that produce the greatest cumulative reward. Reinforcement learning therefore differs from supervised learning in that no input-output pairs are presented as training examples. Furthermore, model-free reinforcement learning algorithms such as Q-learning do not require defining or learning any model of the surrounding environment. Nevertheless, Q-learning can learn the optimal policy if the agent visits every state-action pair infinitely often. However, the biggest problem of monolithic reinforcement learning is that its straightforward application does not scale to more complex environments because the state space becomes intractably large. To address this problem, we suggest Adaptive Mediation-based Modular Q-Learning (AMMQL) as an improvement on the existing Modular Q-Learning (MQL). While simple modular Q-learning combines the results from each learning module in a fixed way, AMMQL combines them more flexibly by assigning a different weight to each module according to its contribution to rewards. Therefore, in addition to handling the large state space effectively, AMMQL can show higher adaptability to environmental changes than pure MQL. This paper introduces the concept of AMMQL and presents details of its application to the dynamic positioning of robot soccer agents.

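The idea of combining several Q-learning modules with per-module weights can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give AMMQL's weight-adaptation rule, so the weights here are simply supplied by the caller, and the names `QModule`, `combine`, and `select_action` are hypothetical.

```python
import random
from collections import defaultdict


class QModule:
    """One tabular Q-learning module that observes only its own part of the state."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.q = defaultdict(float)   # (state, action) -> estimated value

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])


def combine(modules, weights, states):
    """Weighted sum of the modules' Q-values for each action.

    Equal weights correspond to a fixed combination; how the weights are
    adapted over time is an assumption left to the caller.
    """
    actions = modules[0].actions
    return {
        a: sum(w * m.q[(s, a)] for m, w, s in zip(modules, weights, states))
        for a in actions
    }


def select_action(modules, weights, states, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over the combined Q-values."""
    if rng.random() < epsilon:
        return rng.choice(modules[0].actions)
    scores = combine(modules, weights, states)
    return max(scores, key=scores.get)
```

Each module receives its own view of the state (the `states` list), so no single table has to cover the full joint state space.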

Ant Colony System for solving the traveling Salesman Problem Considering the Overlapping Edge of Global Best Path (순회 외판원 문제를 풀기 위한 전역 최적 경로의 중복 간선을 고려한 개미 집단 시스템)

  • Lee, Seung-Gwan;Kang, Myung-Ju
    • Journal of the Korea Society of Computer and Information / v.16 no.3 / pp.203-210 / 2011
  • Ant Colony System (ACS) is a metaheuristic algorithm for solving hard combinatorial optimization problems. It is a population-based approach that uses exploitation of positive feedback as well as greedy search. It was first proposed for tackling the well-known Traveling Salesman Problem. In this paper, we propose a search method that considers the edges shared by the previous and the current global best paths. The method first assumes that edges appearing in both the previous and the current global best path are likely to belong to the optimal path. It then reinforces the pheromone on these overlapping edges, which increases the probability that the optimal path is constructed. Finally, the proposed algorithm outperforms the ACS-3-opt, ACS-Subpath, and ACS-Iter algorithms in terms of Best and Average-Best performance.
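
The overlapping-edge step described in this abstract can be illustrated with a short sketch. It assumes a symmetric TSP, tours given as lists of city indices, and a fixed additive `bonus`; the abstract does not state the actual reinforcement amount, and the function names are hypothetical.

```python
def tour_edges(tour):
    """Undirected edges of a closed tour given as a list of city indices."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}


def reinforce_overlap(pheromone, prev_best, curr_best, bonus=0.5):
    """Add `bonus` pheromone to every edge shared by both global-best tours.

    `pheromone` is a square list-of-lists matrix that is modified in place.
    Returns the set of shared edges.
    """
    shared = tour_edges(prev_best) & tour_edges(curr_best)
    for edge in shared:
        i, j = tuple(edge)
        pheromone[i][j] += bonus
        pheromone[j][i] += bonus  # symmetric TSP assumed
    return shared
```

For the tours `[0, 1, 2, 3]` and `[0, 1, 3, 2]`, the shared edges are 0-1 and 2-3, so only those entries of the matrix are raised.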

A Study on the International R&D Competition and Optimal Tariff (국제 R&D 경쟁과 최적관세)

  • Li, Dong-Sheng;Lee, Jong-Min
    • Korea Trade Review / v.41 no.2 / pp.29-60 / 2016
  • Research and development (R&D) investment is an issue of central importance in any economy. In this paper we analyze the relationship between R&D spillovers and trade-related variables, using a two-stage model in which duopolists simultaneously decide on R&D in the first stage and engage in Cournot competition in the second stage. We characterize and compare the free-trade and trade-restriction R&D equilibria of this two-stage game. We also assess the impact of varying the R&D spillover on the equilibrium outcomes and the tariff. We show, for both the free-trade and protection cases, that there exists a unique symmetric solution (a subgame perfect Nash equilibrium). Because the solution, while analytical, cannot be stated in closed form, we resort to numerical experiments to investigate the equilibrium results. Our estimates indicate, for both cases, that the level of R&D investment and the rate of R&D expenditure decrease as the degree of R&D spillovers increases, and that the degree of R&D spillovers is inversely related to the level of protection: the larger the degree of R&D spillovers, the lower the level of protection.

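The second-stage Cournot competition mentioned in this abstract can be illustrated with a textbook duopoly. The sketch below is not the paper's model: it assumes linear inverse demand P = a - (q_home + q_foreign), constant unit costs, a specific tariff added to the foreign firm's cost, and a simple additive spillover form for cost reduction. All function and parameter names are hypothetical.

```python
def effective_cost(base_cost, own_rd, rival_rd, spillover):
    """Unit cost after R&D, assuming own R&D lowers cost one-for-one and a
    fraction `spillover` of the rival's R&D leaks through (an assumed form)."""
    return base_cost - own_rd - spillover * rival_rd


def cournot_equilibrium(a, c_home, c_foreign, tariff=0.0):
    """Nash equilibrium of a Cournot duopoly with inverse demand P = a - Q.

    Each firm's best response is q_i = (a - c_i - q_j) / 2; solving the two
    best responses simultaneously gives the closed forms below.
    """
    c_f = c_foreign + tariff  # the tariff raises the foreign firm's unit cost
    q_home = (a - 2 * c_home + c_f) / 3
    q_foreign = (a - 2 * c_f + c_home) / 3
    if q_home < 0 or q_foreign < 0:
        raise ValueError("no interior equilibrium for these parameters")
    price = a - q_home - q_foreign
    return q_home, q_foreign, price
```

With `a=10` and unit costs of 1 for both firms, free trade gives outputs of 3 each at a price of 4; a tariff of 3 shifts output to 4 for the home firm and 1 for the foreign firm at a price of 5.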

A Study on the Development of the Weapon System Effectiveness Indices (I) (한국적 무기체계 효과지수 개발에 관한 연구 (I) -무기체계 효과측정 방법론-)

  • Min, Kye-Ryo;Park, Kyung-Soo
    • Journal of the military operations research society of Korea / v.5 no.1 / pp.123-135 / 1979
  • Weapon system effectiveness indices are a key factor in cost-effectiveness analysis when allocating national resources in an optimal fashion. The first part of this paper reviews the need for weapon effectiveness indices, the historical development of methodologies for evaluating military strength with various index models, and the interrelationship between weapon effectiveness indices and war games. The second part analyzes the concepts and usage of the methodologies already developed, i.e., index of power, weapon lethality index (W.L.I.), index of fire power potential (I.F.P.), fire power potential (F.P.P.), and weapon effectiveness indices/weighted unit value (W.E.I./W.U.V.). Finally, the weaknesses and limits of these methodologies are compared and evaluated. The results show that the W.L.I., I.F.P., and F.P.P. seem to have many non-scientific or ambiguous facets, whereas the W.E.I./W.U.V. method produces more detailed, inclusive, and reasonable indices. The paper concludes by emphasizing the importance of providing theoretical bases for developing effectiveness indices that can be used to evaluate Korean weapon systems, and of the early establishment of a specialized research group to manage and develop methodologies for weapon effectiveness indices.

An End-to-End QoS Control Method for Heterogeneous Networks (이종 망을 위한 종단간 QoS 제어 방안)

  • Lee, Jong-Chan;Lee, Gi-Sung
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.10 / pp.2715-2720 / 2009
  • Supporting Quality of Service (QoS) for multimedia services in heterogeneous mobile networks is one of the key issues in Three Generation Evolution (3GE) development. A QoS management structure needs to guarantee the QoS of moving users through end-to-end negotiation so that service remains seamless when a mobile terminal (MT) moves between heterogeneous networks. We propose an end-to-end negotiation method based on a Service Level Agreement (SLA). To this end, we consider SLA control and an algorithm for supporting the MT's QoS. The simulation focuses on average delay and packet loss rate, and the results show that the proposed method provides mobile terminals with the optimal performance.

Calibration of a Korean Weapon Systems Wargame Model (한국적 무기체계의 워게임 모델 교정에 관한 연구)

  • Jung, Kun-Ho;Yum, Bong-Jin
    • Journal of the Korea Institute of Military Science and Technology / v.12 no.2 / pp.191-198 / 2009
  • Some of the wargame simulators currently used in the Korean Army were developed by other countries and do not adequately reflect the terrain and weapon systems of the Korean Peninsula. These wargame simulators therefore need to be calibrated with respect to their input parameters in order to properly assess the effectiveness of Korean weapon systems. In this paper, AWAM, a wargame simulator, is calibrated in terms of time-based fighting power (FP). FP data obtained from the Korea Combat Training Center (KCTC) are used as a reference, and the differences between the AWAM and KCTC FP data are calculated at certain points in time. The Taguchi robust design method is then applied, using the hit probabilities of the K-2 rifle as controllable input parameters. Two performance characteristics are used: the difference between the AWAM and KCTC FP data, and a score derived by grouping the difference data. For each case, optimal settings of the hit probabilities are determined such that the mean of each characteristic is close to 0 with its dispersion as small as possible.
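
The selection criterion in the last sentence (mean close to 0, dispersion as small as possible) can be illustrated with the mean squared deviation from the target, which equals the squared mean plus the population variance. This is a simplified stand-in, not the paper's Taguchi analysis; the setting labels and data are hypothetical.

```python
from statistics import fmean


def mean_squared_deviation(values, target=0.0):
    """Average squared distance from the target.

    For target 0 this equals mean**2 + population variance, so it penalizes
    both a shifted mean and a large spread.
    """
    return fmean([(v - target) ** 2 for v in values])


def best_setting(results):
    """Pick the setting whose replicated differences stay closest to 0.

    `results` maps a setting label (hypothetical) to a list of observed
    differences between simulator output and reference data.
    """
    return min(results, key=lambda label: mean_squared_deviation(results[label]))
```

A full Taguchi study would also use an orthogonal array to choose which settings to run and would typically report a signal-to-noise ratio; the sketch covers only the final ranking step.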

Dynamic Spectrum Load Balancing for Cognitive Radio in Frequency Domain and Time Domain

  • Chen, Ju-An;Sohn, Sung-Hwan;Gu, Jun-Rong;Kim, Jae-Moung
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.8 no.3 / pp.71-82 / 2009
  • As a solution to the spectrum under-utilization problem, cognitive radio (CR) introduces dynamic spectrum access technology. One of the most important problems in this area is how secondary users (SUs) should choose among the available channels, that is, how to achieve load balancing between channels. We consider the spectrum load balancing problem for CR systems in the frequency domain and, especially, in the time domain. Our objective is to dynamically balance the load among the channels and to balance the occupied time length of slots for a fixed channel in order to obtain a user-optimal solution. In the frequency domain, we draw on the Dynamic Noncooperative Scheme with Communication (DNCOOPC) used in distributed systems and build a distributed Dynamic Spectrum Load Balancing (DSLB) algorithm on it. In the time domain, a spectrum load balancing method with QoS support is proposed based on dynamic feedback theory and a hash table (SLBDH). The performance of DSLB and SLBDH is evaluated. In the frequency domain, DSLB is more efficient than the existing Compare_And_Balance (CAB) algorithm and achieves higher throughput than the Spectrum Load Balancing (SLB) algorithm. DSLB is also a fair scheme for all devices. In the time domain, SLBDH is an efficient and precise solution compared with the Spectrum Load Smoothing (SLS) method.

Design Concept and Architecture Analysis of Cell Microprocessor (Cell 마이크로프로세서 설계 개념과 아키텍쳐 분석)

  • Moon Sang-Gook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2006.05a / pp.927-930 / 2006
  • While Intel has been increasing its dominance of the system IC semiconductor market, IBM, Sony, and Toshiba formed an alliance to develop a next-generation multi-core processor for entertainment, named Cell. Cell is built on the Power architecture, includes 8 SPE (Synergistic Processor Element) cores for data handling, and supports a SIMD architecture for optimal execution of multimedia and game applications. It also includes an expanded Power microarchitecture. In this paper, we analyze the Cell microprocessor, which is regarded as the most powerful processor of its era.

A Cloud-Edge Collaborative Computing Task Scheduling and Resource Allocation Algorithm for Energy Internet Environment

  • Song, Xin;Wang, Yue;Xie, Zhigang;Xia, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.6 / pp.2282-2303 / 2021
  • To solve the problems of heavy computing load and system transmission pressure in the energy internet (EI), we establish a three-tier cloud-edge integrated EI network based on cloud-edge collaborative computing to achieve a tradeoff between energy consumption and system delay. A joint optimization problem for resource allocation and task offloading in the three-tier cloud-edge integrated EI network is formulated to minimize the total system cost under constraints on the binary task scheduling variables of each sensor node, the maximum uplink transmit power of each sensor node, the limited computation capability of the sensor node, and the maximum computation resource of each edge server. The result is a Mixed Integer Non-linear Programming (MINLP) problem. To solve it, we propose a joint task offloading and resource allocation algorithm (JTOARA), which decomposes the problem into three subproblems: the uplink transmission power allocation subproblem, the computation resource allocation subproblem, and the offloading scheme selection subproblem. The power allocation of each sensor node is obtained by a bisection search algorithm, which converges quickly, while the computation resource allocation is derived by a line optimization method and convex optimization theory. Finally, to achieve optimal task offloading, we propose a cloud-edge collaborative computation offloading scheme based on game theory and prove the existence of a Nash equilibrium. The simulation results demonstrate that the proposed algorithm improves output performance compared with conventional algorithms and that its performance is close to that of the enumerative algorithm.
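
The bisection search used for power allocation can be sketched generically. The abstract does not give the function being solved, so `f` below is a hypothetical stand-in (for example, the derivative of a cost with respect to transmit power) that is assumed to be continuous and to change sign on the search interval.

```python
def bisect_root(f, lo, hi, tol=1e-9, max_iter=200):
    """Find x in [lo, hi] with f(x) close to 0 by repeated interval halving.

    Requires f(lo) and f(hi) to have opposite signs (or one of them to be 0).
    Each iteration halves the interval, so the error shrinks geometrically.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid               # the sign change lies in the left half
        else:
            lo, f_lo = mid, f_mid  # the sign change lies in the right half
    return (lo + hi) / 2
```

For instance, `bisect_root(lambda p: p * p - 2, 0.0, 2.0)` converges to the square root of 2; in a power-allocation setting the interval would be bounded above by the node's maximum transmit power.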

Comparison of Sentiment Classification Performance of for RNN and Transformer-Based Models on Korean Reviews (RNN과 트랜스포머 기반 모델들의 한국어 리뷰 감성분류 비교)

  • Jae-Hong Lee
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.4 / pp.693-700 / 2023
  • Sentiment analysis, a branch of natural language processing that classifies and identifies subjective opinions and emotions in text documents as positive or negative, can be used for various promotions and services through customer preference analysis. To this end, recent research has been conducted utilizing various techniques in machine learning and deep learning. In this study, we propose an optimal language model by comparing the accuracy of sentiment analysis for movie, product, and game reviews using existing RNN-based models and recent Transformer-based language models. In our experiments, LMKorBERT and GPT3 showed relatively good accuracy among the models pre-trained on the Korean corpus.