• Title/Abstract/Keywords: Policy-based method


Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
    • 전기전자학회논문지
    • /
    • Vol. 23, No. 4
    • /
    • pp.1150-1156
    • /
    • 2019
  • This paper explores a model-free, value-based approach to solving the survival gridworld problem. The survival gridworld problem poses a challenge that involves taking risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take risk factors and motivated exploration into account to make better path decisions. Experiments suggest that the proposed method achieves better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.
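
The abstract above does not state the modified update equation, so the following is only a minimal sketch of the two classic baselines it mentions: the off-policy (Q-learning) and on-policy (SARSA) tabular updates. The state names, action names, learning rate `alpha`, and discount `gamma` are assumptions chosen for illustration. The parametric linear rectifier and motivational discount of the proposed method are not reproduced here.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

# Hypothetical gridworld actions and two separate Q-tables.
actions = ["up", "down", "left", "right"]
Q_off = defaultdict(float)
Q_on = defaultdict(float)

# Assume the next state already has one valuable and one poor action.
for Q in (Q_off, Q_on):
    Q[("s1", "up")] = 1.0
    Q[("s1", "down")] = -1.0

# Same transition (s0, right, reward 0, s1) under both rules.
q_learning_update(Q_off, "s0", "right", 0.0, "s1", actions)
sarsa_update(Q_on, "s0", "right", 0.0, "s1", "down")  # the agent actually chose "down"
```

With identical experience, the off-policy rule moves the estimate toward the best next action, while the on-policy rule moves it toward the action that was actually taken. A hybrid scheme would combine the two kinds of update in some way.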

공통평가기준에 의한 보안정책모델 평가방법 (An Evaluation Method for Security Policy Model Based on Common Criteria)

  • 김상호;임춘성
    • 정보보호학회논문지
    • /
    • Vol. 13, No. 5
    • /
    • pp.57-67
    • /
    • 2003
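  • A security policy model is a structured representation of the security policy of a Target of Evaluation (TOE), expressed using informal, semiformal, or formal methods. By providing consistency and completeness between the security functional requirements and the functional specification, a security policy model gives assurance that the TOE minimizes security flaws caused by ambiguity between the requirements and the functional specification. For this reason, the high assurance levels of security evaluation criteria for IT products and systems, such as ISO/IEC 15408 (Common Criteria, CC), require a security policy model. This paper analyzes the concept of security policy models, related work, and the security policy model assurance requirements of the Common Criteria, and presents a method for evaluating security policy models.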

마르코프 결정 과정에서 시뮬레이션 기반 정책 개선의 효율성 향상을 위한 시뮬레이션 샘플 누적 방법 연구 (A Simulation Sample Accumulation Method for Efficient Simulation-based Policy Improvement in Markov Decision Process)

  • 황시랑;최선한
    • 한국멀티미디어학회논문지
    • /
    • Vol. 23, No. 7
    • /
    • pp.830-839
    • /
    • 2020
  • As a popular mathematical framework for modeling decision making, the Markov decision process (MDP) has been widely used to solve problems in many engineering fields. An MDP consists of a set of discrete states, a finite set of actions, and rewards received after reaching a new state by taking an action from the previous state. The objective of an MDP is to find an optimal policy, that is, to find the best action to take in each state to maximize the expected discounted reward of the policy (EDR). In practice, the MDP is typically unknown, so simulation-based policy improvement (SBPI), which improves a given base policy sequentially by selecting the best action in each state based on rewards observed via simulation, can be a practical way to find the optimal policy. However, the efficiency of SBPI is still a concern, since many simulation samples are required to estimate the EDR precisely for each action in each state. In this paper, we propose a method to select the best action accurately in each state using a small number of simulation samples, thereby improving the efficiency of SBPI. The proposed method accumulates the simulation samples observed in previous states, so the EDR can be estimated precisely even with a small number of samples in the current state. The results of comparative experiments with the existing method demonstrate that the proposed method can improve the efficiency of SBPI.
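
As a rough illustration of plain SBPI as the abstract describes it, the sketch below estimates the EDR of each action by simulated rollouts that take the action once and then follow the base policy, and then picks the action with the highest estimate in each state. The four-state chain MDP, the slip probability, the horizon, and all function names are hypothetical. The sample accumulation step proposed in the paper is not implemented.

```python
import random

N_STATES = 4          # states 0..3; state 3 is terminal (hypothetical toy MDP)
ACTIONS = ("left", "right")
GAMMA = 0.9           # discount factor
SLIP = 0.1            # probability that the chosen move is reversed

def step(state, action, rng):
    """Simulate one transition; reward 1.0 on reaching the terminal state."""
    move = 1 if action == "right" else -1
    if rng.random() < SLIP:
        move = -move
    nxt = min(max(state + move, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def rollout_return(state, first_action, base_policy, rng, horizon=20):
    """Discounted return of one rollout: first_action, then the base policy."""
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(horizon):
        state, reward = step(state, action, rng)
        total += discount * reward
        if state == N_STATES - 1:
            break
        discount *= GAMMA
        action = base_policy[state]
    return total

def improve_policy(base_policy, n_samples, rng):
    """One round of policy improvement from sampled EDR estimates."""
    improved = {}
    for s in range(N_STATES - 1):
        estimates = {
            a: sum(rollout_return(s, a, base_policy, rng)
                   for _ in range(n_samples)) / n_samples
            for a in ACTIONS
        }
        improved[s] = max(estimates, key=estimates.get)
    return improved

rng = random.Random(0)
base_policy = {s: "left" for s in range(N_STATES - 1)}  # deliberately poor base policy
improved_policy = improve_policy(base_policy, n_samples=200, rng=rng)
```

Each state-action pair here draws its own 200 rollouts, which is the sampling cost the abstract identifies as the efficiency concern.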

물류창고에서 블록별 저장방식 및 주문 처리에 관한 연구 (The Block-Based Storage Policy and Order Processing in Logistics Warehouse)

  • 김명훈;김종화
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 8, No. 4
    • /
    • pp.159-164
    • /
    • 2003
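  • Storage locations in a logistics warehouse are directly related to the total material handling cost of the items moving through the warehouse. The purpose of this paper is to develop a block-based storage policy, an item storage scheme that reduces the total order-picking time in a logistics warehouse. The block-based storage policy partitions the racks into blocks and assigns each item based on its turnover rate and the average distance between the dock and the block. To evaluate the performance of the proposed policy, it is compared with the existing class-based storage policy using various order batching methods.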


사용자 데모를 이용한 관계적 개체 기반 정책 학습 (Learning Relational Instance-Based Policies from User Demonstrations)

  • 박찬영;김현식;김인철
    • 한국정보과학회논문지:소프트웨어및응용
    • /
    • Vol. 37, No. 5
    • /
    • pp.363-369
    • /
    • 2010
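  • Demonstration-based learning has the advantage that a user can easily teach a robot new task knowledge by demonstrating the task directly. However, many existing demonstration-based learning methods have used an attribute-value vector model to represent the state space and policies. Because of the limitations of the attribute-value vector model, they suffer from low learning efficiency and low reusability of the learned policies. This paper proposes a new demonstration-based task learning method that uses a relational model instead of the conventional attribute-value model. The method applies relational instance-based learning to training examples extracted from user demonstration records, inducing a relational instance-based policy that can readily be used for other similar tasks in the same task domain. The relational policy determines a single action to execute for any given situation, represented as a (state, goal) pair. After describing the demonstration-based relational policy learning method in detail, the paper analyzes its effectiveness through experiments with a robot simulator.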

데이터 기반 정책지원 대상 우수 중소기업 발굴 방법론 연구 : 국내 수산산업을 대상으로 (Data-based Method of Selecting Excellent SMEs for Governmental Funding Policy: Focused on Fishery Industry in Korea)

  • 황순욱;천동필
    • 수산경영론집
    • /
    • Vol. 49, No. 4
    • /
    • pp.1-17
    • /
    • 2018
  • The Korean fisheries industry is a traditional business, the majority of which consists of small and medium-sized enterprises (SMEs). It has played an important role in the South Korean economy over the past several decades, but it currently faces limits on growth potential and profitability due to a declining workforce, an aging population, deteriorating fishery environments, climate change, and rapid changes in the global industrial ecosystem. Many studies have suggested solutions for the fisheries industry from a macro perspective, but few studies take a strategic approach to the problem. If governments could selectively support the companies that are likely to increase their value added, they could overcome the current situation more effectively. This paper introduces a study on a selection method that uses data envelopment analysis (DEA) to find SMEs with the potential to increase profits and growth. We suggest selecting SMEs with high management efficiency and the ability to utilize intangible assets as the target companies. We also suggest policy objectives for SMEs in the domestic fisheries industry based on the results of the DEA and propose a data-based method for policy decisions.

시스템 사고에 기반한 "지역교육청 기능 및 조직개편" 정책의 문제 및 원인 분석 (Analysis of Problems and Causal Relations of Functional Changes of Local Educational Authority Policy(FCLEAP) based on the Systems Thinking)

  • 하정윤;나민주
    • 한국시스템다이내믹스연구
    • /
    • Vol. 15, No. 2
    • /
    • pp.75-96
    • /
    • 2014
  • The purpose of this paper is to analyze the functional changes of local educational authority policy from a systems thinking perspective using causal loop diagrams. In the past, the main function of the local educational authority was to manage and supervise schools. Through this policy, the local educational authority was to be transformed into a support agency. However, the policy did not achieve its goal; it caused confusion and requires improvement. This study shows the structural causes of the problem based on systems thinking. The diagrams can provide ideas to educational policy makers, even in a complicated environment. The findings indicate that an analysis of this policy based on systems thinking can help those involved in policy decisions more than existing diagnostic methods.


The big data analysis framework of information security policy based on security incidents

  • Jeong, Seong Hoon;Kim, Huy Kang;Woo, Jiyoung
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 22, No. 10
    • /
    • pp.73-81
    • /
    • 2017
  • In this paper, we propose an analysis framework to capture the trends of information security incidents and evaluate security policy based on the incident analysis. We build a big data set from news media by collecting news on security incidents and policy, identify key trends in information security from it, and present an analytical method for evaluating policies from the point of view of incidents. More specifically, we propose a network-based analysis model that allows us to easily identify the trends of information security incidents and policy at a glance, and a cosine similarity measure to find important events from incidents and policy announcements.
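
For readers unfamiliar with the cosine similarity measure mentioned above, here is a minimal sketch that compares two short texts as raw term-frequency vectors. The whitespace tokenization and the example headlines are assumptions for illustration. The abstract does not say how the paper builds its vectors.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical incident headline and policy announcement.
incident = "data breach reported at bank"
policy = "bank reports data breach"
score = cosine_similarity(incident, policy)
```

Scores range from 0 for texts with no shared terms to 1 for texts with identical term proportions, so a high score between an incident report and a policy announcement suggests they concern the same event.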

과학기술 정책의 과학화 서비스 개발에 관한 연구 (A Study on Developing Science Service of Science and Technology Policy)

  • 신문봉;전승수;황보택근
    • 한국IT서비스학회지
    • /
    • Vol. 11, No. 1
    • /
    • pp.83-92
    • /
    • 2012
  • The development of a science- and technology-oriented knowledge society accelerates the convergence of scientific theory and industrial technology and increases the complexity of problems in the social and economic sectors. These trends make it difficult to secure the reliability and objectivity of science and technology policy. They are also barriers to balanced evaluation across rational science and technology policy making, management, and policy coordination. In this regard, countries advanced in science and technology are developing policy support systems and jointly promoting evidence-based SciSIP (Science of Science and Innovation Policy) programs. This paper introduces a new approach to developing a science service for science and technology policy in Korea using business intelligence technology. It also proposes a method for integrating a policy knowledge base with component-based services that support the S&T policy decision-making process, and introduces service case studies.

병원도산의 예측모형 개발연구 (Developing a Combined Forecasting Model on Hospital Closure)

  • 정기택;이훈영
    • 보건행정학회지
    • /
    • Vol. 10, No. 2
    • /
    • pp.1-21
    • /
    • 2000
  • This study reviewed various parametric and nonparametric methods for forecasting hospital closures in Korea. We compared multivariate discriminant analysis, multivariate logistic regression, classification and regression tree, and neural network methods based on the hit ratio of each model for forecasting hospital closure. As in other studies in the literature, neural network analysis showed the highest average hit ratio. For policy and business purposes, we combined the four analytical methods and constructed a forecasting model that can be easily used to predict the probability of hospital closure given the financial information of a hospital.
