• Title/Summary/Keyword: rewards

Solving Continuous Action/State Problem in Q-Learning Using Extended Rule Based Fuzzy Inference System

  • Kim, Min-Soeng;Lee, Ju-Jang
    • Transactions on Control, Automation and Systems Engineering / v.3 no.3 / pp.170-175 / 2001
  • Q-learning is a form of reinforcement learning in which the agent solves a given task based on rewards received from the environment. Most research on Q-learning has focused on discrete domains, although the environment with which the agent must interact is generally continuous. Methods are therefore needed that make Q-learning applicable to continuous problem domains. In this paper, an extended fuzzy rule is proposed that incorporates Q-learning. The interpolation technique, which is widely used in memory-based learning, is adopted to represent the appropriate Q value for the current state-action pair in each extended fuzzy rule. The resulting structure, based on the fuzzy inference system, is capable of handling continuous states of the environment. The effectiveness of the proposed structure is shown through simulation on the cart-pole system.
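
The interpolation idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes that Q values are stored at a few sample actions and that the value for a continuous action is obtained by linear interpolation between the two nearest samples. The function name `interpolate_q` and the sample values are hypothetical, and this is not the extended fuzzy rule structure proposed in the paper.

```python
from bisect import bisect_right


def interpolate_q(action_points, q_values, action):
    """Linearly interpolate a Q value for a continuous action.

    action_points: sorted sample actions (assumed strictly increasing)
    q_values:      Q value stored at each sample action
    action:        continuous action to evaluate
    """
    if len(action_points) != len(q_values) or len(action_points) < 2:
        raise ValueError("need at least two matching sample points")
    # Clamp to the stored range instead of extrapolating.
    if action <= action_points[0]:
        return q_values[0]
    if action >= action_points[-1]:
        return q_values[-1]
    # Index of the sample point at or just below `action`.
    i = bisect_right(action_points, action) - 1
    a0, a1 = action_points[i], action_points[i + 1]
    w = (action - a0) / (a1 - a0)
    return (1.0 - w) * q_values[i] + w * q_values[i + 1]


# Hypothetical Q values stored at three sample actions.
q_mid = interpolate_q([-1.0, 0.0, 1.0], [0.0, 2.0, 1.0], 0.5)
```

Clamping at the ends of the stored range is a design choice of this sketch; a real controller might instead restrict the action range before lookup.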

Socio-Organizational Variables Related to Teachers' Organizational Commitment (유아교사의 조직헌신성과 관련된 사회 조직적 변인에 관한 연구)

  • Hwangbo, Youngran
    • Korean Journal of Child Studies / v.22 no.1 / pp.135-146 / 2001
  • The present study investigated socio-organizational variables that influence teachers' commitment to the organization. Subjects were 254 teachers of private kindergartens in Kimhae, Kyungnam-do, and in Tongrae-gu and Kangseo-gu in Pusan. The instrument was a questionnaire developed by Rosenholtz (1991) for the study of the teachers' workplace (the social organization of elementary schools) and revised here for Korean early education. The data were analyzed by ANCOVA. Results revealed that the most important variable influencing teachers' organizational commitment is job certainty, followed by learning opportunities; other variables are teachers' career, psychological rewards, and task autonomy and discretion.

A Study on the Use of Volunteerism in Local Governments: Centering on the Case of America (지방자치단체들의 자원봉사제도 활용방안에 관한 연구;미국의 사례를 중심으로)

  • Lee, Sung-Woo;Kim, Jeong-Seop
    • Journal of Agricultural Extension & Community Development / v.5 no.2 / pp.165-175 / 1998
  • Volunteerism, as one type of citizen participation, can help local governments cut administrative expenditure and contribute to raising democratic consciousness in the community. In this study, the researchers examined the practical use of volunteerism in America and discussed diverse incentives for voluntary service and rewards for volunteers. Local governments in Korea have suffered from a lack of human resources and public finance. By actively introducing volunteerism, they may obtain the desired results.

A Study on the Combination of Deductible System with Bonus-Malus System

  • Kang, Jung-Chul;Young, Jeong-Jung
    • Journal of the Korean Data and Information Science Society / v.18 no.4 / pp.1093-1101 / 2007
  • The bonus-malus system in automobile insurance rewards claim-free policyholders with premium discounts and penalizes policyholders with claims through premium surcharges. The purpose of adopting a bonus-malus system is to alleviate differences in risk propensity. A well-known side effect of the bonus-malus system is the tendency of policyholders to pay small claims themselves and not report them to their insurer, in order to avoid future premium increases. This phenomenon is called hunger for bonus. In this paper, we introduce an alternative approach to the bonus-malus system in automobile insurance, based on deductible theory, and then search for a proper way of combining the two. We also construct a new algorithm to determine the optimal strategy of the policyholder based on the proposed model.
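
The hunger-for-bonus effect described in this abstract can be made concrete with a small sketch. Everything in it is an assumption for illustration: the five-class premium scale `FACTORS`, the two-class penalty per claim, the three-year horizon, and the absence of discounting and of further claims. It is not the model or the algorithm of the paper.

```python
# Hypothetical premium factors per bonus-malus class (class 0 is the best).
FACTORS = (0.6, 0.8, 1.0, 1.3, 1.6)


def next_class(current, had_claim, penalty=2):
    """Move down one class after a claim-free year, up `penalty` classes after a claim."""
    if had_claim:
        return min(current + penalty, len(FACTORS) - 1)
    return max(current - 1, 0)


def future_premiums(start_class, base, years):
    """Total premium over `years`, assuming no further claims and no discounting."""
    total, cls = 0.0, start_class
    for _ in range(years):
        total += base * FACTORS[cls]
        cls = next_class(cls, had_claim=False)
    return total


def should_self_pay(loss, current, base, years=3):
    """True if absorbing the loss is cheaper than the extra premiums a claim would cause."""
    cost_if_reported = future_premiums(next_class(current, True), base, years)
    cost_if_absorbed = future_premiums(next_class(current, False), base, years)
    return loss < cost_if_reported - cost_if_absorbed


# A policyholder in class 2 with a base premium of 1000 and a small loss of 500.
small_loss_absorbed = should_self_pay(500, current=2, base=1000)
```

Under these assumed numbers, reporting a claim from class 2 costs about 1900 in extra premiums over three years, so any loss below that amount would rationally go unreported.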

Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
    • Journal of IKEEE / v.23 no.4 / pp.1150-1156 / 2019
  • This paper explores a model-free, value-based approach for solving the survival gridworld problem. The survival gridworld problem poses a challenge that involves taking risks to gain better rewards. The classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take into account risk factors and motivated exploration to make better path decisions. Experiments suggest that the proposed method achieves better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.
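
For orientation, the two classic baselines named in this abstract differ only in the bootstrap target. The sketch below shows the standard off-policy (Q-learning) and on-policy (SARSA) targets and the shared temporal-difference update. The function names are illustrative, and the paper's modified update with a parametric linear rectifier and motivational discount is not reproduced here because the abstract does not give its form.

```python
def q_learning_target(reward, next_q_values, gamma):
    """Off-policy target: bootstrap from the best action in the next state."""
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * next_q_values[next_action]


def td_update(q, target, alpha):
    """Move the current estimate a fraction `alpha` toward the target."""
    return q + alpha * (target - q)


# Same transition, two different targets (values are made up for illustration).
next_q = [0.0, 2.0]
off_policy = q_learning_target(1.0, next_q, gamma=0.5)
on_policy = sarsa_target(1.0, next_q, next_action=0, gamma=0.5)
```

When the next action is exploratory rather than greedy, the two targets diverge, and that gap is what a hybrid scheme would have to reconcile.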

Reinforcement Learning Algorithm Using Domain Knowledge

  • Young, Jang-Si;Hong, Suh-Il;Hak, Kong-Sung;Rok, Oh-Sang
    • Institute of Control, Robotics and Systems: Conference Proceedings / 2001.10a / pp.173.5-173 / 2001
  • Q-learning is one of the most widely used reinforcement learning methods; it addresses the question of how an autonomous agent can learn to choose optimal actions to achieve its goal in a given problem. Q-learning can acquire optimal control strategies from delayed rewards, even when the agent has no prior knowledge of the effects of its actions on the environment. If the agent is able to use previous knowledge, it is expected to speed up learning through interaction with the environment. We present a novel reinforcement learning method using domain knowledge, which is represented by problem-independent features and their classifiers. Here neural networks are employed as knowledge classifiers. To show that an agent using domain knowledge can perform better than an agent with a standard Q-learner, computer simulations are ...

"You can't help but Like it": An Investigation of Mandatory Endorsement Solicitation and Gating Practices in Online Social Networks

  • Church, E. Mitchell;Passarello, Samantha
    • Asia Pacific Journal of Information Systems / v.26 no.1 / pp.124-142 / 2016
  • Companies operating on social network platforms continue to improve and expand their marketing techniques. This study examines the practice of "gating", which places virtual barriers between social network users and company content. Gates demand mandatory user endorsements, in the form of Facebook "Likes", Twitter "retweets", etc., to gain access to company content such as coupons and rewards. Gating practices demand a mandatory endorsement before any content consumption takes place. Thus, while user endorsements are assumed to arise voluntarily from trusted, known sources, gating practices would appear to violate this assumption. However, whether this violation lessens the effectiveness of gating practices still requires empirical validation. We investigate this question using a unique panel data set that includes data on "like" endorsements obtained from a number of real-world Facebook business pages. The results show that gating practices are effective for endorsement solicitation; however, gates may interfere with more traditional marketing activities.

Decentralized learning automata for control of unknown Markov chains

  • Hara, Motoshi;Abe, Kenichi
    • Institute of Control, Robotics and Systems: Conference Proceedings / 1990.10b / pp.1234-1239 / 1990
  • In this paper, we propose a new type of decentralized learning automata for the control of finite-state Markov chains with unknown transition probabilities and rewards. In our scheme, a β-type learning automaton is associated with each state in which two or more actions (decisions) are available. In this decentralized learning automata system, each learning automaton operates using only local information to improve its performance in its local environment. Simulation results show that the decentralized learning automata converge to the optimal policy, which yields the highest total expected discounted reward for all initial states.
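
As a rough illustration of how a single learning automaton adapts its action probabilities from local feedback, the sketch below uses the textbook linear reward-inaction rule. This is a swapped-in standard scheme, not the automaton proposed in the paper, and the function name and step size are assumptions.

```python
def reward_inaction_update(probs, chosen, rewarded, step=0.1):
    """Linear reward-inaction update for one automaton.

    probs:    current action probabilities (sum to 1)
    chosen:   index of the action that was just taken
    rewarded: True if the environment gave a favorable response
    step:     learning rate in (0, 1)
    """
    if not rewarded:
        # Inaction: an unfavorable response leaves the probabilities unchanged.
        return list(probs)
    return [
        p + step * (1.0 - p) if i == chosen else p * (1.0 - step)
        for i, p in enumerate(probs)
    ]


# One automaton with two actions, rewarded after choosing action 0.
updated = reward_inaction_update([0.5, 0.5], chosen=0, rewarded=True, step=0.2)
```

In a decentralized setting, one such automaton would sit at each state with two or more available actions and would see only the feedback for its own choices.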

A Study of Adaptive QoS Routing scheme using Policy-gradient Reinforcement Learning (정책 기울기 값 강화학습을 이용한 적응적인 QoS 라우팅 기법 연구)

  • Han, Jeong-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.93-99 / 2011
  • In this paper, we propose a policy-gradient routing scheme based on reinforcement learning that can be used for adaptive QoS routing. Policy-gradient RL routing can learn network environments quickly by using an optimal policy adapted to the gradient values of the estimated average reward. We show that fast learning of the network environment results in a high routing success rate. To verify this, we simulate the scheme and compare it with three different schemes.
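
A minimal sketch of a policy-gradient step may help here. It assumes a softmax policy over a small set of candidate routes and uses an average-reward baseline, as in the standard gradient-bandit (REINFORCE-style) update. The route preferences, reward values, and function names are hypothetical, and this is not the routing scheme evaluated in the paper.

```python
import math


def softmax(prefs):
    """Convert route preferences into selection probabilities."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(prefs, chosen, reward, baseline, lr):
    """One gradient-ascent step on the expected reward of a softmax policy.

    The gradient of log pi(chosen) with respect to preference i is
    (1 if i == chosen else 0) - pi(i), scaled by the advantage (reward - baseline).
    """
    probs = softmax(prefs)
    advantage = reward - baseline
    return [
        h + lr * advantage * ((1.0 if i == chosen else 0.0) - p)
        for i, (h, p) in enumerate(zip(prefs, probs))
    ]


# Two candidate routes with equal initial preference; route 0 met the QoS target.
new_prefs = policy_gradient_step([0.0, 0.0], chosen=0, reward=1.0, baseline=0.0, lr=0.1)
```

A route that delivers more than the running average reward has its preference raised, and the other routes are lowered by the same total amount.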

Ecotourism Study on the Chunsuman Bay Birdwatching Festival (생태문화관광 축제연구 : 천수만 철새 축제 사례)

  • Roh Yong-Ho;Jeong Gang-Hoan;Yhang Wii-Joo
    • Journal of Environmental Science International / v.14 no.3 / pp.259-264 / 2005
  • The purposes of this study were to investigate visitors' environmental attitudes regarding educational tourism, environmental preservation, and satisfaction with regional economic benefits at the Chunsuman Bay Birdwatching Festival. The results were as follows. First, the degree of educational satisfaction was high, and the festival fostered an environmental-preservation mindset among visitors and local residents alike. The festival participants had a positive attitude toward migratory birds. In particular, residents who had held negative perceptions of migratory birds because of physical damage to agricultural crops changed their perceptions positively after visiting the festival. More strategic approaches should therefore be prepared to encourage residents to visit and participate in the festival. Second, while the number of tourists increased, this did not make a practical contribution to residents' income. In particular, farmers' satisfaction was low. More strategic programs are therefore needed to improve residents' economic rewards for sustainable development.