• Title/Summary/Keyword: Q-value model


Reinforcement learning Speedup method using Q-value Initialization (Q-value Initialization을 이용한 Reinforcement Learning Speedup Method)

  • 최정환
    • Proceedings of the IEEK Conference
    • /
    • 2001.06c
    • /
    • pp.13-16
    • /
    • 2001
  • In reinforcement learning, Q-learning converges quite slowly to a good policy. This is because searching for the goal state takes a very long time in a large stochastic domain. I therefore propose a speedup method that uses Q-value initialization for model-free reinforcement learning. The method learns a naive model of the domain and builds boundaries around the goal state. Using these boundaries, it assigns initial Q-values to the state-action pairs and then runs Q-learning from those initial Q-values. The initial Q-values guide the agent toward the goal state in the early stages of learning, so that Q-learning updates Q-values efficiently. The method therefore saves the exploration time spent searching for the goal state and performs better than Q-learning. I present the Speedup Q-learning algorithm to implement the speedup method. The algorithm is evaluated in a grid-world domain and compared to Q-learning.

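
The idea of seeding Q-learning with informative initial Q-values can be sketched as follows. This is only an illustration under stated assumptions, not the Speedup Q-learning algorithm from the paper: the deterministic 6x6 grid, the reward of 1 on reaching the goal, and the heuristic `initial_q` (which discounts by the Manhattan distance from the successor state to the goal, standing in for the paper's "naive model" and "boundaries") are all hypothetical choices.

```python
import random

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action, size):
    """Deterministic grid move, clipped at the walls (a stand-in for a naive model)."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), size - 1)
    y = min(max(state[1] + dy, 0), size - 1)
    return (x, y)

def initial_q(size, goal, gamma=0.9):
    """Assumed heuristic: Q0(s, a) = gamma ** (Manhattan distance from successor to goal)."""
    q = {}
    for x in range(size):
        for y in range(size):
            for a in ACTIONS:
                nx, ny = step((x, y), a, size)
                dist = abs(nx - goal[0]) + abs(ny - goal[1])
                q[((x, y), a)] = gamma ** dist
    return q

def q_learning(q, size, goal, episodes=50, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning started from the supplied table; returns steps per episode."""
    rng = random.Random(seed)
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = (0, 0), 0
        while s != goal and steps < 1000:  # cap the episode length
            if rng.random() < epsilon:
                a = rng.choice(list(ACTIONS))  # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])  # greedy, random ties
            s2 = step(s, a, size)
            r = 1.0 if s2 == goal else 0.0
            best_next = 0.0 if s2 == goal else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode

size, goal = 6, (5, 5)
zero_q = {((x, y), a): 0.0 for x in range(size) for y in range(size) for a in ACTIONS}
print("zero init, steps in first episode:", q_learning(zero_q, size, goal)[0])
print("heuristic init, steps in first episode:", q_learning(initial_q(size, goal), size, goal)[0])
```

In this toy setting the heuristic happens to coincide with the optimal Q-values, so the agent heads for the goal from the first episode; in a stochastic domain the initial values would only be a rough guide that Q-learning then corrects.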

A Study on Application Range of Continuum Model to Discontinuous Rock mass with Numerical Analysis (불연속지반의 연속체 모델 적용범위에 대한 수치해석적 연구)

  • 이경우;노상림;윤지선
    • Proceedings of the Korean Geotechnical Society Conference
    • /
    • 2002.03a
    • /
    • pp.197-204
    • /
    • 2002
  • In this study, we perform a multivariate analysis of domestic road tunnel data (958 entries) and suggest a simple prediction equation for the Q-system. Using the equation, we generate Q-values applicable to numerical analysis and investigate the excavation-induced behavior of the rock mass for varying Q-values with the discontinuum numerical analysis code UDEC. The results indicate that the continuum model can be applied to a discontinuous rock mass when the Q-value is below 0.7, and we verify the applicability of the continuum model for that Q-value range with the continuum numerical analysis code FLAC.


Region-based Q-learning for intelligent robot systems (지능형 로보트 시스템을 위한 영역기반 Q-learning)

  • Kim, Jae-Hyeon;Seo, Il-Hong
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.3 no.4
    • /
    • pp.350-356
    • /
    • 1997
  • It is desirable for autonomous robot systems to be able to behave smoothly and continuously when interacting with an unknown environment. Because Q-learning requires a lot of memory and time to optimize a series of actions in a continuous state space, it may not be easy to apply the method to such a real environment. In this paper, for continuous state space applications, a triangular-type Q-value model is proposed to address this problem. Our learning method can estimate the current Q-value from its relationship with neighboring states and can learn its actions in a manner similar to Q-learning. Thus, our method can enable robots to move smoothly in a real environment. To show the validity of our method, navigation comparisons with Q-learning are given, and visual tracking simulation results involving a 2-DOF SCARA robot are also presented.


Solving Survival Gridworld Problem Using Hybrid Policy Modified Q-Based Reinforcement

  • Montero, Vince Jebryl;Jung, Woo-Young;Jeong, Yong-Jin
    • Journal of IKEEE
    • /
    • v.23 no.4
    • /
    • pp.1150-1156
    • /
    • 2019
  • This paper explores a model-free, value-based approach to solving the survival gridworld problem. The survival gridworld problem poses the challenge of taking risks to gain better rewards, whereas the classic value-based approach in model-free reinforcement learning assumes minimal-risk decisions. The proposed method applies hybrid on-policy and off-policy updates to experience roll-outs, using a modified Q-based update equation that introduces a parametric linear rectifier and a motivational discount. The significance of this approach is that it allows model-free training of agents that take risk factors and motivated exploration into account to make better path decisions. Experiments suggest that the proposed method achieves better exploration and path selection, resulting in higher episode scores than classic off-policy and on-policy Q-based updates.
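
For reference, the two classic baselines named in the abstract differ only in how they bootstrap from the next state. The sketch below shows the standard off-policy (Q-learning) and on-policy (SARSA) targets; it does not reproduce the paper's modified update, since the abstract does not give the form of the parametric linear rectifier or the motivational discount. The state and action names are hypothetical.

```python
def q_learning_target(q, reward, next_state, actions, gamma=0.9):
    """Off-policy target: bootstrap from the greedy action in the next state."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)

def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    """On-policy target: bootstrap from the action the agent actually takes next."""
    return reward + gamma * q[(next_state, next_action)]

def td_update(q, state, action, target, alpha=0.1):
    """Move Q(state, action) a fraction alpha toward the target, in place."""
    q[(state, action)] += alpha * (target - q[(state, action)])

# Hypothetical two-state example: in s1 the "risky" action currently looks better.
actions = ["safe", "risky"]
q = {("s0", "safe"): 0.0, ("s0", "risky"): 0.0,
     ("s1", "safe"): 1.0, ("s1", "risky"): 3.0}

off_policy = q_learning_target(q, reward=0.0, next_state="s1", actions=actions)
on_policy = sarsa_target(q, reward=0.0, next_state="s1", next_action="safe")
td_update(q, "s0", "safe", off_policy)
print(off_policy, on_policy, q[("s0", "safe")])
```

The off-policy target always assumes the best next action, while the on-policy target reflects whatever the behavior policy chose, which is why the two can rank risky paths differently.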

Reliability analysis of piles based on proof vertical static load test

  • Dong, Xiaole;Tan, Xiaohui;Lin, Xin;Zhang, Xuejuan;Hou, Xiaoliang;Wu, Daoxiang
    • Geomechanics and Engineering
    • /
    • v.29 no.5
    • /
    • pp.487-496
    • /
    • 2022
  • Most vertical static load tests of piles on construction sites are proof load tests, which makes it difficult to estimate the ultimate bearing capacity accurately and to analyze the reliability of the piles. Therefore, a reliability analysis method based on proof load-settlement (Q-s) data is proposed in this study. In the proposed method, a simple ultimate limit state function based on the hyperbolic model is established, where the random variables of the reliability analysis include the model factor of the ultimate bearing capacity and the fitting parameters of the hyperbolic model. The model factor M = RuR / RuP is calculated from the available destructive Q-s data: the real value of the ultimate bearing capacity (RuR) is obtained from the complete destructive Q-s data, and the predicted value of the ultimate bearing capacity (RuP) is obtained from the proof Q-s data, that is, the part of the destructive Q-s data recorded before the predetermined load specified in the pile test report. The results demonstrate that the proposed method can easily and effectively perform the reliability analysis based on proof Q-s data.
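
A minimal sketch of how a model factor of this kind could be computed is given below. It assumes the common two-parameter hyperbolic form Q = s / (a + b·s), whose asymptote 1/b is taken as the ultimate bearing capacity; the abstract does not state the exact form used in the paper. The load-settlement values are synthetic, and `proof_cutoff` is a hypothetical stand-in for the predetermined proof load.

```python
def fit_hyperbola(settlements, loads):
    """Least-squares fit of the linearized form s/Q = a + b*s; returns (a, b)."""
    xs = list(settlements)
    ys = [s / q for s, q in zip(settlements, loads)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def ultimate_capacity(settlements, loads):
    """Asymptotic load 1/b of the fitted hyperbola."""
    _, b = fit_hyperbola(settlements, loads)
    return 1.0 / b

# Synthetic, noise-free data generated from hypothetical parameters.
A_TRUE, B_TRUE = 0.002, 0.0005          # mm/kN and 1/kN
settlements = [2.0, 5.0, 10.0, 20.0, 40.0, 80.0]   # mm
loads = [s / (A_TRUE + B_TRUE * s) for s in settlements]  # kN

proof_cutoff = 3  # number of readings available before the proof load (assumed)
ru_real = ultimate_capacity(settlements, loads)                       # RuR
ru_proof = ultimate_capacity(settlements[:proof_cutoff], loads[:proof_cutoff])  # RuP
model_factor = ru_real / ru_proof                                     # M = RuR / RuP
print(ru_real, ru_proof, model_factor)
```

Because the synthetic data contain no noise, the model factor here comes out essentially equal to 1; with measured data the proof-range fit would generally deviate from the full-range fit, and that scatter is what the model factor captures.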

Protein Adsorption on Ion Exchange Resin: Estimation of Equilibrium Isotherm Parameters from Batch Kinetic Data

  • Chu K.H.;Hashim M.A.
    • Biotechnology and Bioprocess Engineering:BBE
    • /
    • v.11 no.1
    • /
    • pp.61-66
    • /
    • 2006
  • The simple Langmuir isotherm is frequently employed to describe the equilibrium behavior of protein adsorption on a wide variety of adsorbents. The two adjustable parameters of the Langmuir isotherm - the saturation capacity, or $q_m$, and the dissociation constant, $K_d$ - are usually estimated by fitting the isotherm equation to the equilibrium data acquired from batch equilibration experiments. In this study, we have evaluated the possibility of estimating $q_m$ and $K_d$ for the adsorption of bovine serum albumin to a cation exchanger using batch kinetic data. A rate model predicated on the kinetic form of the Langmuir isotherm, with three adjustable parameters ($q_m,\;K_d$, and a rate constant), was fitted to a single kinetic profile. The value of $q_m$ determined as the result of this approach was quantitatively consistent with the $q_m$ value derived from the traditional batch equilibrium data. However, the $K_d$ value could not be retrieved from the kinetic profile, as the model fit proved insensitive to this parameter. Sensitivity analysis provided significant insight into the identifiability of the three model parameters.
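
For orientation, the textbook forms of the two models mentioned in the abstract are written out below. The symbols $C$ (liquid-phase protein concentration), $q$ (adsorbed amount), and $k_1$ (forward rate constant) are notation assumed here; the abstract itself names only $q_m$, $K_d$, and "a rate constant", so the paper's exact parameterization may differ.

```latex
% Equilibrium (Langmuir isotherm)
q^{*} = \frac{q_m \, C^{*}}{K_d + C^{*}}

% Kinetic form of the Langmuir model
\frac{dq}{dt} = k_1 \left[ C \,(q_m - q) - K_d \, q \right]
```

Setting $dq/dt = 0$ in the kinetic form recovers the isotherm, which is why a single kinetic profile can in principle carry information about both $q_m$ and $K_d$.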

Biosorption of Lead $(Pb^{2+})$ from Aqueous Solution by Rhodotorula aurantiaca

  • Cho, Dae-Haeng;Yoo, Man-Hyong;Kim, Eui-Yong
    • Journal of Microbiology and Biotechnology
    • /
    • v.14 no.2
    • /
    • pp.250-255
    • /
    • 2004
  • The aim of this work was to investigate the adsorption isotherm and kinetic model for the biosorption of lead $(Pb^{2+})$ by Rhodotorula aurantiaca and to examine the environmental factors for this metal removal. Within five minutes of contact, $Pb^{2+}$ sorption reached nearly 86% of the total $Pb^{2+}$ sorption. The optimum initial pH value for removal of $Pb^{2+}$ was 5.0. The percentage sorption increased steeply with the biomass concentration up to 2 g/l and thereafter remained more or less constant. The Langmuir sorption model provided a good fit throughout the concentration range. The conformity of these data to the Langmuir model indicated that biosorption of $Pb^{2+}$ by R. aurantiaca could be characterized as a monolayer, single-site type phenomenon with no interaction between ions adsorbed in neighboring sites. The maximum $Pb^{2+}$ sorption capacity $(q_{max})$ and Langmuir constant (b) were 46.08 mg/g of biomass and 0.04 l/mg, respectively. The pseudo second-order equation was well fitted to the experimental data. The correlation coefficients for the linear plots of t/q against t for the second-order equation were 0.999 for all the initial concentrations of biosorbent for contact times of 180 min. The theoretical $q_{eq}$ value was very close to the experimental $q_{eq}$ value.
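
The two fitted models can be evaluated directly from the reported constants. The sketch below uses the standard Langmuir form q = q_max·b·C / (1 + b·C) with the abstract's values (q_max = 46.08 mg/g, b = 0.04 l/mg) and the standard integrated pseudo second-order law; the rate constant `k2` and the equilibrium uptake passed to the kinetic functions are hypothetical, since the abstract does not report them.

```python
def langmuir_uptake(c_eq, q_max=46.08, b=0.04):
    """Equilibrium uptake (mg/g) at equilibrium concentration c_eq (mg/l)."""
    return q_max * b * c_eq / (1.0 + b * c_eq)

def pseudo_second_order(t, q_eq, k2):
    """Uptake q(t) (mg/g) from the integrated pseudo second-order rate law."""
    return k2 * q_eq ** 2 * t / (1.0 + k2 * q_eq * t)

def linearized_t_over_q(t, q_eq, k2):
    """t/q as a straight line in t: intercept 1/(k2*q_eq**2), slope 1/q_eq."""
    return 1.0 / (k2 * q_eq ** 2) + t / q_eq

# At C = 1/b = 25 mg/l the Langmuir uptake is half of q_max.
print(langmuir_uptake(25.0))

# Hypothetical kinetic parameters, used only to show that t/q is linear in t.
q_eq, k2 = 20.0, 0.01
for t in (5.0, 30.0, 180.0):
    print(t, t / pseudo_second_order(t, q_eq, k2), linearized_t_over_q(t, q_eq, k2))
```

Plotting t/q against t and reading q_eq from the slope is the linear-plot procedure the abstract refers to when it reports correlation coefficients of 0.999.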

A Study on the Lateral Flow in Polluted Soft Soils (오염된 연약지반의 측방유동에 관한 연구)

  • 안종필;박상범
    • The Journal of Engineering Geology
    • /
    • v.11 no.2
    • /
    • pp.175-190
    • /
    • 2001
  • This study reviews the existing theoretical background and compares it with results measured in model tests in order to examine how lateral flow behaves, depending on soil plasticity, when an unsymmetrical surcharge acts on polluted soft soils. The model tests were carried out using a soil tank, bearing frame, and bearing plate made for the tests. The unsymmetrical surcharge was increased on ground soils with a constant water content and with stepwise increases in pollutant content, and the settlement, lateral displacement, and upheaval were each observed. In conclusion, the critical surcharge was expressed as $q_{cr}=2.78c_u$, which is similar to the values proposed by Tschebotarioff ($q_{cr}=3.0c_u$) and Meyerhof ($q_{cr}=(B/2H+\pi/2)c_u$). The ultimate capacity was expressed as $q_{ult}=4.84c_u$, which is similar to that of Prandtl. The lateral flow pressure is adequately calculated by the equation $P_{max}=K_o \gamma H$, and the maximum lateral flow pressure is found near 0.3H of the layer thickness (H), closer to the ground surface than in the composition pattern, the Poulos distribution pattern, and unpolluted soft clay soils (CL, CH). The stability control method used in this research followed the management diagrams of Tominaga and Hashimoto, Shibata and Sekiguchi, and Matsuo and Kawamura, which use the amount of plastic displacement caused by lateral flow. As a result, the ultimate capacity values in the $S_v$-$(Y_m/S_v)$ diagram of Matsuo and Kawamura and in the $(q/Y_m)$-$q$ diagram of Shibata and Sekiguchi were smaller than those from the load-settlement curve ($q$-$S_v$).

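
The surcharge expressions quoted in the abstract can be compared numerically. In the sketch below, the coefficients 2.78, 3.0, and 4.84 come from the abstract, Meyerhof's expression is read as (B/(2H) + π/2)·c_u, and Prandtl's classical value (π + 2)·c_u is supplied from general soil mechanics rather than from the abstract. The undrained shear strength, loading width, and layer thickness are hypothetical inputs.

```python
import math

def critical_surcharge_study(c_u):
    """Critical surcharge reported in the study: 2.78 * c_u."""
    return 2.78 * c_u

def critical_surcharge_tschebotarioff(c_u):
    """Tschebotarioff's value as quoted in the abstract: 3.0 * c_u."""
    return 3.0 * c_u

def critical_surcharge_meyerhof(c_u, width, thickness):
    """Meyerhof's value, read as (B / (2H) + pi / 2) * c_u."""
    return (width / (2.0 * thickness) + math.pi / 2.0) * c_u

def ultimate_capacity_study(c_u):
    """Ultimate capacity reported in the study: 4.84 * c_u."""
    return 4.84 * c_u

def ultimate_capacity_prandtl(c_u):
    """Prandtl's classical solution for a strip load on clay: (pi + 2) * c_u."""
    return (math.pi + 2.0) * c_u

c_u = 20.0        # kPa, hypothetical undrained shear strength
B, H = 2.0, 1.0   # m, hypothetical loading width and soft-layer thickness

print("critical:", critical_surcharge_study(c_u),
      critical_surcharge_tschebotarioff(c_u),
      critical_surcharge_meyerhof(c_u, B, H))
print("ultimate:", ultimate_capacity_study(c_u), ultimate_capacity_prandtl(c_u))
```

Meyerhof's value depends on the ratio B/H, so how closely it matches the study's coefficient depends on the geometry of the model test, which the abstract does not give.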

Effective Advertising Direction in the post-COVID-19 Era (포스트 코로나 시대의 효과적인 광고 방향에 관한 연구)

  • Lee, Jei-Young;Zheng, Zhao
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.7
    • /
    • pp.89-101
    • /
    • 2022
  • COVID-19 is significantly changing consumers' demands and habits. To understand consumer characteristics and find effective advertising directions in the post-COVID-19 era, this study focuses on young consumers, who are more sensitive to market changes and technological transformation, and takes the subjective perspective of the advertising audience. Using Q methodology, an advertising development model for the post-COVID-19 era was derived in an exploratory manner by examining how these consumers perceive advertisements. The model consists of three types of advertisements: "demand mining online ads" that value consumer demand and adapt to online shopping paths, "added value creation experiential ads" that value derived value and consumer experiences, and "practical and sentimental value creative ads" based on pragmatism and emotional values. In addition, this study suggests directions for the sustainable practice of advertising in the post-COVID-19 era in various respects, such as "seeking multidimensional values," "expanding consumer experience," and "mining and leading demand."

The Impact of IFRS Adoption on Firm Value in Korea and China - Evidence using Tobin's Q (국제회계기준 도입이 기업가치에 영향을 미치는가?: 토빈의 Q 모형을 이용한 한국과 중국의 실증비교연구)

  • Jang, Ji-Kyung
    • The Journal of the Korea Contents Association
    • /
    • v.14 no.7
    • /
    • pp.427-434
    • /
    • 2014
  • This research empirically tests, using Tobin's Q model, whether firm value increased after the adoption of IFRS in Korea and China. In Korea, IFRS adoption became mandatory for all companies in 2011. China mandated IFRS conversion for publicly traded companies starting in 2007. The revisions bring Chinese standards closer to the IFRS benchmark of internationally recognized quality, but the new standards are not word-for-word translations of IFRS, though they are founded on similar principles. We expect that the different adoption processes in Korea and China lead to different impacts of IFRS on firm value. The results are summarized as follows. First, in Korea, Tobin's Q appears to increase after the adoption of IFRS, and firm value differs significantly before and after adoption. Second, in China, Tobin's Q also appears to increase after the adoption of IFRS, but a t-test shows that the post-adoption increase is not statistically significant. These results are a meaningful finding in that the impact of IFRS adoption on firm value differs by adoption process.