Search | Korea Science

A Study about Efficient Method for Training the Reward Model in RLHF (인간 피드백 기반 강화학습 (RLHF)에서 보상 모델의 효과적인 훈련 방법에 관한 연구)

Jeongwook Kim;Imatitikua Danielle Aiyanyo;Heuiseok Lim
- Annual Conference on Human and Language Technology / 2023.10a / pp.245-250 / 2023
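RLHF (Reinforcement Learning from Human Feedback) has recently been widely applied to high-performance language models. The method uses a reward model and human feedback to make a language model generate responses that people are likely to prefer. However, commercial language models that use RLHF do not disclose exactly how it is implemented. In particular, how to set up the reward model, which plays the role of the environment in reinforcement learning, is the most important question, yet open-source implementations each handle it differently. This study experimentally compares the two main approaches to training reward models, ranking-based training and classification-based training, to determine which is more efficient. Based on an analysis of the experimental results, it also offers a conjecture as to why the efficiency differs.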

Load compensation and Speed Controller for Hydraulic Inverter-fed Elevator (유압 인버터 엘리베이터를 위한 부하 보상 및 속도 제어기)

Han, Sang-Soo
- Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.163-167 / 2014
To address the vibration and speed error problems caused by the nonlinear friction characteristics and load variation of the hydraulic system, a PID speed controller and a load compensation controller for the hydraulic inverter-fed elevator are proposed. The load compensation controller is a PI controller, and the speed controller is a PID controller. The P, I, and D gains are obtained from the frequency response of the system transfer function. The effectiveness of the proposed controller is shown by experimental results, in which it is robust to load variations and yields stable speed and acceleration responses with less oscillation.
https://doi.org/10.5573/ieie.2014.51.2.163 KSCI

Digital Control of Bidirectional Charger/Discharger for Dynamic Voltage Restorer System (동적전압보상장치를 위한 양방향 충/방전 시스템의 디지털 제어)

Lee, Jung-Im;Lee, Jong-Hyun;Jung, An-Yoel;Lee, Choon-Ho;Park, Joung-Hu;Jeon, Hee-Jong
- Proceedings of the KIPE Conference / 2009.11a / pp.75-77 / 2009
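Bidirectional DC-DC converters are generally controlled by analog methods. When used as a charger/discharger, however, they suffer from poor transient response at mode transitions. As an alternative, a digital controller makes it possible to improve system performance through advanced algorithms. In this paper, a bidirectional charge/discharge system for a dynamic voltage restorer (DVR) based on an electric double-layer capacitor (EDLC) is implemented with a Buck/Boost bidirectional converter, and a digital controller based on a DSP (TI TMS320F28335) is applied. Simulation and experiments on a hardware prototype show improved transient response and system performance for the bidirectional system.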

A Study on a Model Predictive Control to Improve the Imbalance of AC Electric Railway Power (교류 전기철도 전원의 불평형률 향상을 위한 모델예측기법 연구)

Lee, Junghyun;Jo, Jongmin;Shin, Changhoon;Cha, Hanju
- Proceedings of the KIPE Conference / 2020.08a / pp.175-177 / 2020
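This paper proposes a technique to improve the power quality and stability of a power compensation device in order to mitigate the power unbalance that arises from the highly variable loads characteristic of railway systems. A railway load receives three-phase power and feeds it through a Scott transformer to two single-phase lines, the M and T phases; when the two single-phase sides carry different loads, unbalance occurs on the three-phase side. To reduce the power loss incurred during switching, a low switching frequency of 600 Hz is used. To improve power quality and stability, a sampling frequency of 12 kHz is used to reduce the error between sampling and control, and model predictive control with fast response is proposed. Experiments show that the current unbalance rate of the power compensation device was reduced to 4.46%, and they verify a fast response that resolves the unbalance within one 60 Hz cycle.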

New Discrete-time Small Signal Model of Average Current Mode Control for Current Response Prediction (평균전류모드제어의 전류응답예측을 위한 새로운 이산시간 소신호 모델)

Jung Young-Seok
- The Transactions of the Korean Institute of Power Electronics / v.10 no.3 / pp.219-225 / 2005
In this paper, a new discrete-time small signal model of average current mode control is proposed to predict inductor current responses. Compared to peak current mode control, average current mode control is difficult to analyze because of the presence of a compensation network. Using a sampler model, a new discrete-time small signal model is derived and used to predict the behavior of the inductor current under average current mode control with generalized compensation networks. To show the usefulness of the proposed model, its predictions are compared with the results of the circuit-level simulator PSIM and with experimental results.
KSCI

High Accurate Creep Compensation of the Loadcell using the Strain Gauge (스트레인 게이지식 로드셀의 고정밀 크립보상)

Seo, Hae-Jun;Jung, Haing-Sup;Ryu, Gi-Ju;Cho, Tae-Won
- Journal of IKEEE / v.16 no.1 / pp.34-44 / 2012
This paper proposes a practical method that uses digital signal processing to compensate for creep error, a characteristic error of strain gauge load cells. The method is first simulated by determining the compensation constant (time constant) and coefficient from the measured output response of the load cell. The compensation constant and coefficient are then stored in the microprocessor. The microprocessor calculates creep error compensation values and uses them to reduce the error in the load cell output signal, so that the weight value is presented as a digital signal. In addition, the error compensation method is applied to provide dedicated software for load cell electronic scales. The technique is useful because it substantially reduces the error rate of conventional electronic scales (0.03%). As a result, it achieves better accuracy (0.01%~0.003%), comparable to that of a digital electronic scale, while requiring less complex processing.
https://doi.org/10.7471/ikeee.2012.16.1.034 KSCI

Design of LQR Controller of DSTATCOM for Compensating Voltage Sag Using PSCAD/EMTDC (PSCAD/EMTDC를 이용한 전압 Sag 보상을 위한 배전용 정지형 보상기의 LQR 제어기 설계)

이명언;정수영;최규하
- Journal of Energy Engineering / v.13 no.1 / pp.68-74 / 2004
This paper presents the design of a DSTATCOM (Distribution Static Synchronous Compensator) controller. The results are verified using the PSCAD/EMTDC package. The state equation, derived by decomposition analysis of the DSTATCOM current components, is applied to a load model and to a combined model that takes the constraint conditions into account. For a single line-to-ground fault, conventional PI control is compared with the LQR control technique. LQR control is shown to be superior in terms of response profile and compensation of voltage sag.
KSCI

Compensator Design of a Piezoelectric Actuator for Active Vibration Isolation (능동형 진동 절연을 위한 압전 구동기의 보상기 설계)

문준희;박희재
- Proceedings of the Korean Society of Precision Engineering Conference / 2004.05a / pp.198-198 / 2004
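Piezoelectric actuators hold a unique position in fine positioning thanks to advantages such as high response speed, large force, and small size across a range of applications. However, because of nonlinearities of the piezoelectric element itself, such as hysteresis and creep, and the limits of the amplifier that drives it, the dynamic characteristics of piezoelectric elements are known to be relatively poor. In particular, the shape of the hysteresis curve changes as the driving speed increases, which makes it difficult to predict the actuation trajectory accurately, and when the maximum output current of the amplifier is insufficient, the piezoelectric actuator cannot follow the command. (remainder omitted)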

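Control Perception and the Behavior Control Process of Service Organization Members: Focusing on the Relationships among Feedback, Non-monetary Rewards, and Dysfunction (서비스 조직 구성원의 통제지각과 행동통제과정: 피드백, 비금전적 보상 및 역기능간의 관계를 중심으로)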

김재영;한동철;안승호
- Asia Marketing Journal / v.1 no.3 / pp.109-119 / 1999
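In services marketing, controlling employees' behavior and outcomes is becoming increasingly important. Much research is under way to make behavior control in particular more efficient. This study investigated how employees' perception of behavior control in a marketing organization affects their behavior within the organization. Hypotheses were set up on five relationships among employees' perceived control, supervisor feedback, non-monetary rewards, and employees' dysfunctional behavior. The hypotheses were tested on survey responses from 120 hospital nurses. Three of the five hypotheses were supported, and two were not supported by the current data. Implications and conclusions are presented on the basis of the test results.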

The Study of Energy Compensation Filter Thickness for Each Energy Area of Low Energy X-ray Beam Optimization on Active Electronic Personal Dosimeter (능동형 전자식 개인피폭선량계의 저에너지 X선 영역별 최적화를 위한 에너지보상 필터 두께에 대한 연구)

Kim, Jung-Su;Park, Youn-Hyun;Chae, Hyun-Sic
- Journal of the Korean Society of Radiology / v.16 no.5 / pp.519-526 / 2022
Electronic personal dosimeters (EPDs) provide real-time monitoring and a direct indication of the accumulated dose or dose rate in terms of personal dose. Most EPDs do not perform well in the low-energy photon radiation fields present in medical radiation environments: they respond poorly and show a large error rate for low-energy photon radiation. This study evaluated the optimal additional filtration for an EPD using a silicon PIN photodiode detector in the 40 to 120 kVp range in medical radiation environments. In the 40 to 80 kVp energy range, overlapped Al 0.2 mm and Sn 1.0 mm filtration showed a good response to dose rate, and in the 80 to 120 kVp energy range, overlapped Al 0.2 mm and Sn 1.6 mm filtration showed a good response to dose rate.
https://doi.org/10.7742/jksr.2022.16.5.519 KSCI

