• Title/Abstract/Keywords: random-effect model

Search results: 790 (processing time: 0.027 s)

Predicting claim size in the auto insurance with relative error: a panel data approach

  • 박흥선
    • 응용통계연구 (The Korean Journal of Applied Statistics), Vol. 34, No. 5, pp. 697-710, 2021
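  • Relative-error prediction is preferred over conventional prediction methods in fields where relative (or percentage) error matters, notably econometrics, software engineering, and official statistics produced by government agencies. Relative-error prediction has been extended from linear and nonlinear regression to nonparametric kernel regression and to stationary time-series analysis. However, these analyses have considered only fixed effects, so an extension of relative-error prediction to random effects has been needed. The purpose of this paper is to apply relative-error prediction to panel data under gamma regression, lognormal regression, and inverse Gaussian regression, all of which belong to the class of generalized linear mixed models (GLMM). To this end, we use real claim-size data from an auto insurance company and apply and compare the best predictor and the best relative-error predictor.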
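
A minimal sketch of the difference between the two predictors, assuming the usual definitions: the best predictor (BP) minimizes expected squared error and is the mean E[Y], while the best relative-error predictor (BREP) minimizes E[((Y - c)/Y)^2] = 1 - 2c E[1/Y] + c^2 E[1/Y^2], giving c = E[1/Y] / E[1/Y^2]. The abstract does not spell out the loss, so treat this as an assumption. The predictive distribution is taken as known here; in the GLMM setting its parameters would come from the fixed effects plus a predicted random effect, which this sketch does not estimate. The shape and scale values are hypothetical.

```python
import math
import random


def expected_sq_relative_error(c, samples):
    """Monte Carlo estimate of E[((Y - c) / Y)^2] from positive draws of Y."""
    return sum(((y - c) / y) ** 2 for y in samples) / len(samples)


def gamma_predictors(shape, scale):
    """BP (mean) and BREP = E[1/Y] / E[1/Y^2] for Gamma(shape, scale).

    E[1/Y^2] is finite only when shape > 2, so the BREP requires it.
    """
    if shape <= 2:
        raise ValueError("shape must exceed 2 for E[1/Y^2] to be finite")
    return shape * scale, (shape - 2) * scale


def lognormal_predictors(mu, sigma):
    """BP and BREP when log(Y) ~ Normal(mu, sigma^2)."""
    return math.exp(mu + sigma ** 2 / 2), math.exp(mu - 1.5 * sigma ** 2)


if __name__ == "__main__":
    # Hypothetical predictive distribution for one claim (illustrative values only)
    shape, scale = 6.0, 250.0
    bp, brep = gamma_predictors(shape, scale)
    rng = random.Random(2021)
    draws = [rng.gammavariate(shape, scale) for _ in range(200_000)]
    print(f"BP = {bp:.0f}, BREP = {brep:.0f}")
    print("relative loss at BP:  ", round(expected_sq_relative_error(bp, draws), 3))
    print("relative loss at BREP:", round(expected_sq_relative_error(brep, draws), 3))
```

For these values the theoretical expected squared relative errors are 0.4 at the BP (1500) and 0.2 at the BREP (1000), so the Monte Carlo estimates should land close to those. The BREP is always below the mean, because relative loss penalizes over-prediction of small claims heavily.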

Evaluation of the equation for predicting dry matter intake of lactating dairy cows in the Korean feeding standards for dairy cattle

  • Lee, Mingyung;Lee, Junsung;Jeon, Seoyoung;Park, Seong-Min;Ki, Kwang-Seok;Seo, Seongwon
    • Animal Bioscience, Vol. 34, No. 10, pp. 1623-1631, 2021
  • Objective: This study aimed to validate and evaluate the dry matter (DM) intake prediction model of the Korean feeding standards for dairy cattle (KFSD). Methods: The KFSD DM intake (DMI) model was developed using a database of data from the Journal of Dairy Science from 2006 to 2011 (1,065 observations from 287 studies). The development database (458 observations from 103 studies) and the evaluation database (168 observations from 74 studies) were constructed from it. Body weight (BW, kg), metabolic BW (BW^0.75, MBW), 4% fat-corrected milk (FCM), forage as a percentage of dietary DM, and the dietary content of nutrients (% DM) were chosen as possible explanatory variables. During model development, a random coefficient model with study as a random variable was used to select model variables, and a linear model without the random effect was used to estimate parameters. The best-fit equation was compared with published equations, and a sensitivity analysis of the prediction equation was conducted. The KFSD model was also evaluated using in vivo feeding-trial data. Results: The KFSD DMI equation is DMI = 4.103 (±2.994) + 0.112 (±0.022) × MBW + 0.284 (±0.020) × FCM - 0.119 (±0.028) × neutral detergent fiber (NDF); it explained 47% of the variation in the evaluation dataset with neither mean nor slope bias (p > 0.05). Its root mean square prediction error was 2.70 kg/d, the best among the tested equations. The sensitivity analysis showed that the model is most sensitive to FCM, followed by MBW and NDF. With the in vivo data, the KFSD equation showed slightly higher precision (R² = 0.39) than the NRC equation (R² = 0.37), with a mean bias of 1.19 kg and no slope bias (p > 0.05). Conclusion: The KFSD DMI model is suitable for predicting the DMI of lactating dairy cows in practical situations in Korea.
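
The reported equation and the evaluation metrics can be sketched as below. The coefficients come straight from the abstract (their ± standard errors are ignored); the units are assumed to be kg for BW, kg/d for FCM, and % of dietary DM for NDF. The sign convention for mean bias (observed minus predicted) is an assumption, since the abstract does not state it, and the example cow is hypothetical.

```python
import math


def kfsd_dmi(body_weight_kg, fcm_kg_per_day, ndf_pct_dm):
    """Predicted dry matter intake (kg/d) from the KFSD equation.

    MBW = BW ** 0.75; FCM = 4% fat-corrected milk (kg/d); NDF in % of dietary DM.
    """
    mbw = body_weight_kg ** 0.75
    return 4.103 + 0.112 * mbw + 0.284 * fcm_kg_per_day - 0.119 * ndf_pct_dm


def rmspe(observed, predicted):
    """Root mean square prediction error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_bias(observed, predicted):
    """Mean of (observed - predicted); this sign convention is an assumption."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Hypothetical cow: 650 kg BW, 35 kg/d FCM, diet with 32% NDF
    print(round(kfsd_dmi(650, 35, 32), 2))
```

For that hypothetical cow the equation gives roughly 24.7 kg/d; FCM enters with the largest per-unit effect after MBW scaling, which is consistent with the sensitivity ranking in the abstract.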

A comparison study between the realistic random modeling and simplified porous medium for gamma-gamma well-logging

  • Fatemeh S. Rasouli
    • Nuclear Engineering and Technology, Vol. 56, No. 5, pp. 1747-1753, 2024
  • Accurate determination of formation density and the physical properties of rocks, which can be obtained with gamma-ray transport and detection tools, is among the most critical logging tasks. Although published simulation studies have considerably improved understanding of the parameters that govern detector responses in these tools, recent studies have found considerable differences between results obtained with a conventional model of a homogeneous mixture of formation and fluid and those obtained with an inhomogeneous fractured medium. This has raised concerns about how complex the medium model used in simulation studies needs to be. In the present study, we propose two different models of fluid flow in porous media and fractured rock for logging purposes. For a typical gamma-gamma logging tool containing a ¹³⁷Cs source and two NaI detectors, simulated with the MCNPX code, we investigated a simplified porous (SP) model in which the formation is filled with elongated rectangular blocks loaded with either mineral material or oil. In this model, the oil reaches the top of the medium directly, and connectivity between pores is not guaranteed. In the other model, the medium is a large 3-D matrix of randomly filled 1 cm³ cubes. The algorithm that fills the matrix sites is designed so that this realistic random (RR) model provides continuous growth of oil flow in various disordered directions and therefore addresses concerns about modeling rock textures with extremely complex pore structures. For an arbitrary set of oil concentrations and various formation materials, the response of the detectors in the logging tool was used as a criterion to assess how the modeling of pore distribution in the formation affects simulation results.
The results show that defining an RR model to describe the heterogeneities of a porous medium does not effectively improve the prediction of logging-tool responses. Given the computational cost of particle transport through complex geometries in the Monte Carlo method, the SP model can be satisfactory for gamma-gamma logging purposes.
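
The key property of the RR model is that oil sites grow as one continuous cluster through the matrix, unlike the SP model, where pore connectivity is not guaranteed. The abstract does not give the authors' filling algorithm, so the sketch below swaps in a simple Eden-type random growth as an illustration: each unit cell stands for one 1 cm³ cube, growth starts from a random cell on one face (an assumption), and a randomly chosen mineral neighbour of the cluster is converted to oil until the target oil fraction is reached. The function names are hypothetical, and no MCNPX geometry is produced.

```python
import random
from collections import deque

# Six face-sharing neighbours of a cubic cell
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def _in_grid(n, z, y, x):
    return 0 <= z < n and 0 <= y < n and 0 <= x < n


def grow_connected_oil(n, oil_fraction, seed=None):
    """Fill an n x n x n grid (0 = mineral, 1 = oil) so all oil forms one cluster."""
    if not 0.0 <= oil_fraction <= 1.0:
        raise ValueError("oil_fraction must be in [0, 1]")
    rng = random.Random(seed)
    grid = [[[0] * n for _ in range(n)] for _ in range(n)]  # indexed grid[z][y][x]
    target = round(oil_fraction * n ** 3)
    if target == 0:
        return grid
    frontier, in_frontier = [], set()  # mineral cells touching the oil cluster

    def occupy(z, y, x):
        grid[z][y][x] = 1
        for dz, dy, dx in NEIGHBOURS:
            nb = (z + dz, y + dy, x + dx)
            if _in_grid(n, *nb) and grid[nb[0]][nb[1]][nb[2]] == 0 and nb not in in_frontier:
                frontier.append(nb)
                in_frontier.add(nb)

    occupy(0, rng.randrange(n), rng.randrange(n))  # seed cell on the z = 0 face
    for _ in range(target - 1):
        # Pick a random frontier cell; swap-with-last makes removal O(1)
        i = rng.randrange(len(frontier))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        cell = frontier.pop()
        in_frontier.discard(cell)
        occupy(*cell)
    return grid


def connected_oil_size(grid):
    """Size of the oil cluster containing the first oil cell found (BFS)."""
    n = len(grid)
    cells = [(z, y, x) for z in range(n) for y in range(n) for x in range(n) if grid[z][y][x]]
    if not cells:
        return 0
    seen, queue = {cells[0]}, deque([cells[0]])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBOURS:
            nb = (z + dz, y + dy, x + dx)
            if _in_grid(n, *nb) and grid[nb[0]][nb[1]][nb[2]] and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen)
```

Comparing `connected_oil_size` with the total oil count confirms that every oil cell belongs to one cluster, which is exactly what a random fill of independent sites (closer to the SP assumption) would not guarantee.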

A Study on Stochastic Estimation of Monthly Runoff by Multiple Regression Analysis

  • 김태철;정하우
    • 한국농공학회지 (Journal of the Korean Society of Agricultural Engineers), Vol. 22, No. 3, pp. 75-87, 1980
  • Most hydrologic phenomena are complex, organic products of multiple causes such as climatic and hydrogeological factors. A significant correlation with runoff in a river basin can therefore be expected in advance, and the effect of each causal and associated factor (independent variables: present-month rainfall, previous-month runoff, evapotranspiration, relative humidity, etc.) on present-month runoff (dependent variable) can be determined by multiple regression analysis. The function relating the independent and dependent variables should be refitted repeatedly until a satisfactory, optimal combination of independent variables is obtained. Before the first estimated multiple regression model in historical sequence is adopted, its reliability should be tested against statistical criteria such as analysis of variance, the coefficient of determination, and significance tests of the regression coefficients. Even so, some error remains between observed and estimated runoff. This error arises because the model is an inadequate description of the system and because the record is only a sample from the population of monthly discharge observations, so estimates of the model parameters are subject to sampling error. Because this error, the deviation from the multiple regression plane, cannot be explained by the first estimated regression equation, it can be regarded as a random error governed by chance. The variance left unexplained by the regression equation can be handled stochastically: the random error is simulated by multiplying a random normal variate by the standard error of estimate. Finally, a hybrid model for estimating monthly runoff in non-historical sequence is obtained by combining the deterministic component (the multiple regression equation) with the stochastic component (the random errors).
Monthly runoff at the Naju station in the Yong-San River basin was estimated with the multiple regression model and the hybrid model, and the estimates were compared with observed runoff and with existing estimation methods such as the Gajiyama formula, the tank model, and the Thomas-Fiering model. The results are as follows. (1) The optimal function for estimating monthly runoff in historical sequence is a multiple linear regression equation fitted to all months together (overall-month unit): $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10$, where $P_n$ is present-month rainfall, $Q_{n-1}$ is previous-month runoff, and $E_n$ is evapotranspiration. This equation explains about 85% of the total variance of monthly runoff (coefficient of determination $R^2 = 0.843$), which means monthly runoff in historical sequence can be estimated with high significance even from a short observation record. (2) The optimal function for estimating monthly runoff in non-historical sequence is a hybrid model that combines the same regression equation with a stochastic component: $Q_n = 0.788P_n + 0.130Q_{n-1} - 0.273E_n - 0.10 + S_y t$, where $S_y$ is the standard error of estimate and $t$ is a random normal variate. Adding the stochastic process accounts for the remaining 15% of unexplained variance and yields somewhat more reliable statistical characteristics of monthly runoff in non-historical sequence. Because of its random component, the generated non-historical sequence also contains extreme values (maxima and minima) that do not appear in the observed runoff. (3) The "frequency best-fit coefficient" ($R^2_f$) of the multiple linear regression equation is 0.847, the same value as that of the Gajiyama formula. This implies that both the multiple linear regression equation and the Gajiyama formula are theoretically reasonable functions.
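
The hybrid model described above, the regression estimate plus the standard error of estimate times a standard normal variate, can be sketched as follows. The coefficients are those reported in the abstract; the function names, the input values, and the choice to feed each month's generated runoff back in as $Q_{n-1}$ are assumptions for illustration, and units follow the paper (not stated in the abstract).

```python
import random


def regression_runoff(p_n, q_prev, e_n):
    """Deterministic component: the multiple linear regression from the abstract."""
    return 0.788 * p_n + 0.130 * q_prev - 0.273 * e_n - 0.10


def hybrid_runoff_series(rain, evap, q0, std_error, seed=None):
    """Generate a non-historical monthly runoff sequence with the hybrid model.

    Each month adds std_error * t (t ~ N(0, 1)) to the regression estimate, and
    the previous month's generated runoff is used as Q_{n-1}.
    """
    rng = random.Random(seed)
    q_prev = q0
    series = []
    for p_n, e_n in zip(rain, evap):
        q_n = regression_runoff(p_n, q_prev, e_n) + std_error * rng.gauss(0.0, 1.0)
        series.append(q_n)
        q_prev = q_n
    return series


if __name__ == "__main__":
    # Hypothetical rainfall and evapotranspiration for three months
    print(hybrid_runoff_series([120.0, 80.0, 200.0], [40.0, 50.0, 60.0],
                               q0=30.0, std_error=10.0, seed=1))
```

With `std_error=0` the generator reduces to the deterministic regression; the random term is what lets synthetic sequences produce maxima and minima outside the observed record. Nothing here prevents a negative generated value, and the abstract does not say how the paper handled that case.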


Developing a Model for Predicting Success of Machine Learning based Health Consulting

  • 이상호;송태민
    • 한국IT서비스학회지 (Journal of Information Technology Services), Vol. 17, No. 1, pp. 91-103, 2018
  • This study developed a prediction model using machine learning and used it to predict the success of health consulting from life-log data generated through a u-Health service. Among the models compared, the Random Forest model had the highest model performance index. The Random Forest analysis showed that blood pressure was the most influential factor in the success or failure of metabolic-syndrome improvement among u-Health service subjects, followed by triglycerides, body weight, blood sugar, high cholesterol, and medication. Muscle mass, basal metabolic rate, and high-density lipoprotein cholesterol increased, while waist circumference, blood sugar, and triglycerides decreased; biometrics and health behavior also improved. After nine months of u-Health services, the number of subjects with four or more metabolic-syndrome risk factors decreased by 28.6%; 3.7% of regular drinkers stopped drinking; 23.2% of subjects who rarely exercised began exercising twice a week or more; and 20.0% of smokers stopped smoking. If the prediction model developed in this study is linked with case-based reasoning (CBR), cases with a high predicted probability of success could serve as CBR case data, helping to improve subjects' compliance and the qualitative effect of counseling aimed at improving metabolic syndrome.

A Statistical Approach to the Pharmacokinetic Model

  • 이은경
    • 응용통계연구 (The Korean Journal of Applied Statistics), Vol. 23, No. 3, pp. 511-520, 2010
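  • A pharmacokinetic model is a complex nonlinear function of the pharmacokinetic parameters and is sometimes expressed as a system of complex differential equations. To represent between-subject differences in the pharmacokinetic parameters, population pharmacokinetics treats them as random effects, which makes the model a nonlinear mixed-effects model. In this paper, we review from a statistical standpoint the population pharmacokinetic models used in clinical pharmacology to describe pharmacokinetic characteristics. We also apply a population pharmacokinetic model to real clinical data and examine the statistical meaning of the results.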
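
To make "parameters as random effects" concrete, here is a sketch of a typical population PK structure: a one-compartment model with first-order absorption, C(t) = F·D·ka / (V·(ka - k)) · (e^(-k t) - e^(-ka t)) with k = CL/V, and log-normal between-subject variability (parameter = typical value × exp(η), η ~ N(0, ω²)) plus proportional residual error. The abstract does not say which structural model the paper used, so this model, the parameter names, and all values are assumptions; estimation (for example by first-order conditional methods) is not shown.

```python
import math
import random


def one_compartment_oral(t, dose, cl, v, ka, f=1.0):
    """Concentration at time t: one compartment, first-order absorption and elimination."""
    k = cl / v  # elimination rate constant
    if abs(ka - k) < 1e-12:
        # Limit of the general formula as ka -> k
        return f * dose * k * t * math.exp(-k * t) / v
    return f * dose * ka / (v * (ka - k)) * (math.exp(-k * t) - math.exp(-ka * t))


def simulate_population(n_subjects, times, dose, theta, omega, sigma_prop, seed=None):
    """Simulate concentration profiles from a nonlinear mixed-effects model.

    theta: typical values {"cl", "v", "ka"}; omega: SDs of the log-normal
    random effects; sigma_prop: SD of the proportional residual error.
    """
    rng = random.Random(seed)
    profiles = []
    for _ in range(n_subjects):
        # Individual parameters = typical value * exp(eta), eta ~ N(0, omega^2)
        cl = theta["cl"] * math.exp(rng.gauss(0.0, omega["cl"]))
        v = theta["v"] * math.exp(rng.gauss(0.0, omega["v"]))
        ka = theta["ka"] * math.exp(rng.gauss(0.0, omega["ka"]))
        profiles.append([one_compartment_oral(t, dose, cl, v, ka)
                         * (1.0 + rng.gauss(0.0, sigma_prop)) for t in times])
    return profiles


if __name__ == "__main__":
    theta = {"cl": 5.0, "v": 50.0, "ka": 1.0}      # hypothetical typical values
    omega = {"cl": 0.3, "v": 0.2, "ka": 0.4}       # hypothetical between-subject SDs
    for profile in simulate_population(3, [0.5, 1, 2, 4, 8, 12], 100.0, theta, omega, 0.1, seed=0):
        print([round(c, 3) for c in profile])
```

Setting all ω to zero collapses every subject onto the typical-value curve, which is the fixed-effects-only model; the random effects are what generate the between-subject spread the abstract refers to.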

Genetic Mixed Effects Models for Twin Survival Data

  • Ha, Il-Do;Noh, Maengseok;Yoon, Sangchul
    • Communications for Statistical Applications and Methods, Vol. 12, No. 3, pp. 759-771, 2005
  • Twin studies are one of the most widely used methods for quantifying the influence of genetic and environmental factors on traits such as life span or a disease. In this paper we propose a genetic mixed linear model for twin survival-time data that allows the genetic component to be separated from the environmental component. Inference is based on the hierarchical likelihood (h-likelihood), which provides a statistically efficient, simple, unified framework for various random-effect models. We also propose a simple, fast computation method for analyzing large twin survival data sets. The new method is illustrated with survival data from the Swedish Twin Registry, and a simulation study is carried out to evaluate its performance.
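
How a twin design separates genetic from environmental variation can be sketched through the covariance of the random effects: monozygotic (MZ) twins share all additive genetic effects and dizygotic (DZ) twins share half on average, while a shared-environment effect is common to both twins either way. The sketch assumes a normal linear mixed model on log survival time, log T = x'β + u + e, with u = genetic + shared environment; that exact specification, the function names, and the values are assumptions, and censoring and h-likelihood fitting are not shown.

```python
import math
import random


def twin_pair_covariance(var_genetic, var_shared_env, zygosity):
    """2x2 covariance of the random effects (genetic + shared environment) for a pair."""
    share = {"MZ": 1.0, "DZ": 0.5}[zygosity]  # fraction of additive genetic effect shared
    var = var_genetic + var_shared_env
    cov = share * var_genetic + var_shared_env
    return [[var, cov], [cov, var]]


def log_time_correlation(cov, resid_sd):
    """Within-pair correlation of log survival times implied by the model."""
    return cov[0][1] / (cov[0][0] + resid_sd ** 2)


def simulate_twin_log_times(xb, cov, resid_sd, rng):
    """Draw uncensored survival times with log T = x'b + u + e (sketch).

    u is bivariate normal with covariance cov (via a 2x2 Cholesky factor);
    e ~ N(0, resid_sd^2) independently for each twin.
    """
    a, b = cov[0][0], cov[0][1]
    if a <= 0:
        raise ValueError("total random-effect variance must be positive")
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(max(0.0, a - l21 ** 2))  # clamp tiny negative rounding error
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u = (l11 * z1, l21 * z1 + l22 * z2)
    return [math.exp(m + ui + rng.gauss(0.0, resid_sd)) for m, ui in zip(xb, u)]


if __name__ == "__main__":
    # Hypothetical variance components
    mz = twin_pair_covariance(0.4, 0.1, "MZ")
    dz = twin_pair_covariance(0.4, 0.1, "DZ")
    print("MZ corr:", round(log_time_correlation(mz, 0.5), 3))
    print("DZ corr:", round(log_time_correlation(dz, 0.5), 3))
```

The gap between the MZ and DZ correlations is what identifies the genetic variance: with no genetic component the two correlations coincide.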

Development of Eco-Friendly Ag Embedded Peroxo Titanium Complex Solution Based Thin Film and Electrical Behaviors of Resistive Random Access Memory

  • Won Jin Kim;Jinho Lee;Ryun Na Kim;Donghee Lee;Woo-Byoung Kim
    • 한국재료학회지 (Korean Journal of Materials Research), Vol. 34, No. 3, pp. 152-162, 2024
  • In this study, we introduce a novel TiN/Ag-embedded TiO2/FTO resistive random-access memory (RRAM) device fabricated by an environmentally sustainable, solution-based thin-film process. Using the peroxo titanium complex (PTC) method, we incorporated Ag precursors into the device architecture, markedly enhancing its performance. This approach mitigates the random filament formation typically observed in RRAM devices and exploits a seed effect to guide filament growth. As a result, the device switches at substantially reduced voltage and current levels, enabling low-power RRAM operation. Changes within the insulator depending on the Ag content were confirmed by X-ray photoelectron spectroscopy (XPS), and a correlation between Ag and oxygen vacancies (Vo) was also confirmed. The measured current-voltage (I-V) curves suggest that as the Ag content increases, the operating mechanism changes from space-charge-limited conduction (SCLC) to ionic conduction. Based on the changes in filament configuration and conduction mechanism, we propose a new filament model that illustrates how introducing Ag alters the filament configuration within the device, leading to a more efficient and controlled resistive switching process.
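
A common way to read an I-V curve for SCLC behaviour is the slope of log|I| against log|V|: a slope near 1 indicates the ohmic region, near 2 the trap-free Child's-law region, and steeper slopes a trap-filling region. The abstract does not describe the authors' fitting procedure, so the sketch below is a generic version of that analysis; the tolerance and labels are hypothetical choices.

```python
import math


def loglog_slope(voltages, currents):
    """Least-squares slope of log|I| versus log|V| (points with V or I == 0 skipped)."""
    pts = [(math.log(abs(v)), math.log(abs(i)))
           for v, i in zip(voltages, currents) if v != 0 and i != 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two nonzero points")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("voltages must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx


def classify_sclc_region(slope, tol=0.3):
    """Rough label for a log-log slope under the usual SCLC picture (tol is hypothetical)."""
    if abs(slope - 1.0) <= tol:
        return "ohmic (I ~ V)"
    if abs(slope - 2.0) <= tol:
        return "Child's law (I ~ V^2)"
    if slope > 2.0 + tol:
        return "trap-filling (steep rise)"
    return "unclassified"


if __name__ == "__main__":
    vs = [0.1 * k for k in range(1, 11)]
    print(classify_sclc_region(loglog_slope(vs, [1e-3 * v ** 2 for v in vs])))
```

In practice the fit is applied piecewise to voltage windows of one sweep; a curve that no longer shows the 1-then-2 slope sequence would be consistent with the shift away from SCLC that the abstract reports at higher Ag content.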

Bayesian analysis of Korean income data using zero-inflated Tobit model

  • 황지수;김세완;오만숙
    • 응용통계연구 (The Korean Journal of Applied Statistics), Vol. 30, No. 6, pp. 917-929, 2017
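  • The 2015 distribution of monthly average income of Korea's working-age population, provided by the Korean Labor and Income Panel Study (KLIPS), has an excessively high proportion of zero observations, which limits how well the Tobit model commonly used for income distributions can describe it. In this study, we analyze Korean income data with a zero-inflated Tobit model that accounts for this excess of zeros. The zero-inflated Tobit model is a two-stage model. In the first stage, the zero-income group is divided into two groups: the first has no willingness to participate in the labor market and therefore does not participate, so zero is observed (genuine zero); the second is willing to participate but is censored because of low wages, so zero is observed (random zero). The random-zero group is then combined with the positive continuous observations, and a Tobit model is applied. Regression models with explanatory variables of interest are applied in both stages to identify factors that affect labor-market participation and wage level. Parameters are estimated by Markov chain Monte Carlo, and compared with the conventional Tobit model, the zero-inflated Tobit model performed better in estimating the frequency of zeros and in model fit. The results show that older people, men (compared with women), and those with less education are significantly more likely to participate in the labor market, while those with higher socioeconomic status and those with lower reservation wages are more likely not to participate. Regarding wage level, men (compared with women), those with more education, and married people (compared with unmarried people) receive significantly higher wages.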
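
The two sources of zeros translate directly into the likelihood. With π the stage-1 probability of belonging to the group willing to work and a latent wage y* ~ N(μ, σ²) censored at zero, a zero has probability (1 - π) + π·Φ(-μ/σ) (genuine zero plus random zero), while a positive income y has density π·φ((y - μ)/σ)/σ. The sketch below codes that contribution; the probit link for stage 1 (π = Φ(z'γ)) and the function names are assumptions, and the MCMC sampler used in the paper is not shown.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def zi_tobit_loglik(y, participate_prob, mu, sigma):
    """Log-likelihood contribution of one income observation.

    participate_prob: stage-1 probability of being willing to participate.
    mu, sigma: mean and SD of the latent wage in the Tobit stage (censored at 0).
    """
    if y < 0:
        raise ValueError("income must be non-negative")
    if y == 0:
        # genuine zero (non-participant) + random zero (latent wage <= 0)
        return math.log((1.0 - participate_prob) + participate_prob * norm_cdf(-mu / sigma))
    return math.log(participate_prob) + math.log(norm_pdf((y - mu) / sigma) / sigma)


def zi_tobit_total_loglik(ys, z_gamma, x_beta, sigma):
    """Total log-likelihood with a probit link for stage 1 (an assumption)."""
    return sum(zi_tobit_loglik(y, norm_cdf(zg), xb, sigma)
               for y, zg, xb in zip(ys, z_gamma, x_beta))
```

In a Bayesian fit, this log-likelihood is combined with priors on γ, β, and σ and explored by MCMC; the ordinary Tobit model is the special case π = 1, which is why it underestimates the frequency of zeros when many people do not participate at all.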

SPATIAL AND TEMPORAL INFLUENCES ON SOIL MOISTURE ESTIMATION

  • Kim, Gwang-seob
    • Water Engineering Research, Vol. 3, No. 1, pp. 31-44, 2002
  • To propose a global sampling strategy for soil moisture, we analyzed how the diurnal cycle, intermittent satellite revisits, sensor installation, partial remote-sensing coverage, and heterogeneity of soil properties and precipitation affect soil moisture estimation error. Three models were used to construct sufficient two-dimensional soil moisture data under different scenarios: a theoretical soil moisture model; the WGR model proposed by Waymire et al. (1984) to generate rainfall; and the turning bands method to generate two-dimensional fields of soil porosity, active soil depth, and loss coefficient. The sampling error is dominated by the sampling interval and design scheme. The effect of heterogeneity in soil properties and rainfall on sampling error is smaller than that of temporal and spatial gaps. Selecting a small sampling interval can dramatically reduce the sampling error generated by other factors such as heterogeneity of rainfall, soil properties, topography, and climatic conditions. If the annual mean coverage fraction is about 90%, the effect of partial coverage on sampling error can be disregarded. The sampling error is very sensitive to the water-retention capacity of a field: the smaller the capacity (low soil porosity and thin active soil depth), the greater the sampling error. Block random installation of soil moisture gauges yields more accurate data than simple random installation. The Walnut Gulch soil moisture data show that diurnal variation of soil moisture causes a sampling error of 1% to 4% in daily estimates.
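
The finding that block random installation beats simple random installation can be illustrated with a small simulation, reading "block random" as one randomly placed gauge inside each of several equal blocks (stratified random sampling); that interpretation is an assumption. The soil-moisture field below is a hypothetical gradient plus noise, not a turning-bands field from the paper, and the function names are illustrative.

```python
import random


def make_field(n, noise_sd=0.02, seed=None):
    """Hypothetical n x n soil-moisture field: a smooth spatial gradient plus noise."""
    rng = random.Random(seed)
    return [[0.10 + 0.30 * i / n + 0.10 * j / n + rng.gauss(0.0, noise_sd)
             for j in range(n)] for i in range(n)]


def simple_random_gauges(field, k, rng):
    """Install k gauges at distinct cells chosen uniformly at random."""
    n = len(field)
    return [field[c // n][c % n] for c in rng.sample(range(n * n), k)]


def block_random_gauges(field, blocks_per_side, rng):
    """Install one gauge at a random cell inside each of blocks_per_side^2 equal blocks."""
    n = len(field)
    if n % blocks_per_side:
        raise ValueError("grid size must be divisible by blocks_per_side")
    size = n // blocks_per_side
    return [field[bi * size + rng.randrange(size)][bj * size + rng.randrange(size)]
            for bi in range(blocks_per_side) for bj in range(blocks_per_side)]


def rmse_of_mean(field, sampler, trials, seed=None):
    """RMSE of the gauge average as an estimate of the true field mean."""
    rng = random.Random(seed)
    n = len(field)
    true_mean = sum(map(sum, field)) / (n * n)
    total = 0.0
    for _ in range(trials):
        values = sampler(field, rng)
        total += (sum(values) / len(values) - true_mean) ** 2
    return (total / trials) ** 0.5


if __name__ == "__main__":
    field = make_field(40, seed=1)
    # 16 gauges either way: 16 random cells vs one per block in a 4 x 4 blocking
    srs = rmse_of_mean(field, lambda f, r: simple_random_gauges(f, 16, r), 2000, seed=2)
    blk = rmse_of_mean(field, lambda f, r: block_random_gauges(f, 4, r), 2000, seed=2)
    print(f"simple random RMSE: {srs:.4f}, block random RMSE: {blk:.4f}")
```

Blocking guarantees that every part of the field is represented, so large-scale spatial trends no longer inflate the error of the areal mean; the gain shrinks as the field becomes more spatially uniform.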
