• Title/Summary/Keyword: Imputation method

Search Results: 134

Household, personal, and financial determinants of surrender in Korean health insurance

  • Shim, Hyunoo; Min, Jung Yeun; Choi, Yang Ho
    • Communications for Statistical Applications and Methods / v.28 no.5 / pp.447-462 / 2021
  • In insurance, the surrender rate is an important variable that threatens the sustainability of insurers and determines the profitability of a contract. Unlike other actuarial assumptions that determine the cash flow of an insurance contract, however, it is driven by endogenous factors such as people's economic, social, and subjective decisions. A microscopic approach is therefore required to identify and analyze the factors that determine the lapse rate; specifically, micro-level characteristics of policyholders, including individual, demographic, microeconomic, and household characteristics, are necessary for the analysis. In this study, we select panel survey data from the Korean Retirement Income Study (KReIS), which cover many diverse dimensions, to determine which variables have a decisive effect on lapse, and apply a lasso regularized regression model to analyze them empirically. As the data contain many missing values, they are imputed using the random forest method. Among the household variables, we find that the absence of old dependents, the presence of young dependents, and employed family members increase the surrender rate. Among the individual variables, divorce, non-urban residential areas, apartment-type housing, non-ownership of a home, and a poor relationship with siblings increase the lapse rate. Finally, among the financial variables, low income, low expenditure, having children who incur child-care expenditure, not expecting a bequest from a spouse, not holding public health insurance, and expecting to benefit from a retirement pension increase the lapse rate. Some of these findings are consistent with those in the literature.
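The modeling step described above, a lasso regularized regression fit to imputed data, can be sketched in a few lines of numpy. This is a minimal illustration rather than the paper's pipeline: column-mean imputation stands in for the random-forest imputation used in the study, and the coordinate-descent lasso assumes roughly standardized predictors.

```python
import numpy as np

def mean_impute(X):
    """Replace NaNs with column means (a simple stand-in for random-forest imputation)."""
    X = X.copy()
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    return X

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent (assumes roughly standardized X)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with feature j's contribution removed
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
    return beta
```

The L1 penalty drives the coefficients of irrelevant covariates to zero, which is what makes the method suitable for screening many candidate determinants of lapse.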

A genome-wide association study on growth traits of Korean commercial pig breeds using Bayesian methods

  • Jong Hyun Jung; Sang Min Lee; Sang-Hyon Oh
    • Animal Bioscience / v.37 no.5 / pp.807-816 / 2024
  • Objective: This study aims to identify significant regions and candidate genes for growth-related traits (adjusted backfat thickness [ABF], average daily gain [ADG], and days to 90 kg [DAYS90]) in Korean commercial GGP pig (Duroc, Landrace, and Yorkshire) populations. Methods: A genome-wide association study (GWAS) was performed using single-nucleotide polymorphism (SNP) markers imputed to the Illumina PorcineSNP60 panel. The BayesB method was applied to calculate significance thresholds for SNP markers, and identified windows were considered significant if they explained ≥1% of genetic variance. Results: The Bayesian GWAS revealed 28 significant genetic regions, including 52 informative SNPs associated with the growth traits (ABF, ADG, DAYS90) in Duroc, Landrace, and Yorkshire pigs, with explained genetic variance ranging from 1.00% to 5.46%. Additionally, 14 candidate genes with previous functional validation were identified for these traits. Conclusion: The identified SNPs within these regions hold potential value for future marker-assisted or genomic selection in pig breeding programs, and they contribute to an improved understanding of genetic architecture and our ability to genetically enhance pigs.
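The "≥1% genetic variance" window criterion above can be illustrated directly: given genotypes and estimated marker effects, compute each window's share of the total genetic variance. A hedged numpy sketch (covariance between windows is ignored, and the function name is illustrative, not from the paper):

```python
import numpy as np

def window_variance_share(geno, effects, window_size):
    """Fraction of total genetic variance explained by each SNP window.

    geno: (n_animals, n_snps) genotype matrix; effects: estimated SNP effects.
    """
    n, p = geno.shape
    total_var = (geno @ effects).var()  # variance of total genetic values
    shares = []
    for start in range(0, p, window_size):
        idx = slice(start, min(start + window_size, p))
        shares.append((geno[:, idx] @ effects[idx]).var() / total_var)
    return np.array(shares)
```

A window would then be flagged as significant when its entry exceeds 0.01, mirroring the ≥1% rule in the abstract.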

A Study on the Optimal Cut-off Point in the Cut-off Sampling Method (절사표본에서 최적 절사점에 관한 연구)

  • Lee, Sang Eun; Cho, Min Ji; Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.27 no.3 / pp.501-512 / 2014
  • Modified cut-off sampling is widely used for highly skewed data. A serious drawback of modified cut-off sampling is the difficulty of adjusting for non-response in the take-all stratum, so solutions to this problem have been studied in various ways, such as sample substitution, imputation, and re-weighting methods. In this paper, a new cut-off point, obtained by minimizing the MSE under exponential and power functions, is suggested; it can reduce the size of the take-all stratum. We also investigate another cut-off point determination method based on underlying distributions such as the truncated log-normal and truncated gamma distributions. Finally, we suggest the optimal cut-off point, which yields the minimum take-all stratum size among the suggested methods. Simulation studies are performed, and Labor Survey data and simulated data are used for the case study.
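The idea of selecting a cut-off point by minimizing MSE can be illustrated with a small Monte Carlo sketch under a fixed total sample budget: lowering the cut-off enlarges the take-all stratum and consumes budget, while raising it leaves more of the skewed tail to sampling error. This generic simulation is not the paper's analytic criterion based on exponential and power functions; all names here are illustrative.

```python
import numpy as np

def simulated_mse(pop, cut, n_total, n_rep=300, seed=1):
    """Monte Carlo MSE of the estimated population total for a given cut-off."""
    rng = np.random.default_rng(seed)
    take_all = pop[pop >= cut]        # surveyed completely
    rest = pop[pop < cut]             # estimated by sampling
    n_s = n_total - len(take_all)     # budget left for the sampled stratum
    if n_s < 1 or len(rest) == 0:
        return np.inf
    n_s = min(n_s, len(rest))
    true_total = pop.sum()
    sq_errs = []
    for _ in range(n_rep):
        s = rng.choice(rest, size=n_s, replace=False)
        est = take_all.sum() + s.mean() * len(rest)
        sq_errs.append((est - true_total) ** 2)
    return float(np.mean(sq_errs))

def optimal_cutoff(pop, candidates, n_total):
    """Candidate cut-off point with the smallest simulated MSE."""
    mses = [simulated_mse(pop, c, n_total) for c in candidates]
    return candidates[int(np.argmin(mses))]
```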

Variational Mode Decomposition with Missing Data (결측치가 있는 자료에서의 변동모드분해법)

  • Choi, Guebin; Oh, Hee-Seok; Lee, Youngjo; Kim, Donghoh; Yu, Kyungsang
    • The Korean Journal of Applied Statistics / v.28 no.2 / pp.159-174 / 2015
  • Dragomiretskiy and Zosso (2014) developed a new decomposition method, termed variational mode decomposition (VMD), which is efficient for tone detection and the separation of signals. However, VMD may be inefficient in the presence of missing data, since it is based on a fast Fourier transform (FFT) algorithm. To overcome this problem, we propose a new approach based on a novel combination of VMD and the hierarchical (or h-) likelihood method. The h-likelihood provides an effective imputation methodology for missing data when VMD decomposes the signal into several meaningful modes. A simulation study and real data analysis demonstrate that the proposed method produces effective results.
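The FFT obstacle noted above is concrete: a transform cannot be taken over missing values, so the series must be imputed first. The sketch below uses linear interpolation as a stand-in for the paper's h-likelihood imputation and then isolates one tone with an FFT band-pass, a much cruder decomposition than VMD, shown only to make the missing-data issue tangible.

```python
import numpy as np

def interp_impute(y):
    """Fill NaNs by linear interpolation over the observed points."""
    y = y.copy()
    t = np.arange(len(y))
    miss = np.isnan(y)
    y[miss] = np.interp(t[miss], t[~miss], y[~miss])
    return y

def fft_band_mode(y, f_lo, f_hi, fs):
    """Extract the component of y whose frequency lies in [f_lo, f_hi] Hz."""
    Y = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    Y[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(Y, n=len(y))
```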

A Study for Traffic Forecasting Using Traffic Statistic Information (교통 통계 정보를 이용한 속도 패턴 예측에 관한 연구)

  • Choi, Bo-Seung; Kang, Hyun-Cheol; Lee, Seong-Keon; Han, Sang-Tae
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1177-1190 / 2009
  • Traffic operating speed is an important piece of information for measuring road capacity. When navigation systems provide information about congested roads, supplying both current traffic information and forecasts of future conditions is essential for delivering more accurate expected travel times and intervals. In this study, we propose a traffic speed forecasting model that uses accumulated traffic speed data for roads and highways, and we forecast the average speed for each road and highway section and each time interval using the Fourier transform and a time series regression model with trigonometric functions. We also propose a suitable method of missing data imputation and outlier treatment to improve the accuracy of the traffic speed forecasting, along with a speed grouping method that clusters sections with similar traffic speed patterns to increase the efficiency of the analysis.
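The trigonometric-regression component amounts to regressing speed on sine and cosine terms at the daily period and its harmonics. A minimal numpy sketch (the period, harmonic count, and function names are illustrative assumptions):

```python
import numpy as np

def harmonic_design(t, period=24.0, n_harm=3):
    """Design matrix: intercept plus sin/cos pairs at k = 1..n_harm harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harm + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

def fit_speed_pattern(t_obs, speed_obs, t_new, period=24.0, n_harm=3):
    """Least-squares fit of the periodic speed pattern, evaluated at t_new."""
    X = harmonic_design(t_obs, period, n_harm)
    coef, *_ = np.linalg.lstsq(X, speed_obs, rcond=None)
    return harmonic_design(t_new, period, n_harm) @ coef
```

Because the fitted pattern is defined for any time of day, the same model can also serve to fill short gaps of missing speed observations.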

Probability Estimation Method for Imputing Missing Values in Data Expansion Technique (데이터 확장 기법에서 손실값을 대치하는 확률 추정 방법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.91-97 / 2021
  • This paper applies a data expansion technique, originally designed for the rule refinement problem, to the handling of incomplete data. The technique is characterized in that each event can carry a weight indicating its importance and each variable can be expressed as a probability value. Since the key problem is to find the probability closest to a missing value and replace the missing value with that probability, three different algorithms are used to estimate this probability and store it in the data structure format. For evaluation, after training an SVM classifier on each information area for each probability structure, the results are compared with the original information to measure how well they match. The three imputation algorithms share the same data structure but differ in their approach, so they are expected to be useful for various purposes depending on the application field.

Personalized Data Restoration Algorithm to Improve Wearable Device Service (웨어러블 디바이스 서비스 향상을 위한 개인 맞춤형 데이터 복원 알고리즘)

  • Kikun Park; Hye-Rim Bae
    • The Journal of Bigdata / v.6 no.2 / pp.51-60 / 2021
  • The market size of wearable devices is growing rapidly every year, and manufacturers around the world are introducing products that exploit their unique characteristics to keep up with demand. Among them, smart watches are wearable devices with a very high share of sales, and they provide a variety of services to users based on information collected in real time. The quality of these services depends on the accuracy of the data collected by the smart watch, but measurement may not be possible in every situation. This paper introduces a method for restoring data that a smart watch failed to collect. It presents a similarity calculation method for trajectory information measured over time and a procedure for restoring the missing sections according to that similarity. To demonstrate the performance of the proposed methodology, a comparative experiment with a machine learning algorithm was conducted. Finally, the expected effects of this study and future research directions are discussed.
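The restoration idea can be sketched as follows: score each historical trajectory by its mean pointwise distance to the observed part of the current trajectory, then copy the missing section from the closest match. This is a simplified stand-in for the paper's similarity calculation, and all names are illustrative.

```python
import numpy as np

def trajectory_distance(a, b):
    """Mean Euclidean distance between two time-aligned trajectories."""
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

def restore_missing(partial, miss_mask, history):
    """Fill missing time steps from the most similar historical trajectory.

    partial: (T, d) trajectory with unreliable rows where miss_mask is True.
    history: list of fully observed (T, d) trajectories from past days.
    """
    obs = ~miss_mask
    best = min(history, key=lambda h: trajectory_distance(partial[obs], h[obs]))
    restored = partial.copy()
    restored[miss_mask] = best[miss_mask]
    return restored
```

A personalized variant would restrict `history` to the wearer's own past trajectories, which is the personalization the title refers to.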

Development of Truck Axle Load Estimation Model Using Weigh-In-Motion Data (WIM 자료를 활용한 화물차량의 축중량 추정 모형 개발에 관한 연구)

  • Oh, Ju Sam
    • KSCE Journal of Civil and Environmental Engineering Research / v.31 no.4D / pp.511-518 / 2011
  • Truck weight data are essential for road infrastructure design, maintenance, and management. WIM (Weigh-In-Motion) systems provide highway planners, researchers, and officials with statistical data, and high-speed WIM data have recently also been used to support vehicle weight regulation and enforcement activities. This paper aims at developing axle load estimation models using high-speed WIM data collected on a national highway. We also suggest a method for estimating axle load with a simple regression model for the WIM system. The model proposed in this paper produced better axle load estimates for all vehicle classes than the conventional model. The developed model will be used in ongoing and re-calibration procedures to ensure an adequate level of WIM system performance, and it can also be used for imputing missing axle load data in the future.
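The simple-regression approach can be sketched directly. Using gross vehicle weight as the single predictor of an axle load is an assumption for illustration; the abstract does not state the exact predictor used per vehicle class.

```python
import numpy as np

def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    a = y.mean() - b * x.mean()
    return a, b

def estimate_axle_load(model, gross_weight):
    """Predict an axle load (e.g. for imputing a missing measurement)."""
    a, b = model
    return a + b * gross_weight
```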

Forecasting the Demand Areas of a Factory Site: Based on a Statistical Model and Sampling Survey (공장용지 수요 추정 모형 개발 및 수요예측)

  • Jeong, Hyeong-Chul; Han, Geun-Shik; Kim, Seong-Yong
    • The Korean Journal of Applied Statistics / v.24 no.3 / pp.465-475 / 2011
  • In this paper, we consider the problem of estimating the gross area of factory sites, in relation to the area of industrial complex land, based on a statistical forecasting model and the results of a sampling survey. For the gross area of factory sites, only the figures from 1981-2003 are available. In 2009, the Korea Industrial Complex Corp. conducted a sampling survey to estimate the total size and to investigate demand for the following five years. In this study, we adopt the sampling survey results and build a statistical growth model for the gross area of factory sites to improve its prediction. Three different data sources are used as the basis for the analysis: the factory site areas reported by the Korea National Statistical Office, imputation results from the statistical forecasting model, and the sampling survey results. Combining the three sources through the spline smoothing method yields a new forecast of the factory site area.

Store Sales Prediction Using Gradient Boosting Model (그래디언트 부스팅 모델을 활용한 상점 매출 예측)

  • Choi, Jaeyoung; Yang, Heeyoon; Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.2 / pp.171-177 / 2021
  • Through the rapid developments in machine learning, there have been diverse utilization approaches not only in industrial fields but also in daily life. Implementations of machine learning on financial data, also have been of interest. Herein, we employ machine learning algorithms to store sales data and present future applications for fintech enterprises. We utilize diverse missing data processing methods to handle missing data and apply gradient boosting machine learning algorithms; XGBoost, LightGBM, CatBoost to predict the future revenue of individual stores. As a result, we found that using median imputation onto missing data with the appliance of the xgboost algorithm has the best accuracy. By employing the proposed method, fintech enterprises and customers can attain benefits. Stores can benefit by receiving financial assistance beforehand from fintech companies, while these corporations can benefit by offering financial support to these stores with low risk.