• Title/Summary/Keyword: Multiple Imputation

Estimation of Conditional Kendall's Tau for Bivariate Interval Censored Data

  • Kim, Yang-Jin
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.599-604 / 2015
  • Kendall's tau statistic has been applied to test the association of bivariate random variables. However, incomplete bivariate data subject to truncation and censoring yield incomparable or unorderable pairs. Working with such partial information, Tsai (1990) suggested a conditional tau statistic and a test procedure for quasi-independence, which was later extended to more diverse cases such as double truncation and semi-competing risks data. In this paper, we also employ a conditional tau statistic to estimate the association of bivariate interval-censored data. In simulation studies, the suggested method performs better than Betensky and Finkelstein's multiple imputation method except in cases with strong association. As a real data example, the association between incubation time and infection time in an AIDS cohort study is estimated.
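
The idea of restricting Kendall's tau to pairs whose order can actually be determined can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's estimator: each observation is taken to be a half-open interval (L, R], a pair counts as orderable in a coordinate only when its two intervals do not overlap, and tau is computed over pairs orderable in both coordinates. The function names are hypothetical.

```python
from itertools import combinations


def order(a, b):
    """Compare two half-open intervals (L, R].

    Returns +1 if a lies entirely below b, -1 if entirely above,
    and 0 if they overlap (the pair cannot be ordered).
    """
    (la, ra), (lb, rb) = a, b
    if ra <= lb:
        return 1
    if rb <= la:
        return -1
    return 0


def conditional_tau(x_intervals, y_intervals):
    """Kendall's tau computed only over pairs orderable in both X and Y (illustrative sketch)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x_intervals)), 2):
        sx = order(x_intervals[i], x_intervals[j])
        sy = order(y_intervals[i], y_intervals[j])
        if sx == 0 or sy == 0:
            continue  # overlapping intervals in some coordinate: skip this pair
        if sx == sy:
            concordant += 1
        else:
            discordant += 1
    n_pairs = concordant + discordant
    if n_pairs == 0:
        raise ValueError("no orderable pairs")
    return (concordant - discordant) / n_pairs
```

Overlapping pairs simply drop out of both the numerator and the denominator. This is the sense in which the statistic is "conditional" on orderability in this sketch.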

A nonnormal Bayesian imputation

  • Shin Minwoong;Lee Jinhee;Lee Juyoung;Lee Sangeun
    • Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.51-56 / 2000
  • When standard inference is to be used with complete data and nonresponse is ignorable, multiple imputations should be created as repetitions under a Bayesian normal model. However, many Bayesian models besides the normal approximately yield the standard inference with complete data, so many such models can be used to create proper imputations. We consider an application of the Bayesian bootstrap (BB).
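
As an illustration of the Bayesian bootstrap idea, the sketch below shows a common Bayesian bootstrap imputation for a single variable with ignorable nonresponse. It draws Dirichlet(1, ..., 1) weights over the observed values and then samples the missing entries from the observed values using those weights, repeating this independently for each imputation. It is an assumption that this matches the authors' exact procedure, and the function names are hypothetical.

```python
import random


def bayesian_bootstrap_impute(values, rng):
    """Return a copy of values with each None replaced by one Bayesian bootstrap draw."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("need at least one observed value")
    # Dirichlet(1, ..., 1) weights: normalized Gamma(1, 1) (i.e. Exp(1)) draws
    gammas = [rng.gammavariate(1.0, 1.0) for _ in observed]
    total = sum(gammas)
    weights = [g / total for g in gammas]
    # Sample every missing entry from the observed values under these weights
    n_missing = sum(v is None for v in values)
    draws = iter(rng.choices(observed, weights=weights, k=n_missing))
    return [next(draws) if v is None else v for v in values]


def multiply_impute(values, m, seed=0):
    """Create m completed data sets, each with freshly drawn Dirichlet weights."""
    rng = random.Random(seed)
    return [bayesian_bootstrap_impute(values, rng) for _ in range(m)]
```

Redrawing the weights for every imputation is what propagates uncertainty about the observed-data distribution into the between-imputation variance.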

A Study of Labor Entry of Conditional Welfare Recipients : An Exploration of the Predictors (취업대상 조건부수급자의 경제적 자활로의 진입에 영향을 미치는 요인에 관한 연구)

  • Kim, Kyo-Seong;Kang, Chul-Hee
    • Korean Journal of Social Welfare
    • /
    • v.52
    • /
    • pp.5-32
    • /
    • 2003
  • This paper examines the labor entry of conditional welfare recipients and focuses on two questions. First, what percentage of conditional welfare recipients achieve labor entry? Second, what predicts labor entry and the duration until entry? Using data on 917 welfare recipients who participated in the self-sufficiency programs of the Offices for Secure Employment in Seoul, this paper attempts to answer these questions. Logistic regression analysis and survival analysis are used to identify variables predicting the labor entry of conditional welfare recipients, and a multiple imputation method is used to handle missing values in some variables. The major findings are as follows. About 43.8% of the conditional welfare recipients achieved labor entry. Gender, household, information and referral services for employment, health, and willingness for self-sufficiency are statistically significant predictors of both labor entry and the duration until entry. Among these, health and willingness for self-sufficiency stand out. This suggests that health care programs for welfare recipients who seek labor entry, and counseling programs that strengthen their willingness to enter the labor market, are very important for successful labor entry. This paper provides basic knowledge about the realities of conditional welfare recipients' labor entry, identifies areas for further research, and develops policy implications for their self-sufficiency.

Exploiting Patterns for Handling Incomplete Coevolving EEG Time Series

  • Thi, Ngoc Anh Nguyen;Yang, Hyung-Jeong;Kim, Sun-Hee
    • International Journal of Contents / v.9 no.4 / pp.1-10 / 2013
  • The electroencephalogram (EEG) time series is a measure of electrical activity received from multiple electrodes placed on the scalp. It provides a direct measurement for characterizing the dynamic aspects of brain activity. EEG signals consist of multidimensional spatial and temporal data. Missing data can occur due to faulty electrodes, and such missing data can cause distortion and repudiation and reduce the effectiveness of analysis algorithms. Current methodologies for EEG analysis require a complete EEG data matrix as input. Therefore, an accurate and reliable imputation approach for missing values is necessary to avoid incomplete data sets and to improve the performance of analysis techniques. This research proposes a new method, based on a Linear Dynamical System, to automatically recover random consecutive missing data from real-world EEG data. The proposed method aims to capture the optimal patterns based on two main characteristics of coevolving EEG time series: (i) dynamics, by discovering temporally evolving behaviors, and (ii) correlations, by identifying the relationships between multiple brain signals. By exploiting these characteristics, the proposed method identifies a few hidden variables and discovers their dynamics to impute missing values. The method offers a robust and scalable approach whose computation time is linear in the sequence length. A comparative study assessed the effectiveness of the proposed method against interpolation and Missing values via Singular Value Decomposition (MSVD). The experimental simulations demonstrate that the proposed method improves reconstruction performance by up to 49% and 67% over MSVD and interpolation, respectively.

Synthetic data generation by probabilistic PCA (주성분 분석을 활용한 재현자료 생성)

  • Min-Jeong Park
    • The Korean Journal of Applied Statistics / v.36 no.4 / pp.279-294 / 2023
  • Sequential regression multiple imputation (SRMI) is a well-known method for generating synthetic data sets, and the R package synthpop is widely used to generate synthetic data with SRMI approaches. In this paper, I suggest generating synthetic data based on probabilistic principal component analysis (PPCA). Two simple data sets are used in a simulation study to compare the SRMI and PPCA approaches. Simulation results demonstrate that pairwise coefficients in synthetic data sets generated by PPCA can be closer to the original ones than those generated by SRMI. Furthermore, the PPCA approach can be extended to generate synthetic data sets for data types in which PPCA applications are well established, such as time series data.
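
A minimal sketch of PPCA-based synthesis for continuous data follows, assuming NumPy is available. It uses the standard closed-form maximum-likelihood PPCA fit, with the rotation fixed to the identity and the noise variance set to the mean of the discarded eigenvalues. It then draws new records from the fitted generative model x = mu + W z + eps. The paper's exact steps may differ, and `fit_ppca` and `synthesize` are hypothetical names.

```python
import numpy as np


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit with the rotation fixed to identity."""
    n, d = X.shape
    if not 1 <= q <= d:
        raise ValueError("q must be between 1 and the number of columns")
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)      # ML covariance (divides by n)
    eigvals, eigvecs = np.linalg.eigh(S)        # eigh returns ascending order
    idx = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # Noise variance: average of the d - q discarded eigenvalues
    sigma2 = float(eigvals[q:].mean()) if q < d else 0.0
    # W = U_q diag(sqrt(lambda_i - sigma2)); broadcasting scales each column
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def synthesize(mu, W, sigma2, n, rng):
    """Draw n synthetic records from x = mu + W z + eps, z ~ N(0, I), eps ~ N(0, sigma2 I)."""
    z = rng.standard_normal((n, W.shape[1]))
    eps = rng.normal(scale=np.sqrt(sigma2), size=(n, mu.shape[0]))
    return mu + z @ W.T + eps
```

Comparing `np.corrcoef(X, rowvar=False)` with the same quantity on the synthetic output is one simple way to check how well pairwise relationships are preserved. That is the kind of comparison the abstract describes against SRMI.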

Performance Comparison of Two Gene Set Analysis Methods for Genome-wide Association Study Results: GSA-SNP vs i-GSEA4GWAS

  • Kwon, Ji-Sun;Kim, Ji-Hye;Nam, Doug-U;Kim, Sang-Soo
    • Genomics & Informatics / v.10 no.2 / pp.123-127 / 2012
  • Gene set analysis (GSA) is useful for interpreting genome-wide association study (GWAS) results in terms of biological mechanisms. We compared the performance of two GSA implementations that accept GWAS p-values of single nucleotide polymorphisms (SNPs) or gene-by-gene summaries thereof, GSA-SNP and i-GSEA4GWAS, under the same input and parameter settings. GSA was run with two sets of p-values from a Korean type 2 diabetes mellitus GWAS: 259,188 and 1,152,947 SNPs from the original and imputed genotype datasets, respectively. When Gene Ontology terms were used as gene sets, i-GSEA4GWAS produced 283 and 1,070 hits for the unimputed and imputed datasets, respectively, whereas GSA-SNP reported 94 and 38 hits. Similar trends, though to a lesser degree, were observed with Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets. The huge number of hits by i-GSEA4GWAS for the imputed dataset was probably an artifact of the scaling step in the algorithm. The decrease in hits by GSA-SNP for the imputed dataset may be due to its reliance on Z-statistics, which are sensitive to variations in the background level of association. Judicious evaluation of GSA outcomes, perhaps based on multiple programs, is recommended.

Survival Prognostic Factors of Male Breast Cancer in Southern Iran: a LASSO-Cox Regression Approach

  • Shahraki, Hadi Raeisi;Salehi, Alireza;Zare, Najaf
    • Asian Pacific Journal of Cancer Prevention / v.16 no.15 / pp.6773-6777 / 2015
  • We used the LASSO-Cox method to determine prognostic factors for male breast cancer survival and showed its superiority over the Cox proportional hazards model in a small-sample setting. The LASSO-Cox method was used to identify the most important factors affecting the survival duration of male breast cancer patients and to estimate their relative hazards precisely. Our data include information on male breast cancer patients in Fars province, southern Iran, from 1989 to 2008. Cox proportional hazards and LASSO-Cox models were fitted for 20 classified variables. To reduce the impact of missing data, multiple imputation was performed 20 times using the Markov chain Monte Carlo method, and the results were combined with Rubin's rules. Among the 50 patients, the mean age at diagnosis was 59.6 (SD=12.8) years, with a minimum of 34 and a maximum of 84 years, and the mean survival time was 62 months. The 3-, 5- and 10-year survival rates were 92%, 77% and 26%, respectively. The LASSO-Cox method eliminated 8 low-effect variables and decreased the standard errors by a factor of 2.5 to 7. The relative efficiency of the LASSO-Cox method compared with the Cox proportional hazards method was 22.39. The 19-year follow-up of male breast cancer patients shows that age, a history of alcohol use, nipple discharge, laterality, histological grade and duration of symptoms were the most important variables affecting patient survival. In such situations, estimating the coefficients with the LASSO-Cox method is more efficient than with Cox's proportional hazards method.
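
The pooling step mentioned above (20 imputations combined with Rubin's rules) can be sketched for a single coefficient as follows. The pooled estimate is the mean of the per-imputation estimates, and the total variance is T = W + (1 + 1/m) B. Here W is the average within-imputation variance and B is the between-imputation variance of the estimates. The MCMC imputation itself is not shown, and `pool_rubin` is a hypothetical name.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputations with Rubin's rules.

    estimates: per-imputation point estimates
    variances: per-imputation squared standard errors
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least 2 imputations and matching variances")
    q_bar = mean(estimates)                # pooled point estimate
    w = mean(variances)                    # within-imputation variance
    b = variance(estimates)                # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b                # total variance
    return q_bar, t, t ** 0.5
```

For example, estimates of 1.0, 2.0 and 3.0, each with variance 0.5, pool to an estimate of 2.0 with total variance 0.5 + (4/3)(1.0).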

The Effect of Publicness on the Service Quality in Long-Term Care Facilities (공공성이 노인장기요양시설의 서비스 질에 미치는 효과)

  • Kwon, Hyunjung;Hong, Kyungzoon
    • Korean Journal of Social Welfare / v.67 no.3 / pp.253-280 / 2015
  • This study reconsiders the concept of publicness by raising questions about the problems caused by the recent marketization of social services in South Korea. The existing perspective on publicness, however, cannot account for the entire long-term care market because it attributes publicness only to public organizations. Accordingly, this study presents 'integrated publicness', in particular 'dimensional publicness' and 'normative publicness'. A disproportionate stratified sampling procedure was used to take ownership into account. A merged dataset combining surveys of 248 long-term care facilities with online resources was analyzed using multiple regression, negative binomial regression and multiple imputation. The results are as follows. First, ownership publicness appeared more effective overall. Second, regulation of government funding showed no effect, whereas regulation through the evaluation system did. Third, professionalization, a component of normative publicness, had a negative effect on service structure and a positive effect on service process. Lastly, public accountability, as reflected in users of free services, was found to be effective for service structure and outcome. These findings suggest that not only ownership but also dimensional and normative publicness affect service quality. This is important because it shows the performance of empirical models of integrated publicness in a situation where the outcomes of marketization are insignificant.

Comparing Survival Functions with Doubly Interval-Censored Data: An Application to Diabetes Surveyed by Korean Cancer Prevention Study (이중구간중도절단된 생존자료의 생존함수 비교를 위한 검정: 한국인 암 예방연구 중 당뇨병에의 응용)

  • Jee, Sun-Ha;Nam, Chung-Mo;Kim, Jin-Heum
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.595-606 / 2009
  • Two tests are introduced for comparing several survival functions with doubly interval-censored data and are illustrated with data from the Korean Cancer Prevention Study (Jee et al., 2005). The proposed test, which extends Kim et al. (2006)'s test to doubly interval-censored data, saves computation time compared with Sun (2006)'s test because it depends only on the size of the risk set. Unlike Sun's test, it is also applicable to continuous as well as discrete failure time data. The incubation time of diabetes differed markedly between males and females, with the female group surviving longer than the male group. Regardless of gender, the survival functions of the four age groups differed significantly (p-value less than 0.001), and this trend was more pronounced in females than in males. Simulation results showed that both tests controlled the significance level well and that the proposed test was more powerful than Sun's test.

Mediation analysis of dietary habits, nutrient intakes, daily life in the relationship between working hours of Korean shift workers and metabolic syndrome : the sixth (2013 ~ 2015) Korea National Health and Nutrition Examination Survey (교대근무자의 근무시간과 대사증후군의 관계에서 식습관, 영양섭취상태, 일상생활의 매개효과 분석 : 6기 국민건강영양조사 (2013 ~ 2015) 데이터 이용)

  • Kim, Yoona;Kim, Hyeon Hee;Lim, Dong Hoon
    • Journal of Nutrition and Health / v.51 no.6 / pp.567-579 / 2018
  • Purpose: This study examined the mediation effects of dietary habits, nutrient intake, and daily life in the relationship between the working hours of Korean shift workers and metabolic syndrome. Methods: Data were collected from the sixth (2013-2015) Korea National Health and Nutrition Examination Survey (KNHANES). Stochastic regression imputation was used to fill in missing data. Statistical analyses of Korean shift workers with metabolic syndrome were performed using SPSS 24 for Windows and a structural equation model (SEM) fitted with the Analysis of Moment Structures (AMOS) 21.0 package. Results: The model fitted the data well in terms of the goodness of fit index (GFI) = 0.939, root mean square error of approximation (RMSEA) = 0.025, normed fit index (NFI) = 0.917, Tucker-Lewis index (TLI) = 0.984, comparative fit index (CFI) = 0.987, and adjusted goodness of fit index (AGFI) = 0.915. The specific mediation effect of dietary habits (p = 0.023) was statistically significant in the impact of shift workers' working hours on nutrient intake. The specific mediation effect of daily life (p = 0.019) was statistically significant in the impact of shift workers' working hours on metabolic syndrome. However, dietary habits, nutrient intake and daily life showed no significant multiple mediation effects in the relationship between working hours and metabolic syndrome in shift workers. Conclusion: The model suggests that working hours have a direct effect on daily life, which in turn mediates the risk of metabolic syndrome in shift workers.
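
The stochastic regression imputation named in the Methods can be illustrated in its simplest form. A regression is fitted on complete cases, and each missing value is replaced by its predicted value plus a random normal residual, so imputed values keep realistic spread. The sketch below is a simplifying assumption (one fully observed predictor, ordinary least squares, residual SD with an n - 2 denominator) and does not reproduce the authors' SPSS workflow. `stochastic_regression_impute` is a hypothetical name.

```python
import random
from statistics import fmean


def stochastic_regression_impute(x, y, seed=0):
    """Impute None entries of y from fully observed x: OLS prediction plus N(0, s^2) noise."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if len(pairs) < 3:
        raise ValueError("need at least 3 complete cases")
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((xi - mx) ** 2 for xi in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance among complete cases")
    # Ordinary least squares fit on the complete cases
    slope = sum((xi - mx) * (yi - my) for xi, yi in pairs) / sxx
    intercept = my - slope * mx
    # Residual standard deviation (n - 2 degrees of freedom)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in pairs)
    sigma = (rss / (len(pairs) - 2)) ** 0.5
    rng = random.Random(seed)
    # Observed values are kept; missing ones get prediction + random residual
    return [
        intercept + slope * xi + rng.gauss(0.0, sigma) if yi is None else yi
        for xi, yi in zip(x, y)
    ]
```

Without the random residual term this reduces to deterministic regression imputation, which tends to understate variability. Adding the noise is the "stochastic" part.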