Statistical Methods for Multivariate Missing Data in Health Survey Research

보건조사연구에서 다변량결측치가 내포된 자료를 효율적으로 분석하기 위한 통계학적 방법

  • Kim, Dong-Kee (Department of Biostatistics, Yonsei University College of Medicine) ;
  • Park, Eun-Cheol (Department of Preventive Medicine and Public Health, Yonsei University College of Medicine) ;
  • Sohn, Myong-Sei (Department of Preventive Medicine and Public Health, Yonsei University College of Medicine) ;
  • Kim, Han-Joong (Department of Preventive Medicine and Public Health, Yonsei University College of Medicine) ;
  • Park, Hyung-Uk (Department of Preventive Medicine and Public Health, Yonsei University College of Medicine) ;
  • Ahn, Chae-Hyung (Department of Biostatistics, Yonsei University College of Medicine) ;
  • Lim, Jong-Gun (Department of Biostatistics, Yonsei University College of Medicine) ;
  • Song, Ki-Jun (Department of Biostatistics, Yonsei University College of Medicine)
  • 김동기 (연세대학교 의과대학 의학통계학과) ;
  • 박은철 (연세대학교 의과대학 예방의학교실) ;
  • 손명세 (연세대학교 의과대학 예방의학교실) ;
  • 김한중 (연세대학교 의과대학 예방의학교실) ;
  • 박형욱 (연세대학교 의과대학 예방의학교실) ;
  • 안재형 (연세대학교 의과대학 의학통계학과) ;
  • 임종건 (연세대학교 의과대학 의학통계학과) ;
  • 송기준 (연세대학교 의과대학 의학통계학과)
  • Published : 1998.12.01

Abstract

Missing observations are common in medical research and health survey research, and several statistical methods have been proposed to handle them. The EM (Expectation-Maximization) algorithm is one efficient approach to the missing data problem, based on sufficient statistics. In this paper, we developed statistical models and methods for survey data with multivariate missing observations; in particular, we adopted the EM algorithm to handle them. We assumed that the multivariate observations follow a multivariate normal distribution, where the mean vector and the covariance matrix are of primary interest. We applied the proposed method to data from a physician survey on the Resource-Based Relative Value Scale (RBRVS). In addition to the EM algorithm, we applied complete-case analysis, which uses only completely observed cases, and available-case analysis, which uses all available information. Residual and normal probability plots were examined to assess the assumption of normality. We found that the residual sum of squares from the EM algorithm was smaller than those from the complete-case and available-case analyses.
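
To make the EM idea concrete, the following is a minimal sketch of the standard textbook EM iteration for the mean vector and covariance matrix of a multivariate normal sample with missing entries. It is not the authors' implementation, which the abstract does not describe in detail. The function name em_mvn, the NaN coding of missing values, the starting values, and the convergence tolerance are all assumptions made for illustration. The sketch also assumes the data are missing at random and that every variable is observed in at least two cases.

```python
import numpy as np

def em_mvn(X, n_iter=200, tol=1e-8):
    """EM estimates of mean and covariance for multivariate normal data.

    X is an (n, p) array with np.nan marking missing entries (an assumed coding).
    Returns the maximum-likelihood style estimates (mu, sigma).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)

    # Starting values from the observed entries of each variable.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))

    for _ in range(n_iter):
        sum_x = np.zeros(p)         # expected sufficient statistic: sum of x
        sum_xx = np.zeros((p, p))   # expected sufficient statistic: sum of x x^T

        # E-step: fill in conditional expectations case by case.
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x = X[i].copy()
            c = np.zeros((p, p))    # conditional covariance of the missing part
            if m.any():
                if o.any():
                    s_oo = sigma[np.ix_(o, o)]
                    s_mo = sigma[np.ix_(m, o)]
                    # Regression coefficients of missing on observed variables.
                    b = np.linalg.solve(s_oo, s_mo.T).T
                    x[m] = mu[m] + b @ (x[o] - mu[o])
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - b @ s_mo.T
                else:
                    # Nothing observed in this case: use the marginal moments.
                    x[m] = mu[m]
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_x += x
            sum_xx += np.outer(x, x) + c

        # M-step: update the parameters from the expected sufficient statistics.
        mu_new = sum_x / n
        sigma_new = sum_xx / n - np.outer(mu_new, mu_new)

        change = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if change < tol:
            break

    return mu, sigma


# Hypothetical usage on simulated data (not the RBRVS survey data).
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))
data[::4, 1] = np.nan                          # make some entries missing
mu_em, sigma_em = em_mvn(data)                 # EM estimates
complete = data[~np.isnan(data).any(axis=1)]   # complete-case subset
mu_cc = complete.mean(axis=0)                  # complete-case mean
mu_ac = np.nanmean(data, axis=0)               # available-case mean
```

The complete-case estimate discards every case with any missing entry, and the available-case estimate uses each variable's observed values separately, whereas the EM estimate uses the covariance structure to borrow information from the observed variables of incomplete cases.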

Keywords