An Approach to Survey Data with Nonresponse: Evaluation of KEPEC Data with BMI

  • Baek, Ji-Eun (Department of Statistics, Seoul National University College of Natural Science) ;
  • Kang, Wee-Chang (Department of Informetrics and Statistics, Daejun University, College of Natural Science) ;
  • Lee, Young-Jo (Department of Statistics, Seoul National University College of Natural Science) ;
  • Park, Byung-Joo (Department of Preventive Medicine, Seoul National University College of Medicine)
  • Published: June 1, 2002

Abstract

Objectives: A common problem in analyzing survey data is incompleteness caused by nonresponse or missing data. The mail questionnaire survey conducted in 1996 to collect lifestyle variables from members of the Korean Elderly Pharmacoepidemiologic Cohort (KEPEC) contains some nonresponse and missing data. An appropriate statistical method was applied to evaluate the missing-data pattern in a specific subset of the KEPEC data in which the independent variables had no missing values and the response variable, body mass index (BMI), had missing values.

Methods: The study subjects were 8,689 elderly people. First, BMI and the variables significantly associated with BMI were categorized. A log-linear model was then fitted to estimate the probability that a subject falls in each category combination. The EM algorithm was applied to the log-linear model to determine the missing-data mechanism underlying the nonresponse.

Results: Age, smoking status, and preference for spicy food were selected as the variables influencing BMI. When nonignorable and ignorable nonresponse log-linear models including these variables were fitted, the difference in deviance between the two models was 0.0034 (df = 1), far below the 5% critical value of the chi-square distribution with 1 degree of freedom (3.84).

Conclusion: Drawing inferences about such variables from large samples without considering the pattern of missing data carries considerable risk. On the basis of these results, the missing BMI data are consistent with ignorable nonresponse. Therefore, inferences about BMI in the KEPEC data can be made without modeling the missing-data mechanism.
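The model comparison in the Methods and Results can be illustrated with a small EM fit. This is a minimal sketch under simplifying assumptions that are not taken from the paper: a single categorical covariate X stands in for the combined age, smoking, and spicy-food cells; BMI is dichotomized; the ignorable model lets the response probability depend on nothing (log-linear model [XY][R]) and the nonignorable model lets it depend on the BMI category (log-linear model [XY][YR]). The functions `fit_em` and `loglik` and all counts are hypothetical; the counts are not KEPEC data, and the paper's exact model terms may differ.

```python
import math


def loglik(n, m, p, theta):
    """Observed-data multinomial log-likelihood, up to an additive constant."""
    ll = 0.0
    for x in range(len(n)):
        for y in range(len(n[x])):
            if n[x][y] > 0:
                # Respondent cell: P(x, y) * P(R = 1 | y)
                ll += n[x][y] * math.log(p[x][y] * theta[y])
        if m[x] > 0:
            # Nonrespondent cell: BMI unknown, so sum P(x, y) * P(R = 0 | y) over y
            miss = sum(p[x][y] * (1.0 - theta[y]) for y in range(len(n[x])))
            ll += m[x] * math.log(miss)
    return ll


def fit_em(n, m, nonignorable, init=None, max_iter=5000, tol=1e-10):
    """EM for the (hypothetical) selection model P(x, y) * P(R | y).

    n[x][y]: respondents in covariate cell x with BMI category y.
    m[x]:    nonrespondents in covariate cell x (BMI missing).
    nonignorable=False -> P(R = 1) = theta for every y   (log-linear [XY][R])
    nonignorable=True  -> P(R = 1 | y) = theta[y]        (log-linear [XY][YR])
    """
    J, K = len(n), len(n[0])
    n_resp = sum(map(sum, n))
    N = n_resp + sum(m)
    if init is None:
        p = [[1.0 / (J * K)] * K for _ in range(J)]  # uniform starting table
        theta = [n_resp / N] * K                     # overall response rate
    else:
        p = [row[:] for row in init[0]]
        theta = list(init[1])
    ll = loglik(n, m, p, theta)
    for _ in range(max_iter):
        # E-step: allocate each m[x] over BMI categories in proportion
        # to P(x, y) * P(R = 0 | y).
        filled = []
        for x in range(J):
            w = [p[x][y] * (1.0 - theta[y]) for y in range(K)]
            s = sum(w)
            filled.append([m[x] * wy / s for wy in w])
        # M-step: closed-form maximizers for the completed table.
        p = [[(n[x][y] + filled[x][y]) / N for y in range(K)] for x in range(J)]
        if nonignorable:
            theta = [sum(n[x][y] for x in range(J))
                     / sum(n[x][y] + filled[x][y] for x in range(J))
                     for y in range(K)]
        else:
            theta = [n_resp / N] * K
        prev, ll = ll, loglik(n, m, p, theta)
        if ll - prev < tol:
            break
    return p, theta, ll


# Hypothetical toy counts (NOT KEPEC data): 3 covariate cells x 2 BMI categories.
n = [[120, 80], [150, 60], [90, 100]]
m = [30, 25, 40]

p0, th0, ll0 = fit_em(n, m, nonignorable=False)
# Start the nonignorable fit at the ignorable estimates so EM can only increase ll.
p1, th1, ll1 = fit_em(n, m, nonignorable=True, init=(p0, th0))

# deviance(ignorable) - deviance(nonignorable); df = K - 1 = 1 here
dev_diff = 2.0 * (ll1 - ll0)
# Upper-tail chi-square(1) probability: P(Z^2 > d) = erfc(sqrt(d / 2)).
p_value = math.erfc(math.sqrt(max(dev_diff, 0.0) / 2.0))
print(f"response rate (ignorable): {th0[0]:.3f}")
print("response rate by BMI category (nonignorable):", [round(t, 3) for t in th1])
print(f"deviance difference = {dev_diff:.4f} (df = 1), p = {p_value:.3f}")
```

Starting the nonignorable fit at the ignorable estimates means, by the monotonicity of EM, that the deviance difference cannot be negative. A difference that is small relative to 3.84 (the 5% critical value for 1 df), as with the 0.0034 reported in the Results, gives no evidence against ignorable nonresponse.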
