Efficient strategy for the genetic analysis of related samples with a linear mixed model

(Korean title, translated: A study on strategies for genomic data analysis using linear mixed models)

  • Lim, Jeongmin (Chunlab, Inc.) ;
  • Sung, Joohon (Department of Public Health Science, Seoul National University) ;
  • Won, Sungho (Department of Public Health Science, Seoul National University)
  • Received : 2014.06.30
  • Reviewed : 2014.08.05
  • Published : 2014.09.30

Abstract

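Genome-wide association analysis of continuous phenotypes with family data mainly relies on linear mixed models, in which the variance-covariance matrix is determined by the genetic distance between family members. However, although phenotypic similarity among family members arises from both genetic and environmental factors, the model assumes that it arises from genetic factors alone. For example, the heights of spouses are positively correlated, yet they are assumed to be independent because only genetic factors are considered. When the variance-covariance structure of a linear mixed model is misspecified, the type I or type II error of the test statistic cannot be controlled properly. In this paper, we propose a linear mixed model that can accommodate various types of variance-covariance structure, together with a test statistic based on it. Simulation studies confirmed that the proposed method has greater statistical power than existing models. We also applied it to a genome-wide association analysis of body mass index (BMI) and identified a new causal gene that had not previously been reported.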
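For orientation, the conventional kinship-based model that the abstract criticizes is commonly written in the form below. This is a sketch of the standard formulation, not the paper's own notation: the symbols $y$, $X$, $\beta$, $g$, $\varepsilon$, $\sigma_g^2$, $\sigma_e^2$, $\Phi$ and $\phi_{ij}$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
y &= X\beta + g + \varepsilon, \qquad
g \sim N\!\left(0,\ \sigma_g^2 \Phi\right), \qquad
\varepsilon \sim N\!\left(0,\ \sigma_e^2 I\right), \\
\operatorname{Var}(y) &= \sigma_g^2 \Phi + \sigma_e^2 I, \qquad
\Phi_{ij} = 2\phi_{ij} \quad (i \neq j), \qquad \Phi_{ii} = 1 .
\end{aligned}
```

Here $\phi_{ij}$ is the kinship coefficient between individuals $i$ and $j$. For unrelated spouses $\phi_{ij} = 0$, so the model forces $\operatorname{Cov}(y_i, y_j) = 0$; for parent-offspring pairs and full siblings $\phi_{ij} = 1/4$, so both kinds of pair receive the same covariance $\sigma_g^2 / 2$.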

Linear mixed models have often been used for genetic association analysis with family-based samples. The correlation matrix for family-based samples is constructed from kinship coefficients, which assumes that parental phenotypes are independent and that the correlation between parent and offspring is the same as that between siblings. However, parental heights, for instance, are positively correlated, which indicates that this assumption is often violated. Statistical validity and power depend on how appropriate the assumed variance-covariance matrix is, and in this paper we provide a linear mixed model with a flexible variance-covariance matrix. Our results show that the proposed method is usually more efficient than existing approaches, and its application to a genome-wide association study of body mass index illustrates its practical value in real data analysis.
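To make the contrast concrete, the following minimal sketch builds the kinship-based correlation matrix for one nuclear family (father, mother, then full siblings) next to a more flexible alternative with a separate correlation for each relationship type. It is an illustration under stated assumptions, not the authors' implementation: the function names, the three-parameter parameterization (`rho_spouse`, `rho_parent_child`, `rho_sibling`), and the plain generalized least squares Wald statistic are all hypothetical choices made here.

```python
import numpy as np


def kinship_correlation(n_children: int) -> np.ndarray:
    """Twice the kinship coefficient for father, mother, then n_children full siblings."""
    n = 2 + n_children
    phi = np.full((n, n), 0.5)      # parent-offspring and full-sibling pairs: 2 * (1/4)
    phi[0, 1] = phi[1, 0] = 0.0     # spouses are treated as unrelated
    np.fill_diagonal(phi, 1.0)      # a non-inbred individual with itself: 2 * (1/2)
    return phi


def flexible_correlation(n_children, rho_spouse, rho_parent_child, rho_sibling):
    """Hypothetical alternative: one free correlation per relationship type."""
    n = 2 + n_children
    r = np.full((n, n), float(rho_sibling))
    r[:2, :] = rho_parent_child     # rows belonging to the two parents
    r[:, :2] = rho_parent_child     # columns belonging to the two parents
    r[0, 1] = r[1, 0] = rho_spouse  # spouse pair gets its own parameter
    np.fill_diagonal(r, 1.0)
    return r


def gls_wald(y, X, V):
    """GLS estimate of beta and the Wald statistic for the last column of X,
    treating the covariance matrix V as known (an assumption of this sketch)."""
    V_inv = np.linalg.inv(V)
    cov_beta = np.linalg.inv(X.T @ V_inv @ X)
    beta = cov_beta @ X.T @ V_inv @ y
    wald = beta[-1] ** 2 / cov_beta[-1, -1]
    return beta, wald
```

For a family with two children, `kinship_correlation(2)` has a zero in the spouse entry and 0.5 for every parent-offspring and sibling pair, whereas `flexible_correlation` lets each of those entries differ. Passing either matrix (scaled and combined with a residual variance term) to `gls_wald` shows how the assumed covariance structure enters the test statistic for a genotype column placed last in `X`.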
