Variational Bayesian multinomial probit model with Gaussian process classification on mice protein expression level data


  • Donghyun Son (Department of Applied Statistics, Chung-Ang University)
  • Beom Seuk Hwang (Department of Applied Statistics, Chung-Ang University)
  • Received : 2022.11.23
  • Accepted : 2022.12.16
  • Published : 2023.04.30

Abstract

The multinomial probit model is a popular model for multiclass classification and discrete choice modeling. Markov chain Monte Carlo (MCMC) methods are the standard Bayesian approach to estimating the multinomial probit model, but their computational cost is high. Variational Bayesian approximation is known to be much more computationally efficient than MCMC: rather than drawing samples from the posterior, it optimizes over a tractable family of approximating distributions, and it typically does so with little loss in classification performance. In this study, we describe the multinomial probit model with Gaussian process classification and show how variational Bayesian approximation can be applied to it. We then fit the model to the UCI mice protein expression level data and compare its performance with that of naive Bayes, K-nearest neighbors, and support vector machine classifiers.
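
As a concrete illustration of the workflow the abstract describes, the sketch below fits the variational Bayesian multinomial probit model with the vbmp R package cited in the references and compares it with the three baseline classifiers. It is a minimal sketch, not the paper's code: the CSV file name, the column layout of the UCI data, the mean imputation of missing values, the 70/30 split, and the vbmp control settings are all illustrative assumptions.

    ## Minimal sketch (not the paper's actual code): variational Bayesian
    ## multinomial probit regression with a Gaussian process prior via the
    ## vbmp package, compared with naive Bayes, K-NN, and SVM baselines.
    library(vbmp)   # Lama and Girolami (2022), Bioconductor
    library(e1071)  # naiveBayes(), svm()
    library(class)  # knn()

    ## UCI "Mice Protein Expression" data, assumed exported to CSV with the
    ## 77 protein expression levels in columns 2-78 and the 8-level label
    ## in a column named "class".
    mice <- read.csv("Data_Cortex_Nuclear.csv")
    X <- as.matrix(mice[, 2:78])
    y <- factor(mice$class)

    ## Simple mean imputation for missing expression values (an assumption;
    ## any reasonable imputation could be substituted).
    for (j in seq_len(ncol(X))) {
      X[is.na(X[, j]), j] <- mean(X[, j], na.rm = TRUE)
    }

    ## Hold out 30% of the samples as a test set.
    set.seed(1)
    tr  <- sample(nrow(X), round(0.7 * nrow(X)))
    Xtr <- X[tr, ];  ytr <- y[tr]
    Xte <- X[-tr, ]; yte <- y[-tr]

    ## Variational Bayesian multinomial probit with a Gaussian (RBF)
    ## covariance; one length-scale per feature, estimated during fitting.
    theta <- rep(1, ncol(Xtr))
    fit <- vbmp(Xtr, as.integer(ytr), Xte, as.integer(yte), theta,
                control = list(sKernelType = "gauss",
                               bThetaEstimate = TRUE, maxIts = 50))
    vb.err <- predError(fit)   # test-set misclassification rate

    ## Baseline classifiers on the same split.
    nb.err  <- mean(predict(naiveBayes(Xtr, ytr), Xte) != yte)
    knn.err <- mean(knn(Xtr, Xte, ytr, k = 5) != yte)
    svm.err <- mean(predict(svm(Xtr, ytr), Xte) != yte)

    round(c(vbmp = vb.err, naiveBayes = nb.err, knn = knn.err, svm = svm.err), 3)

If a confusion matrix rather than an error rate is wanted, predClass(fit) returns the predicted test-set labels themselves.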

Acknowledgement

This paper was supported by the Chung-Ang University CAU GRS grant in 2021, and by a Basic Science Research Program grant funded by the Korean government (Ministry of Science and ICT) through the National Research Foundation of Korea in 2019 (NRF-2019R1C1C1011710).

References

  1. Albert JH and Chib S (1993). Bayesian analysis of binary and polychotomous response data, Journal of the American Statistical Association, 88, 669-679. https://doi.org/10.1080/01621459.1993.10476321
  2. Beal MJ (2003). Variational Algorithms for Approximate Bayesian Inference (Doctoral dissertation), University College London, London, UK.
  3. Blei DM, Kucukelbir A, and McAuliffe JD (2017). Variational inference: A review for statisticians, Journal of the American Statistical Association, 112, 859-877. https://doi.org/10.1080/01621459.2017.1285773
  4. Girolami M and Rogers S (2006). Variational Bayesian multinomial probit regression with Gaussian process priors, Neural Computation, 18, 1790-1817. https://doi.org/10.1162/neco.2006.18.8.1790
  5. Hausman JA and Wise DA (1978). A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences, Econometrica: Journal of the Econometric Society, 46, 403-426. https://doi.org/10.2307/1913909
  6. Higuera C, Gardiner KJ, and Cios KJ (2015). Self-organizing feature maps identify proteins critical to learning in a mouse model of Down syndrome, PLoS One, 10, e0129126.
  7. Jordan MI, Ghahramani Z, Jaakkola TS, and Saul LK (1999). An introduction to variational methods for graphical models, Machine Learning, 37, 183-233. https://doi.org/10.1023/A:1007665907178
  8. Kote-Jarai Z, Matthews L, Osorio A et al. (2006). Accurate prediction of BRCA1 and BRCA2 heterozygous genotype using expression profiling after induced DNA damage, Clinical Cancer Research, 12, 3896-3901. https://doi.org/10.1158/1078-0432.CCR-05-2805
  9. Lama N and Girolami M (2008). vbmp: Variational Bayesian multinomial probit regression for multi-class classification in R, Bioinformatics, 24, 135-136. https://doi.org/10.1093/bioinformatics/btm535
  10. Lama N and Girolami M (2022). vbmp: Variational Bayesian Multinomial Probit Regression, R package version 1.64.0, Available from: http://bioinformatics.oxfordjournals.org/cgi/content/short/btm535v1
  11. Lawrence ND, Milo M, Niranjan M, Rashbass P, and Soullier S (2004). Reducing the variability in cDNA microarray image processing by Bayesian inference, Bioinformatics, 20, 518-526. https://doi.org/10.1093/bioinformatics/btg438
  12. Minka TP (2001). A family of algorithms for approximate Bayesian inference (Doctoral dissertation), Massachusetts Institute of Technology, Cambridge, MA, USA.
  13. Neal RM (1998). Regression and classification using Gaussian process priors. In JM Bernardo, JO Berger, AP Dawid, and AFM Smith (Eds), Bayesian Statistics 6 (pp. 475-501), Oxford University Press, New York.
  14. R Core Team (2022). R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria.
  15. Williams CK and Barber D (1998). Bayesian classification with Gaussian processes, IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1342-1351. https://doi.org/10.1109/34.735807
  16. Williams CK and Rasmussen CE (2006). Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA.