Trend of Utilization of Machine Learning Technology for Digital Healthcare Data Analysis

Woo, Y.C.; Lee, S.Y.; Choi, W.; Ahn, C.W.; Baek, O.K.

doi:10.22648/ETRI.2019.J.340109

Electronics and Telecommunications Trends

Vol. 34, No. 1 / pp. 98-110 / 2019 / pISSN 1225-6455

Electronics and Telecommunications Research Institute (ETRI)

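Woo, Y.C. (우영춘), IDX Core Technology Research Lab;
Lee, S.Y. (이성엽), IDX Core Technology Research Lab;
Choi, W. (최완), IDX Core Technology Research Lab;
Ahn, C.W. (안창원), IDX Core Technology Research Lab;
Baek, O.K. (백옥기), IDX Core Technology Research Lab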

Published: February 1, 2019

Abstract

Machine learning has been applied to medical imaging, where it has shown excellent recognition rates, and interest in preventive medicine has grown recently. When data are accessible, machine learning packages can be applied easily in digital healthcare. However, the data must be prepared in advance, and model evaluation and tuning are required to build a reliable model. On average, these processes account for more than 80% of the total effort. In this study, we describe the basic concepts of machine learning, the pre-processing and visualization of datasets, feature engineering for reliable models, model evaluation and tuning, and the latest trends in popular machine learning frameworks. Finally, we survey an explainable machine learning analysis tool and discuss the future direction of machine learning.
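The model evaluation step mentioned above can be illustrated with a minimal sketch. It computes common metrics from a binary confusion matrix and the area under the ROC curve in plain Python. The labels, scores, and the 0.5 threshold are hypothetical toy values, not results from the article, and the function names are illustrative only.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

In practice a library such as scikit-learn would normally supply these metrics; the hand-written version only makes the definitions explicit.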

(Fig. 1) Digital smart healthcare paradigm

(Fig. 2) Machine learning analysis flow

(Fig. 3) (a) Attributes of the breast cancer data and (b) their visualization

(Fig. 4) (a) Classification (confusion) matrix and (b) example analysis of the breast cancer data (XGBoost applied)

(Fig. 5) Model evaluation metrics for the breast cancer data

(Fig. 6) ROC curve of the model evaluation results for the breast cancer data

(Fig. 7) XGBoost analysis of the Breast Cancer data [12] and feature importance (composed using [29])

(Fig. 8) SHAP explanation of the prediction factors in the XGBoost analysis of the breast cancer data (adapted from [33])

<Table 1> Data visualization tools

References

[1] K.W. Seo et al., "Smart Healthcare Medical Device Technology," Standards Strategy Report, National Institute of Food and Drug Safety Evaluation, Aug. 2018 (in Korean).
[2] Y.J. Song, "The Fourth Industrial Revolution and Digital Healthcare Policy," Weekly Technology Trends, Feb. 2018 (in Korean).
[3] S.W. Jung, "Use of Big Data in Healthcare," 5th Clinical Research Methodology Workshop, Catholic Medical College Institute of Biomedical Industry, Seoul, Nov. 5, 2016, pp. 18-29 (in Korean).
[4] IBM, "Bigdata in Healthcare: Tapping New Insight to Save Lives," IBM Big Data & Analytics Hub, 2014. https://www.ibmbigdatahub.com/infographic/big-data-healthcare-tapping-new-insight-save-lives
[5] Wikipedia, "Machine Learning," https://en.wikipedia.org/wiki/Machine_learning
[6] I.Y. Jung and W.M. Koo, "Data Integration Approaches for Building a Healthcare Ecosystem," Trends and Issues, no. 46, Jan. 2018, pp. 1-38 (in Korean).
[7] MIT Critical Data, Secondary Analysis of Electronic Health Records, Springer International Publishing: NY, USA, 2016.
[8] G. Press, "Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says," Forbes, Mar. 23, 2016.
[9] S. Christa, V. Suma, and L. Maduri, "An Effective Data Preprocessing Technique for Improved Data Management in a Distributed Environment," ACCTHPCA, vol. 3, July 2012, pp. 25-29.
[10] SAS, "Data Visualization Techniques: From Basics to Big Data with SAS(R) Visual Analytics," SAS White Paper, 2018.
[11] P. van der Laken, "Facet," Google, June 2017. https://github.com/PAIR-code/facets
[12] William H. Wolberg (physician), University of Wisconsin Hospitals, Madison, Wisconsin, USA, Breast Cancer Wisconsin (Original) Data Set, https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)
[13] Tutorials Point, "Seaborn," TutorialsPoint, 2017. https://www.tutorialspoint.com/seaborn/seaborn_tutorial.pdf
[14] A. Bilogur, "Missingno: A Missing Data Visualization Suite," J. Open Source Softw., Feb. 27, 2018, doi: 10.21105/joss.00547
[15] Continuum Analytics, "Blaze Documentation," 2018. https://blaze.readthedocs.io/en/latest/index.html
[16] G. Csardi and T. Nepusz, igraph Reference Manual, Harvard University: Cambridge, MA, USA, 2013.
[17] Wikipedia, "Feature Engineering," https://en.wikipedia.org/wiki/Feature_engineering
[18] A. Zheng, Evaluating Machine Learning Models, O'Reilly: Sebastopol, CA, USA, 2015.
[19] Medcalc, "ROC Curve Analysis," https://www.medcalc.org/manual/roc-curves.php
[20] F.Y. Osisanwo et al., "Supervised Machine Learning Algorithms: Classification and Comparison," Int. J. Comput. Trends Technol., vol. 48, no. 3, June 2017, pp. 128-138. https://doi.org/10.14445/22312803/IJCTT-V48P126
[21] P. Harrington, Machine Learning in Action, Manning Publications Co.: Shelter Island, NY, USA, 2012, pp. 83-100.
[22] M. Namratha and T.R. Prajwala, "A Comprehensive Overview of Clustering Algorithms in Pattern Recognition," IOSR J. Comput. Eng., vol. 4, no. 6, 2012, pp. 23-30. https://doi.org/10.9790/0661-0462330
[23] L. Arnold et al., "An Introduction to Deep Learning," in Proc. Eur. Symp. Artif. Neural Netw., Bruges, Belgium, Apr. 27-29, 2011, pp. 477-488.
[24] Wikipedia, "Random Forest," https://en.wikipedia.org/wiki/Random_forest
[25] Wikipedia, "Boosting," https://en.wikipedia.org/wiki/Boosting_(machine_learning)
[26] R.E. Schapire, "The Boosting Approach to Machine Learning, An Overview," in MSRI Workshop on Nonlinear Estimation and Classification, Springer: Heidelberg, Germany, 2002, pp. 3-4.
[27] A. Natekin and A. Knoll, "Gradient Boosting Machines, a Tutorial," Front. Neurorobot., July 21, 2013, doi: 10.3389/fnbot.2013.00021.
[28] G. Biau, B. Cadre, and L. Rouviere, "Accelerated Gradient Boosting," arXiv:1803.02042, May 2018.
[29] J. Brownlee, "XGBoost with Python, Gradient Boosted Trees with XGBoost and Scikit-learn," Machine Learning Mastery, Sept. 19, 2016.
[30] G. Ke et al., "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," Conf. Neural Inform. Process. Syst., Long Beach, CA, USA, 2017, pp. 1-9.
[31] A.V. Dorogush, V. Ershov, and A. Gulin, "CatBoost: Gradient Boosting with Categorical Features Support," Yandex, 2017. https://catboost.ai/
[32] M. Du, N. Liu, and X. Hu, "Techniques for Interpretable Machine Learning," arXiv:1808.00033, July 2018.
[33] M.T. Ribeiro, S. Singh, and C. Guestrin, "Why Should I Trust You?" Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, San Francisco, CA, USA, Aug. 13-17, 2016, pp. 1135-1144.
[34] S.M. Lundberg and S.-I. Lee, "A Unified Approach to Interpreting Model Predictions," Conf. Neural Inform. Process. Syst., Long Beach, CA, USA, 2017, pp. 1-10.
[35] A. Saabas, "treeinterpreter," 2015. https://github.com/andosa/treeinterpreter
[36] D. Foster, "xgboostExplainer," 2017. https://github.com/AppliedDataSciencePartners/xgboostExplainer
