• Title/Abstract/Keywords: health performance tree

62 search results, processing time 0.023 s

Exploring Machine Learning Classifiers for Breast Cancer Classification

  • Inayatul Haq;Tehseen Mazhar;Hinna Hafeez;Najib Ullah;Fatma Mallek;Habib Hamam
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 4
    • /
    • pp.860-880
    • /
    • 2024
  • Breast cancer is a major health concern affecting women and men globally. Early detection and accurate classification of breast cancer are vital for effective treatment and survival of patients. This study addresses the challenge of accurately classifying breast tumors using machine learning classifiers such as MLP, AdaBoostM1, logit Boost, Bayes Net, and the J48 decision tree. The research uses a dataset available publicly on GitHub to assess the classifiers' performance and differentiate between the occurrence and non-occurrence of breast cancer. The study compares the 10-fold and 5-fold cross-validation effectiveness, showing that 10-fold cross-validation provides superior results. Also, it examines the impact of varying split percentages, with a 66% split yielding the best performance. This shows the importance of selecting appropriate validation techniques for machine learning-based breast tumor classification. The results also indicate that the J48 decision tree method is the most accurate classifier, providing valuable insights for developing predictive models for cancer diagnosis and advancing computational medical research.
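The 10-fold versus 5-fold comparison described in this abstract can be illustrated with a minimal, dependency-free sketch of k-fold cross-validation. The function names `k_fold_indices` and `cross_val_accuracy`, and the `fit`/`predict` callables, are hypothetical and are not taken from the paper, which does not show its code here.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train, test) index pairs covering range(n_samples)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_val_accuracy(fit, predict, X, y, k, seed=0):
    """Mean test accuracy over k folds.

    fit(X_train, y_train) returns a model; predict(model, x) returns a label.
    Both callables are supplied by the caller (assumed interface).
    """
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Calling `cross_val_accuracy` once with `k=5` and once with `k=10` on the same data and classifier gives the kind of comparison the abstract reports. Which setting scores higher depends on the dataset and the classifier.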

A Study on Factors of Management of Diabetes Mellitus using Data Mining

  • 김유미;장동민;김성수;박일수;강성홍
    • 한국산학기술학회논문지
    • /
    • Vol. 10, No. 5
    • /
    • pp.1100-1108
    • /
    • 2009
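  • This study aims to identify factors associated with the management of diabetes patients. The subjects were adult diabetes patients aged 20 or older who participated in the 2005 Korea National Health and Nutrition Examination Survey. Diabetes management models were developed with data mining techniques, using logistic regression, decision tree, and neural network models; the decision tree had the greatest explanatory power. The factors associated with the diabetes awareness rate were age, place of residence, and occupation, of which age was the most important. The factors associated with the diabetes treatment rate were awareness of diabetes, place of residence, and occupation, of which awareness of diabetes was the most important variable. Management programs for diabetes patients should group patients into clusters by their characteristics and manage them accordingly.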

The Prediction of Survival of Breast Cancer Patients Based on Machine Learning Using Health Insurance Claim Data

  • 이덕규;변경근;이형동;신선희
    • 한국산업정보학회논문지
    • /
    • Vol. 28, No. 2
    • /
    • pp.1-9
    • /
    • 2023
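  • Previous AI research on breast cancer has mostly addressed auxiliary diagnostic prediction or the prediction of treatment outcomes from clinical factors, and most of it used cohort data from research institutions or data from a subset of patients. This paper uses the complete nationwide breast cancer patient data held by the Health Insurance Review and Assessment Service (HIRA) to analyze how patients in their 40s-50s differ from other age groups in survival prediction and in the factors affecting survival. The precision of survival prediction averaged 0.93 for patients in their 40s-50s, higher than the 0.86 for patients in their 60s-80s. The variable with the highest importance was the number of treatments (46%) for the 40s-50s group and age (32%) for the 60s-80s group. Compared with prior research, the average precision was 0.90, higher than the 0.81 reported in an earlier paper. Comparing algorithms, Decision Tree, Random Forest, and Gradient Boosting had an overall average precision of 0.90 and a recall of 1.0, identical within each age group, while the Multi-Layer Perceptron had a precision of 0.89 and a recall of 1.0. The authors hope that more research will be carried out using automated machine learning (Auto ML) tools for non-experts, to make better use of HIRA's nationwide claims big data.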

A recommendation system for assisting devices in long-term care insurance

  • 한은정;박상희;이정석;김동건
    • 응용통계연구
    • /
    • Vol. 31, No. 6
    • /
    • pp.693-706
    • /
    • 2018
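  • Providing assistive devices that match older adults' physical function is very important for helping them live independently in their own homes and communities for as long as possible. This study was conducted to develop a scientific standard benefit model algorithm for assistive devices that can recommend items suited to each individual, taking into account the beneficiary's physical and cognitive function. The decision tree, a data mining technique, was used to develop the model. The dataset was built from the long-term care eligibility assessment data of 8,084 beneficiaries, the standard benefit plans prepared by power assessors, and beneficiary characteristics data, and a standard benefit model was developed for each of 15 assistive device items. This study is expected to contribute to the objectivity and scientific rigor of assistive device benefit planning under long-term care insurance for the elderly, and to improve beneficiaries' independent living and safety.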

Schedule communication routing approach to maximize energy efficiency in wireless body sensor networks

  • Kaebeh, Yaeghoobi S.B.;Soni, M.K.;Tyagi, S.S.
    • Smart Structures and Systems
    • /
    • Vol. 21, No. 2
    • /
    • pp.225-234
    • /
    • 2018
  • E-Health allows you to supersede the central patient wireless healthcare system. Wireless Body Sensor Network (WBSN) is the first phase of the e-Health system. In this paper, we aim to understand e-Health architecture and configuration, and attempt to minimize energy consumption and latency in transmission routing protocols during restrictive latency in data delivery of the WBSN phase. The goal is to concentrate on the polling protocol to improve and optimize the routing time interval and to schedule communication so as to reduce energy utilization. In this research, two types of network model routing protocols are proposed: elemental and clustering. The elemental model improves efficiency by using a polling protocol, and the clustering model extends the elemental model, with a Destruct Supervised Decision Tree (DSDT) algorithm proposed to resolve time-interval conflicts in transmission. The simulation study verifies that the proposed models deliver better performance than the existing BSN protocol for WBSN.

In vitro Anti-inflammatory Activities and Phenolic Acid Analysis of Tree Sprout Extracts

  • 김주리;퀸누구엔;신한나;강기성;이상현
    • 생약학회지
    • /
    • Vol. 52, No. 4
    • /
    • pp.257-266
    • /
    • 2021
  • This study evaluated several in vitro activities including the preliminary assessment of the anti-cancer, anti-inflammatory, and anti-diabetic effects of tree sprout extracts. Chlorogenic, caffeic, and p-coumaric acid contents in tree sprouts were analyzed using high-performance liquid chromatography and an ultraviolet detector. Among the studied tree sprout extracts, the ethanol (EtOH) extract of Rhus verniciflua exhibited the most potent anti-cancer effect by suppressing the cell viability of a human gastric adenocarcinoma cell line, with an IC50 of 7.06 ㎍/mL. The EtOH extract of Morus alba (MAB) inhibited the secretion of nitric oxide (NO) at a concentration of 100 ㎍/mL, with an IC50 of 83.44 ㎍/mL. Moreover, the EtOH extract of Securinega suffruticosa inhibited NO secretion with the lowest IC50 of 54.42 ㎍/mL. The EtOH extract of Fraxinus mandschurica was the only extract with effective α-glucosidase inhibitory activity. The total content of chlorogenic, caffeic, and p-coumaric acids was the highest in MAB (14.63 mg/g ext.). In conclusion, the beneficial activities of the tree sprout extracts with high phenolic acid content were generally high. Our results provide a theoretical basis for the development of health-promoting supplements and functional foods.

A Combinatorial Optimization for Influential Factor Analysis: a Case Study of Political Preference in Korea

  • Yun, Sung Bum;Yoon, Sanghyun;Heo, Joon
    • 한국측량학회지
    • /
    • Vol. 35, No. 5
    • /
    • pp.415-422
    • /
    • 2017
  • Finding influential factors from a given clustering result is a typical data science problem. A Genetic Algorithm-based method is proposed to derive influential factors, and its performance is compared with two conventional methods, Classification and Regression Tree (CART) and Chi-Squared Automatic Interaction Detection (CHAID), using Dunn's index. To extract the factors influencing preference for political parties in South Korea, the voting results of the $18^{th}$ presidential election and 'Demographic', 'Health and Welfare', 'Economic', and 'Business' related data were used. Based on the analysis, reverse engineering was implemented. A reverse-engineering-based approach to influential factor analysis can provide a new set of influential variables, offering new insight to the data mining field.
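As a rough illustration of the Dunn's index measure named in this abstract, the sketch below computes one common variant: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The abstract does not say which variant or distance the authors used, so the Euclidean distance and the function name `dunn_index` are assumptions.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of coordinate tuples.

    Higher values indicate compact, well-separated clusters.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Smallest Euclidean distance between points of two different clusters.
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest cluster diameter (maximum within-cluster pairwise distance).
    max_within = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_within == 0.0:
        raise ValueError("every cluster has zero diameter; index is undefined")
    return min_between / max_within
```

For two clusters `[(0, 0), (0, 1)]` and `[(3, 0), (3, 1)]`, the closest cross-cluster pair is 3 apart and the widest cluster has diameter 1, so the index is 3.0.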

Data Mining for Knowledge Management in a Health Insurance Domain

  • Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
    • 지능정보연구
    • /
    • Vol. 6, No. 1
    • /
    • pp.73-82
    • /
    • 2000
  • This study examined the characteristics of the knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression and two decision tree algorithms, CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5), since logistic regression has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case. This comparison was performed using the test set of 4,588 beneficiaries and the training set of 13,689 beneficiaries that were used to develop the models. Contrary to the previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, but C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules also provided the segment characteristics for the risk factors that may be used in developing hypertension management programs. This showed that the data mining approach can be a useful analytic tool for predicting and classifying health outcomes data.

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association
    • /
    • Vol. 1, No. 1
    • /
    • pp.11-16
    • /
    • 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of 5 types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

Prediction of Hypertension Complications Risk Using Classification Techniques

  • Lee, Wonji;Lee, Junghye;Lee, Hyeseon;Jun, Chi-Hyuck;Park, Il-Su;Kang, Sung-Hong
    • Industrial Engineering and Management Systems
    • /
    • Vol. 13, No. 4
    • /
    • pp.449-453
    • /
    • 2014
  • Chronic diseases including hypertension and its complications are major sources causing the national medical expenditures to increase. We aim to predict the risk of hypertension complications for hypertension patients, using the sample national healthcare database established by Korean National Health Insurance Corporation. We apply classification techniques, such as logistic regression, linear discriminant analysis, and classification and regression tree to predict the hypertension complication onset event for each patient. The performance of these three methods is compared in terms of accuracy, sensitivity and specificity. The result shows that these methods seem to perform similarly although the logistic regression performs marginally better than the others.
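The three comparison measures named in this abstract (accuracy, sensitivity, and specificity) all come from the counts of a binary confusion matrix. The sketch below uses the standard definitions. The function name `binary_metrics` and the label encoding (1 for a complication, 0 otherwise) are assumptions made for illustration, not details from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for binary labels.

    A metric whose denominator is zero is returned as None.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Specificity: share of actual negatives correctly ruled out.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are imbalanced, since a classifier can score high accuracy while missing most positive cases.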