Integrated Search | Korea Science

Exploring Machine Learning Classifiers for Breast Cancer Classification

Inayatul Haq;Tehseen Mazhar;Hinna Hafeez;Najib Ullah;Fatma Mallek;Habib Hamam
- KSII Transactions on Internet and Information Systems (TIIS)
- Vol. 18, No. 4
- pp.860-880
- 2024
Breast cancer is a major health concern affecting women and men globally. Early detection and accurate classification of breast cancer are vital for effective treatment and patient survival. This study addresses the challenge of accurately classifying breast tumors using machine learning classifiers: MLP, AdaBoostM1, LogitBoost, Bayes Net, and the J48 decision tree. The research uses a dataset publicly available on GitHub to assess the classifiers' performance in differentiating between the occurrence and non-occurrence of breast cancer. The study compares the effectiveness of 10-fold and 5-fold cross-validation, showing that 10-fold cross-validation provides superior results. It also examines the impact of varying split percentages, with a 66% split yielding the best performance. This shows the importance of selecting appropriate validation techniques for machine learning-based breast tumor classification. The results also indicate that the J48 decision tree is the most accurate classifier, providing valuable insights for developing predictive models for cancer diagnosis and advancing computational medical research.
https://doi.org/10.3837/tiis.2024.04.003
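
The two validation schemes named in the abstract above (k-fold cross-validation and a percentage split) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names `k_fold_indices` and `percentage_split` are hypothetical, the folds are contiguous, and real experiments would normally shuffle (and often stratify) the samples first.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper for illustration; no shuffling or stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def percentage_split(n_samples, train_fraction=0.66):
    """Hold-out split: the first train_fraction of indices train, the rest test."""
    cut = round(n_samples * train_fraction)
    return list(range(cut)), list(range(cut, n_samples))


# With 100 samples, 10-fold tests on 10 samples per fold and 5-fold on 20.
ten_fold = k_fold_indices(100, 10)
five_fold = k_fold_indices(100, 5)
train_idx, test_idx = percentage_split(100)  # 66 training, 34 test indices
```

With more folds, each model trains on a larger share of the data (90% for 10-fold against 80% for 5-fold), at the cost of fitting more models.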

A Study on Factors of Management of Diabetes Mellitus using Data Mining

김유미;장동민;김성수;박일수;강성홍
- 한국산학기술학회논문지
- Vol. 10, No. 5
- pp.1100-1108
- 2009
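The purpose of this study is to identify factors associated with the management of diabetes patients. The subjects were adult diabetes patients aged 20 or older who participated in the 2005 Korea National Health and Nutrition Examination Survey. Diabetes management models were developed with data mining techniques using logistic regression, decision tree, and neural network models; the decision tree had the greatest explanatory power. The factors associated with the diabetes awareness rate were age, place of residence, and occupation, of which age was the most important. The factors associated with the diabetes treatment rate were awareness of diabetes, place of residence, and occupation, of which awareness of diabetes was the most important variable. Management programs for diabetes patients should classify patients into clusters by their characteristics and manage them accordingly.
https://doi.org/10.5762/KAIS.2009.10.5.1100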

The Prediction of Survival of Breast Cancer Patients Based on Machine Learning Using Health Insurance Claim Data

이덕규;변경근;이형동;신선희
- 한국산업정보학회논문지
- Vol. 28, No. 2
- pp.1-9
- 2023
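Previous AI research on breast cancer has mostly focused on auxiliary diagnostic prediction or on predicting treatment outcomes from clinical factors, and most of it used cohort data from research institutions or data from a subset of patients. This paper uses the complete nationwide data on breast cancer patients held by the Health Insurance Review and Assessment Service (HIRA) to predict survival for patients in their 40s to 50s versus other age groups and to analyze differences in the factors that affect survival. The precision of survival prediction averaged 0.93 for patients in their 40s to 50s, higher than the 0.86 for patients in their 60s to 80s. In variable importance, the number of treatments (46%) ranked highest for patients in their 40s to 50s, and age (32%) for those in their 60s to 80s. Compared with previous research, the average precision of 0.90 exceeded the 0.81 reported in an earlier paper. Comparing algorithms, Decision Tree, Random Forest, and Gradient Boosting had an overall average precision of 0.90 and recall of 1.0, identical within each age group, while Multi-Layer Perceptron had a precision of 0.89 and recall of 1.0. We hope that more research will be conducted using automated machine learning (Auto ML) tools for non-experts to increase the value derived from HIRA's nationwide claims big data.
https://doi.org/10.9723/jksiis.2023.28.2.001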

A recommendation system for assisting devices in long-term care insurance

한은정;박상희;이정석;김동건
- 응용통계연구
- Vol. 31, No. 6
- pp.693-706
- 2018
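Providing assistive devices suited to older adults' physical function is very important for helping them live independently in their own homes and communities for as long as possible. This study set out to develop a scientific standard benefit model algorithm for assistive devices that can recommend suitable device items for each individual, taking into account the beneficiary's physical and cognitive functional status. The model was developed with decision trees, a data mining technique. The dataset was built from the long-term care eligibility assessment data of 8,084 beneficiaries, standard benefit plans prepared by power assessors, and beneficiary characteristics data, and a standard benefit model was developed for each of 15 assistive device items. The study is expected to help make assistive device benefit planning under long-term care insurance for older adults more objective and scientific, and to improve beneficiaries' independent living and safety.
https://doi.org/10.5351/KJAS.2018.31.6.693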

Schedule communication routing approach to maximize energy efficiency in wireless body sensor networks

Kaebeh, Yaeghoobi S.B.;Soni, M.K.;Tyagi, S.S.
- Smart Structures and Systems
- Vol. 21, No. 2
- pp.225-234
- 2018
E-Health can supersede the central patient wireless healthcare system. The Wireless Body Sensor Network (WBSN) is the first phase of the e-Health system. In this paper, we aim to understand e-Health architecture and configuration, and attempt to minimize energy consumption and latency in transmission routing protocols under restrictive latency in data delivery in the WBSN phase. The goal is to concentrate on the polling protocol to improve and optimize the routing time interval, and to schedule communication to reduce energy utilization. Two types of network-model routing protocols are proposed: elemental and clustering. The elemental model improves efficiency by using a polling protocol, and the clustering model extends the elemental model with a proposed Destruct Supervised Decision Tree (DSDT) algorithm that resolves time-interval conflicts in transmission. The simulation study verifies that the proposed models deliver better performance than the existing BSN protocol for WBSN.
https://doi.org/10.12989/sss.2018.21.2.225

In vitro Anti-inflammatory Activities and Phenolic Acid Analysis of Tree Sprout Extracts

김주리;퀸누구엔;신한나;강기성;이상현
- 생약학회지
- Vol. 52, No. 4
- pp.257-266
- 2021
This study evaluated several in vitro activities including the preliminary assessment of the anti-cancer, anti-inflammatory, and anti-diabetic effects of tree sprout extracts. Chlorogenic, caffeic, and p-coumaric acid contents in tree sprouts were analyzed using high-performance liquid chromatography and an ultraviolet detector. Among the studied tree sprout extracts, the ethanol (EtOH) extract of Rhus verniciflua exhibited the most potent anti-cancer effect by suppressing the cell viability of a human gastric adenocarcinoma cell line, with an IC₅₀ of 7.06 ㎍/mL. The EtOH extract of Morus alba (MAB) inhibited the secretion of nitric oxide (NO) at a concentration of 100 ㎍/mL, with an IC₅₀ of 83.44 ㎍/mL. Moreover, the EtOH extract of Securinega suffruticosa inhibited NO secretion with the lowest IC₅₀ of 54.42 ㎍/mL. The EtOH extract of Fraxinus mandschurica was the only extract with effective α-glucosidase inhibitory activity. The total content of chlorogenic, caffeic, and p-coumaric acids was the highest in MAB (14.63 mg/g ext.). In conclusion, the beneficial activities of the tree sprout extracts with high phenolic acid content were generally high. Our results provide a theoretical basis for the development of health-promoting supplements and functional foods.
https://doi.org/10.22889/KJP.2021.52.4.257

A Combinatorial Optimization for Influential Factor Analysis: a Case Study of Political Preference in Korea

Yun, Sung Bum;Yoon, Sanghyun;Heo, Joon
- 한국측량학회지
- Vol. 35, No. 5
- pp.415-422
- 2017
Finding influential factors from a given clustering result is a typical data science problem. A Genetic Algorithm based method is proposed to derive influential factors, and its performance is compared with two conventional methods, Classification and Regression Tree (CART) and Chi-Squared Automatic Interaction Detection (CHAID), using Dunn's index as the measure. To extract the influential factors of preference towards political parties in South Korea, the vote results of the 18th presidential election and 'Demographic', 'Health and Welfare', 'Economic' and 'Business' related data were used. Based on the analysis, reverse engineering was implemented. A reverse engineering based approach to influential factor analysis can provide a new set of influential variables, which can offer new insight to the data mining field.
https://doi.org/10.7848/ksgpc.2017.35.5.415
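
Dunn's index, the measure named in the abstract above, can be sketched as below. The abstract does not say which variant the authors used, so this assumes a common one: the smallest Euclidean distance between points in different clusters, divided by the largest distance between points in the same cluster. The function name `dunn_index` and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of coordinate tuples.

    Assumed variant: min inter-cluster point distance / max cluster diameter.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Smallest distance between any two points in different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between any two points in the same cluster.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster is a single point; index undefined")
    return min_separation / max_diameter


# Two tight clusters three units apart, each with diameter 1.
example = [[(0, 0), (0, 1)], [(3, 0), (3, 1)]]
score = dunn_index(example)  # 3.0
```

Higher values indicate clusters that are compact relative to the gaps between them, which is why the index can rank competing sets of factors.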

Data Mining for Knowledge Management in a Health Insurance Domain

Chae, Young-Moon;Ho, Seung-Hee;Cho, Kyoung-Won;Lee, Dong-Ha;Ji, Sun-Ha
- 지능정보연구
- Vol. 6, No. 1
- pp.73-82
- 2000
This study examined the characteristics of knowledge discovery and data mining algorithms to demonstrate how they can be used to predict health outcomes and provide policy information for hypertension management using the Korea Medical Insurance Corporation database. Specifically, this study validated the predictive power of data mining algorithms by comparing the performance of logistic regression and two decision tree algorithms, CHAID (Chi-squared Automatic Interaction Detection) and C5.0 (a variant of C4.5), since logistic regression has assumed a major position in the healthcare field as a method for predicting or classifying health outcomes based on the specific characteristics of each individual case. This comparison was performed using the test set of 4,588 beneficiaries and the training set of 13,689 beneficiaries that were used to develop the models. Contrary to the previous study, the CHAID algorithm performed better than logistic regression in predicting hypertension, but C5.0 had the lowest predictive power. In addition, the CHAID algorithm and association rules also provided the segment characteristics for the risk factors that may be used in developing hypertension management programs. This showed that the data mining approach can be a useful analytic tool for predicting and classifying health outcomes data.

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm

Han Bi KIM;Dong Hoon HAN
- Journal of Korea Artificial Intelligence Association
- Vol. 1, No. 1
- pp.11-16
- 2023
Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of 5 types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.
https://doi.org/10.24225/jkaia.2023.1.1.11

Prediction of Hypertension Complications Risk Using Classification Techniques

Lee, Wonji;Lee, Junghye;Lee, Hyeseon;Jun, Chi-Hyuck;Park, Il-Su;Kang, Sung-Hong
- Industrial Engineering and Management Systems
- Vol. 13, No. 4
- pp.449-453
- 2014
Chronic diseases, including hypertension and its complications, are major causes of rising national medical expenditures. We aim to predict the risk of hypertension complications for hypertension patients, using the sample national healthcare database established by the Korean National Health Insurance Corporation. We apply classification techniques, such as logistic regression, linear discriminant analysis, and classification and regression tree, to predict the hypertension complication onset event for each patient. The performance of these three methods is compared in terms of accuracy, sensitivity and specificity. The results show that these methods seem to perform similarly, although logistic regression performs marginally better than the others.
https://doi.org/10.7232/iems.2014.13.4.449
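
The three comparison measures named in the abstract above (accuracy, sensitivity, specificity) follow directly from the counts in a binary confusion matrix. The sketch below uses their standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    if total == 0 or tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of positives that were caught
        "specificity": tn / (tn + fp),     # share of negatives that were cleared
    }


# Made-up counts for 100 patients: 45 with the event, 55 without.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Reporting all three matters when classes are imbalanced, because a model can reach high accuracy while missing most positive cases.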

62 search results, processing time 0.023 s
