• Title/Summary/Keyword: Matthews Correlation Coefficient

Search Result 7, Processing Time 0.019 seconds

Prediction model of hypercholesterolemia using body fat mass based on machine learning (머신러닝 기반 체지방 측정정보를 이용한 고콜레스테롤혈증 예측모델)

  • Lee, Bum Ju
    • The Journal of the Convergence on Culture Technology
    • /
    • v.5 no.4
    • /
    • pp.413-420
    • /
    • 2019
  • The purpose of the present study is to develop a model for predicting hypercholesterolemia using an integrated set of body fat mass variables based on machine learning techniques, beyond the study of the association between body fat mass and hypercholesterolemia. For this study, a total of six models were created using two variable subset selection methods and machine learning algorithms based on the Korea National Health and Nutrition Examination Survey (KNHANES) data. Among the various body fat mass variables, we found that trunk fat mass was the best variable for predicting hypercholesterolemia. Furthermore, we obtained the area under the receiver operating characteristic curve value of 0.739 and the Matthews correlation coefficient value of 0.36 in the model using the correlation-based feature subset selection and naive Bayes algorithm. Our findings are expected to be used as important information in the field of disease prediction in large-scale screening and public health research.

Cross-Project Pooling of Defects for Handling Class Imbalance

  • Catherine, J.M.;Djodilatchoumy, S
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.10
    • /
    • pp.11-16
    • /
    • 2022
  • Applying predictive analytics to predict software defects has improved the overall quality and decreased maintenance costs. Many supervised and unsupervised learning algorithms have been used for defect prediction on publicly available datasets. Most of these datasets suffer from an imbalance in the output classes. We study the impact of class imbalance in the defect datasets on the efficiency of the defect prediction model and propose a CPP method for handling imbalances in the dataset. The performance of the methods is evaluated using measures like Matthew's Correlation Coefficient (MCC), Recall, and Accuracy measures. The proposed sampling technique shows significant improvement in the efficiency of the classifier in predicting defects.

Optimal threshold using the correlation coefficient for the confusion matrix (혼동행렬의 상관계수를 이용한 최적분류점)

  • Hong, Chong Sun;Oh, Se Hyeon;Choi, Ye Won
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.1
    • /
    • pp.77-91
    • /
    • 2022
  • The optimal threshold estimation is considered in order to discriminate the mixture distribution in the fields of Biostatistics and credit evaluation. There exists well-known various accuracy measures that examine the discriminant power. Recently, Matthews correlation coefficient and the F1 statistic were studied to estimate optimal thresholds. In this study, we explore whether these accuracy measures are appropriate for the optimal threshold to discriminate the mixture distribution. It is found that some accuracy measures that depend on the sample size are not appropriate when two sample sizes are much different. Moreover, an alternative method for finding the optimal threshold is proposed using the correlation coefficient that defines the ratio of the confusion matrix, and the usefulness and utility of this method are also discusses.

Diffusion Characteristics of Fatty Acid using Supercritical Fluid Chromatographic Method (초임계유체 크로마토그래피를 이용한 지방산의 확산특성 해석)

  • Lee, Seung Bum;Seong, Dae Hyung;Kim, Hyung Su;Hong, In Kwon
    • Applied Chemistry for Engineering
    • /
    • v.7 no.6
    • /
    • pp.1043-1052
    • /
    • 1996
  • Supercritical fluid chromatographic method was recommended as an alternative separation method of fatty acids of the conventional method such as distillation or extraction. Although diffusion characteristics are varied by the carbon numbers and the degree of unsaturation of fatty acids, the quantitative data were so rare that the commercialization of supercritical fluid chromatographic method has been hindered. In this study, diffusion coefficients of fatty acids which are differently unsaturated are measured by CPB method in the range of 308.15K to 328.15K and 13MPa to 17MPa in supercritical carbon dioxide. A decrease in the binary diffusion coefficient was observed with an increase in temperature and pressure. Also, the decrease in the binary diffusion coefficient with increasing fluid density and viscosity. Wilke-Chang equation, Funazukuri empirical equation, and Matthews-Akgerman equation are used to correlate the experimental diffusion coefficients of fatty acids in supercritical carbon dioxide. Among the various theoretical equations, Matthews-Akgerman equation based on RHS theory was suggested as a more successful correlation model with experimental data.

  • PDF

Development of Image-based System for Multiple Fluorescence Imaging Study (다중형광영상 연구를 위한 영상기반 시스템 개발)

  • Yoon, WoongBae;Kim, Hong Rae;Lee, Hyun Min;Kim, Young Jae;Kim, Kwang Gi;Yoo, Heon;Lee, Seung Hoon
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.12
    • /
    • pp.1445-1452
    • /
    • 2015
  • In these days, fluorescent materials such as ICG or 5-ALA is used for the brain surgery. The patients who underwent brain tumor surgery has been increased during last 30 years and the survivorship rate increased 22∼33% in 5 years. Recently, the Fluorescence induction surgery is developed for more safety and improved the resection rate for the glioma in the neurosurgery field. In this study, we proposed fluorescence area detection method for ICG and 5-ALA fluorescence induced surgery using acquired images from image processing. Accuracy was 99.21% from ICG images, and 99.51% from 5-ALA images. Matthews correlation coefficient was 88.67% from ICG images, and 90.49% from 5-ALA images.

Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels

  • Podolsky, Maxim D;Barchuk, Anton A;Kuznetcov, Vladimir I;Gusarova, Natalia F;Gaidukov, Vadim S;Tarakanov, Segrey A
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.17 no.2
    • /
    • pp.835-838
    • /
    • 2016
  • Background: Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of total per year) and deaths (nearly one cancer death in five), because of the high case fatality. Errors in lung cancer type or malignant growth determination lead to degraded treatment efficacy, because anticancer strategy depends on tumor morphology. Materials and Methods: We have made an attempt to evaluate effectiveness of machine learning algorithms in the task of lung cancer classification based on gene expression levels. We processed four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples and the task was to classify four cancer types and sound tissue samples. With the University of Michigan data set of 96 samples, the task was to execute a binary classification of adenocarcinoma and non-neoplastic tissues. The University of Toronto data set contains 39 samples and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples it was to make a binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), naive Bayes classifier with assumption of both a normal distribution of attributes and a distribution through histograms, support vector machine and C4.5 decision tree. Effectiveness of machine learning algorithms was evaluated with the Matthews correlation coefficient. Results: The support vector machine method showed best results among data sets from the Dana-Farber Cancer Institute and Brigham and Women's Hospital. All algorithms with the exception of the C4.5 decision tree showed maximum potential effectiveness in the University of Michigan data set. However, the C4.5 decision tree showed best results for the University of Toronto data set. Conclusions: Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal
    • /
    • v.44 no.4
    • /
    • pp.573-587
    • /
    • 2022
  • The agricultural sector is completely different from other sectors since it completely relies on various natural and climatic factors. Climate changes have many effects, including lack of annual rainfall and pests, heat waves, changes in sea level, and global ozone/atmospheric CO2 fluctuation, on land and agriculture in similar ways. Climate change also affects the environment. Based on these factors, farmers chose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using Jaccard relative extractor (JRE) and Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the preprocessing of data is carried out through natural language processing techniques and the tags whose dimensions are reduced are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology are built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. Precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area are the performance metrics used to analyze the performance. The proposed methodology gives a precision score of 94.40% when compared with the decision tree(83.94%) and K-nearest neighbor algorithm(86.89%) for agricultural ontology construction.