• Title/Summary/Keyword: Precision-Recall Curve

Validation Technique for Class Name Postfixes Based on the Machine Learning of Class Properties (클래스 특성 기계학습에 기반한 클래스 이름의 접미사 검증 기법)

  • Lee, Hongseok;Lee, Junha;Lee, Illo;Park, Soojin;Park, Sooyong
    • KIPS Transactions on Software and Data Engineering / v.4 no.6 / pp.247-252 / 2015
  • As software has grown in size and complexity, maintenance has gained increasing attention because of its significant impact on cost. Identifiers affect more than 90 percent of readability, which accounts for the majority of maintenance activities. For this reason, existing work focuses on domain-specific features based on identifiers. However, these approaches are limited when a class name does not reflect the intention of its context or when a class is named incorrectly. This paper therefore proposes a class name validation process that extracts properties of classes, builds a learning model by applying a decision tree technique, and generates a validation report listing recommended postfixes for the classes being validated. For evaluation, four open source projects were selected, and indicators such as precision, recall, and the ROC curve demonstrate the value of this work for five specific postfixes that carry functional information in class names.

Classification of Anteroposterior/Lateral Images and Segmentation of the Radius Using Deep Learning in Wrist X-rays Images (손목 관절 단순 방사선 영상에서 딥 러닝을 이용한 전후방 및 측면 영상 분류와 요골 영역 분할)

  • Lee, Gi Pyo;Kim, Young Jae;Lee, Sanglim;Kim, Kwang Gi
    • Journal of Biomedical Engineering Research / v.41 no.2 / pp.94-100 / 2020
  • The purpose of this study was to present deep learning models that classify wrist X-ray images by type and automatically segment the radius in each image, and to verify the trained models. The data were a total of 904 wrist X-rays with distal radius fracture, consisting of 472 anteroposterior (AP) and 432 lateral images. A ResNet50 model was used for AP/lateral image classification, and a U-Net model for segmentation of the radius. The AP/lateral classification model achieved 100.0% precision, recall, and F1 score, with an area under the curve (AUC) of 1.0. The radius segmentation model showed an accuracy of 99.46%, a sensitivity of 89.68%, a specificity of 99.72%, and a Dice similarity coefficient of 90.05% in AP images, and an accuracy of 99.37%, a sensitivity of 88.65%, a specificity of 99.69%, and a Dice similarity coefficient of 86.05% in lateral images. Both the AP/lateral classification model and the radius segmentation model performed well enough to suggest potential for clinical application.
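
The abstract above reports accuracy, sensitivity, specificity, and a Dice similarity coefficient for the segmentation model. As a rough sketch of how these pixel-wise scores relate to one another, the following computes them for two binary masks. The function name `segmentation_scores` and the toy masks are hypothetical and are not taken from the study.

```python
def segmentation_scores(pred, truth):
    """Pixel-wise scores for two equally sized binary masks (flat lists of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # foreground found
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # background kept
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # background marked as foreground
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # foreground missed
    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # Dice = 2|A ∩ B| / (|A| + |B|); true negatives play no role.
        "dice": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0,
    }


# Toy 1-D "masks" with hypothetical values, not data from the study.
truth = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
scores = segmentation_scores(pred, truth)
print(scores)
```

Because Dice ignores true negatives, it can sit well below accuracy and specificity when the segmented structure covers only a small part of the image, which matches the pattern of the reported figures.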

Sasang Constitution Classification using Convolutional Neural Network on Facial Images (콘볼루션 신경망 기반의 안면영상을 이용한 사상체질 분류)

  • Ahn, Ilkoo;Kim, Sang-Hyuk;Jeong, Kyoungsik;Kim, Hoseok;Lee, Siwoo
    • Journal of Sasang Constitutional Medicine / v.34 no.3 / pp.31-40 / 2022
  • Objectives: Sasang constitutional medicine is a traditional Korean medicine that classifies humans into four constitutions in consideration of individual differences in physical, psychological, and physiological characteristics. In this paper, we propose a method to classify Taeeum person (TE) and Non-Taeeum person (NTE), Soeum person (SE) and Non-Soeum person (NSE), and Soyang person (SY) and Non-Soyang person (NSY) using a convolutional neural network with only facial images. Methods: Based on the VGG16 convolutional neural network architecture, transfer learning was carried out on the facial images of 3738 subjects to classify TE and NTE, SE and NSE, and SY and NSY. Data augmentation techniques were used to increase classification performance. Results: The classification performance for TE and NTE, SE and NSE, and SY and NSY was 77.24%, 85.17%, and 80.18% by F1 score and 80.02%, 85.96%, and 72.76% by precision-recall AUC (area under the curve), respectively. Conclusions: Soeum person was found to have the most heterogeneous facial features, as it had the best classification performance among the constitutions, followed by Taeeum person and Soyang person. The experimental results suggest that constitutions can be classified from facial images alone. Performance is expected to increase with additional data such as BMI or a personality questionnaire.

Link Prediction in Bipartite Network Using Composite Similarities

  • Bijay Gaudel;Deepanjal Shrestha;Niosh Basnet;Neesha Rajkarnikar;Seung Ryul Jeong;Donghai Guan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.8 / pp.2030-2052 / 2023
  • Analysis of bipartite (two-mode) networks is a significant research area for understanding the formation of social communities, economic systems, drug side effect topology, and similar structures in complex information systems. Most previous work relies on a projection-based model or a latent feature model, which predicts links based on a single similarity. Projection-based models suffer from the loss of structural information in the projected network, and the latent feature is hardly present. This work proposes a novel method for link prediction in bipartite networks based on an ensemble of composite similarities, overcoming the issues of model-based and latent feature models. The proposed method analyzes the structure, the neighborhood nodes, and the latent attributes between nodes to predict links in the network. To illustrate the proposed method, experiments were performed on five real-world data sets and compared with various state-of-the-art link prediction methods; the proposed method outperforms them by about 3% to 9% on the area under the precision-recall curve (AUC-PR). This work is significant for the study of biological networks, e-commerce networks, complex web-based systems, drug-binding networks, enzyme-protein networks, and other related networks, where it helps explain how such complex networks form. Further, this study supports link prediction for purposes ranging from building intelligent systems to providing services in big data and web-based systems.

Fashion Category Oversampling Automation System

  • Minsun Yeu;Do Hyeok Yoo;SuJin Bak
    • Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.31-40 / 2024
  • In the domestic online fashion platform industry, manual registration of product information by individual business owners causes inconvenience and reliability issues, especially when many product groups are registered simultaneously. Moreover, bias is significantly heightened by low-quality product images and an imbalance in data quantity. This study therefore proposes a ResNet50 model that aims to minimize data bias through oversampling techniques and performs multi-class classification over 13 fashion categories. Transfer learning is employed to optimize resource utilization and reduce long training times. The results indicate that data augmentation improves discrimination by up to 33.4% in classes with insufficient data compared to the basic convolutional neural network (CNN) model. The reliability of all outcomes is supported by the precision-recall curve. This study is expected to help advance the domestic online fashion platform industry.
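
The abstract above mentions oversampling to reduce class imbalance but does not say which technique was used. As a minimal sketch under that caveat, plain random oversampling duplicates randomly chosen samples of each smaller class until every class matches the largest one. The function name `random_oversample`, the file names, and the category labels below are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random samples of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement from the class's own samples to fill the gap.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical image identifiers and categories, for illustration only.
images = [f"img_{i}" for i in range(10)]
labels = ["shirt"] * 6 + ["skirt"] * 3 + ["hat"] * 1
new_images, new_labels = random_oversample(images, labels)
counts = Counter(new_labels)
print(counts)
```

In an image pipeline the duplicated samples would normally be passed through augmentation (flips, crops, and so on) so that the model does not see exact copies; that step is omitted here.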

Diagnostic Performance of a New Convolutional Neural Network Algorithm for Detecting Developmental Dysplasia of the Hip on Anteroposterior Radiographs

  • Hyoung Suk Park;Kiwan Jeon;Yeon Jin Cho;Se Woo Kim;Seul Bi Lee;Gayoung Choi;Seunghyun Lee;Young Hun Choi;Jung-Eun Cheon;Woo Sun Kim;Young Jin Ryu;Jae-Yeon Hwang
    • Korean Journal of Radiology / v.22 no.4 / pp.612-623 / 2021
  • Objective: To evaluate the diagnostic performance of a deep learning algorithm for the automated detection of developmental dysplasia of the hip (DDH) on anteroposterior (AP) radiographs. Materials and Methods: From 2601 hip AP radiographs, 5076 cropped unilateral hip joint images were used to construct a dataset that was divided into training (80%), validation (10%), and test (10%) sets. Three radiologists were asked to label the hip images as normal or DDH. To investigate the diagnostic performance of the deep learning algorithm, we calculated the receiver operating characteristic (ROC) and precision-recall curve (PRC) plots, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) and compared them with the performance of radiologists with different levels of experience. Results: The area under the ROC plot generated by the deep learning algorithm and radiologists was 0.988 and 0.988-0.919, respectively. The area under the PRC plot generated by the deep learning algorithm and radiologists was 0.973 and 0.618-0.958, respectively. The sensitivity, specificity, PPV, and NPV of the proposed deep learning algorithm were 98.0%, 98.1%, 84.5%, and 99.8%, respectively. There was no significant difference in the diagnosis of DDH between the algorithm and the radiologist with experience in pediatric radiology (p = 0.180). However, the proposed model showed higher sensitivity, specificity, and PPV than the radiologist without experience in pediatric radiology (p < 0.001). Conclusion: The proposed deep learning algorithm provided an accurate diagnosis of DDH on hip radiographs, comparable to the diagnosis by an experienced radiologist.
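
The abstract above reports an area under the precision-recall curve without stating how the area was computed. One common estimator is average precision, which sums precision weighted by each increase in recall as the decision threshold is lowered. The sketch below assumes binary labels (1 = positive) and distinct scores; the function names and toy data are hypothetical.

```python
def precision_recall_points(labels, scores):
    """Return (recall, precision) pairs as the threshold sweeps from high to low.

    Assumes labels are 0/1 and scores are distinct (ties are not grouped).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    points, tp, fp = [], 0, 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(labels, scores):
    """Step-wise area under the PR curve: sum of precision times recall increment."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(labels, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Toy scores, not data from any study listed here.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
ap = average_precision(labels, scores)
print(round(ap, 4))  # 29/36, about 0.8056
```

Unlike the ROC curve, the precision-recall curve does not use true negatives, so its area depends on how rare the positive class is. That is one reason a model or reader can have a high ROC area and a noticeably lower PRC area on the same test set.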

Network Anomaly Detection Technologies Using Unsupervised Learning AutoEncoders (비지도학습 오토 엔코더를 활용한 네트워크 이상 검출 기술)

  • Kang, Koohong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.4 / pp.617-629 / 2020
  • To overcome the limitations of rule-based intrusion detection systems in the face of changing Internet computing environments, the emergence of new services, and the creativity of attackers, network anomaly detection (NAD) using machine learning and deep learning technologies has received much attention. Most existing machine learning and deep learning technologies for NAD use supervised learning on training data labeled 'normal' and 'attack'. This paper examines the feasibility of applying an unsupervised AutoEncoder (AE) to NAD using data sets collected from secured network traffic without labeled responses. To verify the performance of the proposed AE model, we present experimental results in terms of accuracy, precision, recall, F1 score, and ROC AUC on the NSL-KDD training and test data sets. In particular, we derive a reference AE through an in-depth analysis of diverse AEs, varying hyper-parameters such as the number of layers and considering regularization and denoising effects. The reference model achieves F1 scores of 90.4% and 89% for binary classification on the KDDTest+ and KDDTest-21 test data sets, using the 82nd percentile of the AE reconstruction error on the training data set as the threshold.
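
The thresholding step described above, where a percentile of the training reconstruction error serves as the anomaly cutoff, can be sketched as follows. This is an illustration under assumptions: the abstract does not specify the percentile interpolation method or whether the comparison is strict, and the error values below are synthetic. The names `percentile` and `flag_anomalies` are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def flag_anomalies(train_errors, test_errors, q=82):
    """Flag test samples whose reconstruction error exceeds the training percentile."""
    threshold = percentile(train_errors, q)
    # Assumption: a strictly larger error counts as an anomaly.
    return threshold, [err > threshold for err in test_errors]


# Synthetic reconstruction errors (0.01 ... 1.00), not values from the paper.
train_errors = [0.01 * i for i in range(1, 101)]
test_errors = [0.05, 0.50, 0.90, 1.40]
threshold, flags = flag_anomalies(train_errors, test_errors)
print(threshold, flags)
```

Choosing the cutoff from the training errors alone means that no attack labels are needed at training time, which is what makes the approach unsupervised; the percentile itself is a tunable hyper-parameter that trades precision against recall.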

Machine Learning Model to Predict Osteoporotic Spine with Hounsfield Units on Lumbar Computed Tomography

  • Nam, Kyoung Hyup;Seo, Il;Kim, Dong Hwan;Lee, Jae Il;Choi, Byung Kwan;Han, In Ho
    • Journal of Korean Neurosurgical Society / v.62 no.4 / pp.442-449 / 2019
  • Objective: Bone mineral density (BMD) is an important consideration during fusion surgery. Although dual X-ray absorptiometry is considered the gold standard for assessing BMD, quantitative computed tomography (QCT) provides more accurate data in spine osteoporosis. However, QCT has the disadvantages of additional radiation hazard and cost. The present study aimed to demonstrate the utility of artificial intelligence and a machine learning algorithm for assessing osteoporosis using Hounsfield units (HU) from preoperative lumbar CT coupled with QCT data. Methods: We reviewed 70 patients who underwent both QCT and conventional lumbar CT for spine surgery. The T-scores of 198 lumbar vertebrae were assessed with QCT, and the HU of the vertebral body at the same level was measured on conventional CT using the picture archiving and communication system (PACS). A multiple regression algorithm was applied to predict the T-score from three independent variables (age, sex, and HU of the vertebral body on conventional CT) coupled with the QCT T-score. Next, a logistic regression algorithm was applied to predict whether a vertebra was osteoporotic or non-osteoporotic. TensorFlow and Python were used as the machine learning tools. A TensorFlow user interface developed at our institute was used for easy code generation. Results: The multiple regression model estimated T-scores similar to the QCT data. HU showed results similar to QCT, with discordance in only one non-osteoporotic vertebra that was indicated as osteoporotic. On the training set, the predictive model classified the lumbar vertebrae into two groups (osteoporotic vs. non-osteoporotic spine) with 88.0% accuracy. In a test set of 40 vertebrae, classification accuracy was 92.5% when the learning rate was 0.0001 (precision, 0.939; recall, 0.969; F1 score, 0.954; area under the curve, 0.900). Conclusion: This study presents a simple machine learning model applicable to the spine research field. The machine learning model can predict the T-score and osteoporotic vertebrae solely by measuring the HU on conventional CT, which would help spine surgeons avoid underestimating osteoporotic spine preoperatively. If applied to a larger data set, we believe the predictive accuracy of our model will further increase. We propose that machine learning is an important modality in the medical research field.
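
The F1 score reported above is the harmonic mean of precision and recall, so the three figures can be checked against each other. The short sketch below does this with the rounded values from the abstract; since the inputs are rounded, the result is only a consistency check. The function name `f1_score` is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values reported in the abstract above.
f1 = f1_score(0.939, 0.969)
print(round(f1, 3))  # 0.954, matching the reported F1 score
```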

Development and Validation of MRI-Based Radiomics Models for Diagnosing Juvenile Myoclonic Epilepsy

  • Kyung Min Kim;Heewon Hwang;Beomseok Sohn;Kisung Park;Kyunghwa Han;Sung Soo Ahn;Wonwoo Lee;Min Kyung Chu;Kyoung Heo;Seung-Koo Lee
    • Korean Journal of Radiology / v.23 no.12 / pp.1281-1289 / 2022
  • Objective: Radiomic modeling using multiple regions of interest in MRI of the brain to diagnose juvenile myoclonic epilepsy (JME) has not yet been investigated. This study aimed to develop and validate radiomics prediction models to distinguish patients with JME from healthy controls (HCs), and to evaluate the feasibility of a radiomics approach using MRI for diagnosing JME. Materials and Methods: A total of 97 JME patients (25.6 ± 8.5 years; female, 45.5%) and 32 HCs (28.9 ± 11.4 years; female, 50.0%) were randomly split (7:3 ratio) into a training set (n = 90) and a test set (n = 39). Radiomic features were extracted from 22 regions of interest in the brain, selected on the basis of clinical evidence, using T1-weighted MRI. Predictive models were trained on radiomics features in the training set using seven modeling methods: a light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree. The performance of the models was validated and compared on the test set. The model with the highest area under the receiver operating characteristic curve (AUROC) was chosen, and important features in the model were identified. Results: The seven tested radiomics models (light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree) showed AUROC values of 0.817, 0.807, 0.783, 0.779, 0.767, 0.762, and 0.672, respectively. The light gradient boosting machine, which had the highest AUROC, albeit without statistically significant differences from the other models in pairwise comparisons, had accuracy, precision, recall, and F1 scores of 0.795, 0.818, 0.931, and 0.871, respectively. Radiomic features, including those from the putamen and ventral diencephalon, were ranked as the most important for suggesting JME. Conclusion: Radiomic models using MRI were able to differentiate JME from HCs.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintenance and failure prevention through anomaly detection in ICT infrastructure are becoming important. System monitoring data are multidimensional time series data, which makes it difficult to account for the characteristics of both multidimensional data and time series data. With multidimensional data, correlations between variables must be considered. Existing approaches, such as probability-based, linear, and distance-based methods, degrade because of the curse of dimensionality. In addition, time series data are typically preprocessed with sliding windows and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-standing research field; statistical methods and regression analysis were used in its early days, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when the data are non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and detect anomalies by comparing predicted values with actual values. Their performance drops when the model is not robust or when the data contain noise or outliers, so training data free of noise and outliers are required. An autoencoder based on artificial neural networks is trained to produce output as similar as possible to its input. It has several advantages over existing probability-based and linear models, cluster analysis, and supervised learning: it can be applied to data that do not satisfy distributional or linearity assumptions, and it can be trained in an unsupervised manner without labeled data. However, identifying local outliers in multidimensional data remains a limitation in anomaly detection, and the characteristics of time series data greatly increase the dimension of the data. In this study, we propose a Conditional Multimodal Autoencoder (CMAE) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to address the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and images. The modalities share the autoencoder's bottleneck and thereby learn their correlations. In addition, a Conditional Autoencoder (CAE) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance for 41 variables was checked in the proposed and comparison models. Reconstruction performance differs by variable; the Memory, Disk, and Network modalities are reconstructed well in all three autoencoder models, with small loss values. The Process modality showed no significant difference among the three models, and the CPU modality showed excellent performance with CMAE. ROC curves were drawn to evaluate anomaly detection performance in the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, and AE. In particular, recall was 0.9828 for CMAE, indicating that it detects almost all anomalies. The model's accuracy also improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. From a practical standpoint, the proposed model has a further advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows require managing additional procedures, and the resulting increase in dimension can slow down inference. The proposed model is easy to apply in practice in terms of inference speed and model management.
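
The abstract above notes that sliding-window preprocessing increases the dimension of the data. The sketch below shows why: flattening each run of consecutive time steps into one input vector multiplies the number of features by the window width. The function name `sliding_windows`, the window width, and the toy readings are hypothetical and are not taken from the paper.

```python
def sliding_windows(series, width):
    """Flatten each run of `width` consecutive time steps into one feature vector."""
    if width < 1 or width > len(series):
        raise ValueError("width must be between 1 and the series length")
    windows = []
    for start in range(len(series) - width + 1):
        # Concatenate the variables of `width` consecutive steps.
        flat = [value for step in series[start:start + width] for value in step]
        windows.append(flat)
    return windows


# 6 time steps x 3 variables (hypothetical CPU, memory, disk readings).
series = [[t, 10 * t, 100 * t] for t in range(6)]
windows = sliding_windows(series, width=4)
print(len(series[0]), "features per step ->", len(windows[0]), "features per window")
```

With the 41 monitored variables mentioned in the abstract, a window of width w would yield 41 × w inputs per sample. Supplying time as a conditional input, as the abstract describes, is a way to capture periodicity without this growth.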