• Title/Summary/Keyword: ROC AUC

Search Result 292, Processing Time 0.023 seconds

A Comparative Study of Phishing Websites Classification Based on Classifier Ensemble

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.5
    • /
    • pp.617-625
    • /
    • 2018
  • Phishing website has become a crucial concern in cyber security applications. It is performed by fraudulently deceiving users with the aim of obtaining their sensitive information such as bank account information, credit card, username, and password. The threat has led to huge losses to online retailers, e-business platform, financial institutions, and to name but a few. One way to build anti-phishing detection mechanism is to construct classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree in the case of website phishing. Area under ROC curve (AUC) is employed as a performance metric, whilst statistical tests are used as baseline indicator of significance evaluation among classifiers. The paper contributes the existing literature on making a benchmark of classifier ensembles for web phishing detection.

Discriminant analysis using empirical distribution function

  • Kim, Jae Young;Hong, Chong Sun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.5
    • /
    • pp.1179-1189
    • /
    • 2017
  • In this study, we propose an alternative method for discriminant analysis using a multivariate empirical distribution function to express multivariate data as a simple one-dimensional statistic. This method turns to be the estimation process of the optimal threshold based on classification accuracy measures and an empirical distribution function of data composed of classes. This can also be visually represented on a two-dimensional plane and discussed with some measures in ROC curves, surfaces, and manifolds. In order to explore the usefulness of this method for discriminant analysis in the study, we conducted comparisons between the proposed method and the existing methods through simulations and illustrative examples. It is found that the proposed method may have better performances for some cases.

Time-Frequency Analysis of Electrohysterogram for Classification of Term and Preterm Birth

  • Ryu, Jiwoo;Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.2
    • /
    • pp.103-109
    • /
    • 2015
  • In this paper, a novel method for the classification of term and preterm birth is proposed based on time-frequency analysis of electrohysterogram (EHG) using multivariate empirical mode decomposition (MEMD). EHG is a promising study for preterm birth prediction, because it is low-cost and accurate compared to other preterm birth prediction methods, such as tocodynamometry (TOCO). Previous studies on preterm birth prediction applied prefilterings based on Fourier analysis of an EHG, followed by feature extraction and classification, even though Fourier analysis is suboptimal to biomedical signals, such as EHG, because of its nonlinearity and nonstationarity. Therefore, the proposed method applies prefiltering based on MEMD instead of Fourier-based prefilters before extracting the sample entropy feature and classifying the term and preterm birth groups. For the evaluation, the Physionet term-preterm EHG database was used where the proposed method and Fourier prefiltering-based method were adopted for comparative study. The result showed that the area under curve (AUC) of the receiver operating characteristic (ROC) was increased by 0.0351 when MEMD was used instead of the Fourier-based prefilter.

Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality

  • Malhotra, Ruchika;Jain, Ankita
    • Journal of Information Processing Systems
    • /
    • v.8 no.2
    • /
    • pp.241-262
    • /
    • 2012
  • An understanding of quality attributes is relevant for the software organization to deliver high software reliability. An empirical assessment of metrics to predict the quality attributes is essential in order to gain insight about the quality of software in the early phases of software development and to ensure corrective actions. In this paper, we predict a model to estimate fault proneness using Object Oriented CK metrics and QMOOD metrics. We apply one statistical method and six machine learning methods to predict the models. The proposed models are validated using dataset collected from Open Source software. The results are analyzed using Area Under the Curve (AUC) obtained from Receiver Operating Characteristics (ROC) analysis. The results show that the model predicted using the random forest and bagging methods outperformed all the other models. Hence, based on these results it is reasonable to claim that quality models have a significant relevance with Object Oriented metrics and that machine learning methods have a comparable performance with statistical methods.

Predicting stock price direction by using data mining methods : Emphasis on comparing single classifiers and ensemble classifiers

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.11
    • /
    • pp.111-116
    • /
    • 2017
  • This paper proposes a data mining approach to predicting stock price direction. Stock market fluctuates due to many factors. Therefore, predicting stock price direction has become an important issue in the field of stock market analysis. However, in literature, there are few studies applying data mining approaches to predicting the stock price direction. To contribute to literature, this paper proposes comparing single classifiers and ensemble classifiers. Single classifiers include logistic regression, decision tree, neural network, and support vector machine. Ensemble classifiers we consider are adaboost, random forest, bagging, stacking, and vote. For the sake of experiments, we garnered dataset from Korea Stock Exchange (KRX) ranging from 2008 to 2015. Data mining experiments using WEKA revealed that random forest, one of ensemble classifiers, shows best results in terms of metrics such as AUC (area under the ROC curve) and accuracy.

Statistical Fingerprint Recognition Matching Method with an Optimal Threshold and Confidence Interval

  • Hong, C.S.;Kim, C.H.
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.6
    • /
    • pp.1027-1036
    • /
    • 2012
  • Among various biometrics recognition systems, statistical fingerprint recognition matching methods are considered using minutiae on fingerprints. We define similarity distance measures based on the coordinate and angle of the minutiae, and suggest a fingerprint recognition model following statistical distributions. We could obtain confidence intervals of similarity distance for the same and different persons, and optimal thresholds to minimize two kinds of error rates for distance distributions. It is found that the two confidence intervals of the same and different persons are not overlapped and that the optimal threshold locates between two confidence intervals. Hence an alternative statistical matching method can be suggested by using nonoverlapped confidence intervals and optimal thresholds obtained from the distributions of similarity distances.

A Comparative Study of Phishing Websites Classification Based on Classifier Ensembles

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Multimedia Information System
    • /
    • v.5 no.2
    • /
    • pp.99-104
    • /
    • 2018
  • Phishing website has become a crucial concern in cyber security applications. It is performed by fraudulently deceiving users with the aim of obtaining their sensitive information such as bank account information, credit card, username, and password. The threat has led to huge losses to online retailers, e-business platform, financial institutions, and to name but a few. One way to build anti-phishing detection mechanism is to construct classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree in the case of website phishing. Area under ROC curve (AUC) is employed as a performance metric, whilst statistical tests are used as baseline indicator of significance evaluation among classifiers. The paper contributes the existing literature on making a benchmark of classifier ensembles for web phishing detection.

A Detailed Analysis of Classifier Ensembles for Intrusion Detection in Wireless Network

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Information Processing Systems
    • /
    • v.13 no.5
    • /
    • pp.1203-1212
    • /
    • 2017
  • Intrusion detection systems (IDSs) are crucial in this overwhelming increase of attacks on the computing infrastructure. It intelligently detects malicious and predicts future attack patterns based on the classification analysis using machine learning and data mining techniques. This paper is devoted to thoroughly evaluate classifier ensembles for IDSs in IEEE 802.11 wireless network. Two ensemble techniques, i.e. voting and stacking are employed to combine the three base classifiers, i.e. decision tree (DT), random forest (RF), and support vector machine (SVM). We use area under ROC curve (AUC) value as a performance metric. Finally, we conduct two statistical significance tests to evaluate the performance differences among classifiers.

Accuracy Evaluation of Regional Wall Motion Abnormality in Echocardiography and Cardiac Enzymes in the Diagnosis of Ischemic Heart Disease (허혈성심장질환 진단에서 심장초음파의 국소벽운동이상과 심장효소의 정확성 평가)

  • Kim, Hee-Young;Ji, Tae-Jeong
    • Journal of radiological science and technology
    • /
    • v.45 no.4
    • /
    • pp.321-330
    • /
    • 2022
  • Echocardiography and cardiac enzymes test are the tests to assess ischemic heart disease. The purpose of this study was to verify the accuracy by comparing and analyzing two tests for the diagnosis of ischemic heart disease. A retrospective study was conducted on 393 study subjects who underwent echocardiography and cardiac enzymes test. As a result of the study, regional wall motion abnormality (RWMA) increased as the age of the study subjects increased. As a result of ROC analysis, RWMA showed a larger area under the curve (AUC) than cardiac enzymes. RWMA showed the highest accuracy with 81.1% of all cardiac enzymes. Among cardiac enzymes, cTnI showed the highest accuracy. Thus, It was confirmed that RWMA of echocardiography is more accurate than cardiac enzyme is in diagnosing ischemic heart disease.

Detection of outliers in pet sensor data through DASVDD (DASVDD 모형을 통한 반려동물 센서 데이터 이상치 탐지)

  • JeongHyeon Park;JunHyeok Go;SiUng Kim;Nammee Moon
    • Annual Conference of KIPS
    • /
    • 2023.11a
    • /
    • pp.1208-1210
    • /
    • 2023
  • 이상치는 주로 저빈도로 발생하기 때문에, 이상치 탐지 분야에서는 정상 데이터만을 이용한 비지도 기반 학습 모델을 사용하는 방법들이 제안되었다. 따라서, 본 논문에서는 반려동물 센서 데이터를 이용해 비지도 기반 모델인 DASVDD을 활용하여 이상치를 탐지한다. 하지만 데이터셋에 이상치가 존재하지 않아 반려동물이 고빈도로 보여주는 A행동군(서다, 앉다, 엎드리다, 눕다, 걷다), 저빈도로 보여주는 B행동군(킁킁대다, 먹다)으로 분리하여 학습을 진행한다. 모델의 성능은 ROC-AUC을 기준으로 79.05%의 성능을 보여주는 것을 확인하였다.