• Title/Abstract/Keywords: Precision-recall curve


Sentiment Analysis From Images - Comparative Study of SAI-G and SAI-C Models' Performances Using AutoML Vision Service from Google Cloud and Clarifai Platform

  • Marcu, Daniela; Danubianu, Mirela / International Journal of Computer Science & Network Security / Vol. 21, No. 9 / pp. 179-184 / 2021
  • In our study we performed sentiment analysis from images. For this purpose, we used 153 images containing people, animals, buildings, landscapes, cakes and objects, which we divided into two categories: images suggesting a positive emotion and images suggesting a negative emotion. In order to classify the images into these two categories, we created two models. The SAI-G model was created with Google's AutoML Vision service. The SAI-C model was created on the Clarifai platform. The data were labeled in a preprocessing stage, and for the SAI-C model we created the concepts POSITIVE (POZITIV) and NEGATIVE (NEGATIV). In order to evaluate the performances of the two models, we used a series of evaluation metrics such as Precision, Recall, the ROC (Receiver Operating Characteristic) curve, the Precision-Recall curve, the Confusion Matrix, Accuracy Score and Average Precision. Precision and Recall for the SAI-G model are both 0.875 at a confidence threshold of 0.5, while for the SAI-C model we obtained much lower scores at the same threshold, namely Precision = 0.727 and Recall = 0.571. The results indicate a lower classification performance of the SAI-C model compared with the SAI-G model. The exception is the Precision value for the POSITIVE concept, which is 1.000.
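
As an aside, the metrics listed in this abstract can all be reproduced with scikit-learn; the snippet below is a minimal illustration (not the authors' pipeline), using made-up labels and confidence scores and the same 0.5 threshold.

```python
# Illustrative only: computing the metrics named above with scikit-learn,
# assuming y_true holds binary labels (1 = positive emotion) and y_score
# holds the model's confidence for the positive class.
import numpy as np
from sklearn.metrics import (precision_score, recall_score, accuracy_score,
                             confusion_matrix, average_precision_score,
                             precision_recall_curve, roc_curve)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])                   # hypothetical ground truth
y_score = np.array([0.9, 0.4, 0.8, 0.3, 0.2, 0.7, 0.6, 0.1])  # hypothetical confidences

y_pred = (y_score >= 0.5).astype(int)                          # confidence threshold of 0.5

print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("Average precision:", average_precision_score(y_true, y_score))

precision, recall, thresholds = precision_recall_curve(y_true, y_score)  # PR curve points
fpr, tpr, _ = roc_curve(y_true, y_score)                                 # ROC curve points
```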

정보검색효율에 관한 연구 (A Study on the Effectiveness of Information Retrieval)

  • 윤구호 / 한국문헌정보학회지 / Vol. 8 / pp. 73-101 / 1981
  • Retrieval effectiveness is the principal criterion for measuring the performance of an information retrieval system. The effectiveness of a retrieval system depends primarily on the extent to which it can retrieve wanted documents without retrieving unwanted ones. So, ultimately, effectiveness is a function of the relevant and nonrelevant documents retrieved. Consequently, 'relevance' of information to the user's request has become one of the most fundamental concepts encountered in the theory of information retrieval. Although there is at present no consensus as to how this notion should be defined, relevance has been widely used as a meaningful quantity and an adequate criterion for measures of the evaluation of retrieval effectiveness. The recall and precision among various parameters based on the 'two-by-two' table (or contingency table) were the major considerations in this paper, because it is assumed that recall and precision are sufficient for the measurement of effectiveness. Accordingly, the different concepts of 'relevance' and 'pertinence' of documents to user requests and their proper usages were investigated, even though the two terms have unfortunately been used rather loosely in the literature. In addition, a number of variables affecting the recall and precision values were discussed. Some conclusions derived from this study are as follows: Any notion of retrieval effectiveness is based on 'relevance', which itself is extremely difficult to define. Recall and precision are valuable concepts in the study of any information retrieval system. They are, however, not the only criteria by which a system may be judged. The recall-precision curve represents the average performance of any given system, and this may vary quite considerably in particular situations. Therefore, it is possible to some extent to vary the indexing policy, the indexing language, or the search methodology to improve the performance of the system in terms of recall and precision. The 'inverse relationship' between average recall and precision could be accepted as the 'fundamental law of retrieval', and it should certainly be used as an aid to evaluation. Finally, there is a limit to the performance (in terms of effectiveness) achievable by an information retrieval system. That is: "Perfect retrieval is impossible."
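
For reference, the recall and precision measures derived from the 'two-by-two' contingency table discussed here reduce to two ratios; the sketch below spells them out with hypothetical counts.

```python
# Sketch: recall and precision from the 'two-by-two' (contingency) table,
# where a = relevant retrieved, b = nonrelevant retrieved,
# c = relevant not retrieved, d = nonrelevant not retrieved.
def recall(a, c):
    return a / (a + c)      # fraction of all relevant documents that were retrieved

def precision(a, b):
    return a / (a + b)      # fraction of retrieved documents that are relevant

# Hypothetical counts: 30 relevant retrieved, 20 nonrelevant retrieved, 70 relevant missed
print(recall(30, 70))       # 0.30
print(precision(30, 20))    # 0.60
```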


대형 해파리(Nemopilema nomurai) 탐지를 위한 머신러닝 기반 데이터 구축 및 모델 평가 (Machine Learning-based Data Construction and Model Evaluation for Monitoring of Giant Jellyfish Nemopilema nomurai)

  • 오선영; 김형태; 이경훈 / 한국수산과학회지 / Vol. 57, No. 5 / pp. 581-588 / 2024
  • In this study, we developed a machine-learning system that can effectively detect the giant jellyfish Nemopilema nomurai by collecting videos of its appearances. Surveys were conducted in the East China Sea, the South Sea, and Jeju coastal waters, which are presumed to be jellyfish migration routes. Video data were collected using GoPro cameras, and images were extracted at 1 fps to train the YOLOv8 Nano and Medium models. The YOLOv8 Nano model achieved an F1 score of 0.83 at high confidence and maintained high precision across the precision-recall curve, demonstrating its effectiveness in predicting jellyfish occurrences. The YOLOv8 Nano model thus showed excellent reliability and precision, indicating its potential for effective jellyfish detection. However, to improve the performance of the model even further, data from various environments must be collected and additional validation must be performed.
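
A rough sketch of this kind of workflow (frame extraction at 1 fps followed by YOLOv8 Nano fine-tuning) is shown below using OpenCV and the ultralytics package; the video file and dataset YAML are hypothetical placeholders, not the authors' data or code.

```python
# Rough sketch (not the authors' code): extract frames at roughly 1 fps with
# OpenCV and fine-tune a YOLOv8 Nano model via the ultralytics package.
import os
import cv2
from ultralytics import YOLO

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("survey_clip.mp4")          # hypothetical survey video
fps = cap.get(cv2.CAP_PROP_FPS)
frame_idx, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % int(fps) == 0:                  # keep about one frame per second
        cv2.imwrite(f"frames/frame_{saved:05d}.jpg", frame)
        saved += 1
    frame_idx += 1
cap.release()

model = YOLO("yolov8n.pt")                         # Nano weights; "yolov8m.pt" for Medium
model.train(data="jellyfish.yaml", epochs=100, imgsz=640)  # hypothetical dataset config
metrics = model.val()                              # reports precision, recall, mAP
```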

Enhanced Network Intrusion Detection using Deep Convolutional Neural Networks

  • Naseer, Sheraz; Saleem, Yasir / KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 10 / pp. 5159-5178 / 2018
  • Network intrusion detection is a rapidly growing field of information security due to its importance for modern IT infrastructure. Many supervised and unsupervised learning techniques have been devised by researchers from the disciplines of machine learning and data mining to achieve reliable detection of anomalies. In this paper, a deep convolutional neural network (DCNN) based intrusion detection system (IDS) is proposed, implemented, and analyzed. The deep CNN core of the proposed IDS is fine-tuned using randomized search over the configuration space. The proposed system is trained and tested on the NSL-KDD training and testing datasets using a GPU. Performance comparisons of the proposed DCNN model with other classifiers are provided using well-known metrics, including the receiver operating characteristic (ROC) curve, area under the ROC curve (AUC), accuracy, the precision-recall curve, and mean average precision (mAP). The experimental results of the proposed DCNN-based IDS show promise for real-world application in anomaly detection systems.
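
The "randomized search over the configuration space" mentioned here could look roughly like the sketch below; the hyperparameter ranges and the train_and_evaluate() stub are hypothetical, since the paper's actual search space is not given in the abstract.

```python
# Illustrative sketch of a randomized search over a DCNN configuration space;
# the ranges and train_and_evaluate() are hypothetical, not from the paper.
import random
from sklearn.model_selection import ParameterSampler

param_space = {
    "num_conv_layers": [2, 3, 4],
    "filters":         [32, 64, 128],
    "kernel_size":     [3, 5],
    "dropout":         [0.2, 0.3, 0.5],
    "learning_rate":   [1e-2, 1e-3, 1e-4],
}

def train_and_evaluate(cfg):
    """Placeholder: in a real pipeline this would build the DCNN from cfg,
    train it on NSL-KDD, and return the validation AUC."""
    return random.random()

best_auc, best_cfg = 0.0, None
for cfg in ParameterSampler(param_space, n_iter=20, random_state=0):
    auc = train_and_evaluate(cfg)
    if auc > best_auc:
        best_auc, best_cfg = auc, cfg
print(best_cfg, best_auc)
```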

검색효율 측정척도에 관한 연구 (A Study on measuring techniques of retrieval effectiveness)

  • 윤구호 / 한국문헌정보학회지 / Vol. 16 / pp. 177-205 / 1989
  • Retrieval effectiveness is the principal criterion for measuring the performance of an information retrieval system. This paper deals with the characteristics of 'relevance' of information and various measuring techniques of retrieval effectiveness. The outlines of this study are as follows: 1) Relevance decisions for evaluation should be divided into user-oriented and system-oriented decisions. 2) The recall-precision measure appears to be user-oriented, and the recall-fallout measure system-oriented. 3) Unfortunately, many of the composite measures cannot be justified in any rational manner. 4) The Swets model has demonstrated that it yields, in general, a straight line instead of a curve of varying curvature, and it emphasizes the fundamentally probabilistic nature of information retrieval. 5) The Cooper model seems to be a good substitute for precision and a useful measure for systems that rank documents. 6) The Rocchio model was proposed for the evaluation of retrieval systems that rank documents and was designed to be independent of cut-off. 7) The Cawkell model suggested that Shannon's equation for entropy can be applied to the measurement of retrieval effectiveness.


Comparison of Heart Failure Prediction Performance Using Various Machine Learning Techniques

  • ByungJoo Kim / International Journal of Internet, Broadcasting and Communication / Vol. 16, No. 4 / pp. 290-300 / 2024
  • This study presents a comprehensive evaluation of various machine learning models for predicting heart failure outcomes. Leveraging a dataset of clinical records, the performance of Logistic Regression, Support Vector Machine (SVM), Random Forest, Soft Voting ensemble, and XGBoost models is rigorously assessed using multiple evaluation metrics, including accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). The analysis reveals that the XGBoost model outperforms the other techniques across all metrics, exhibiting the highest AUC score and indicating superior discriminative ability in distinguishing between patients with and without heart failure. Furthermore, the study highlights the value of the feature importance analysis provided by XGBoost, which offers insights into the most influential predictors of heart failure and can inform clinical decision-making and patient management strategies. The research also underscores the significance of balancing precision and recall, as reflected by the F1-score, in medical applications to minimize the consequences of false negatives.
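
A hedged sketch of this kind of model comparison is given below using scikit-learn and xgboost on a synthetic dataset, which stands in for the study's clinical records (not available here).

```python
# Illustrative comparison of the classifiers named above on synthetic data;
# make_classification replaces the study's clinical records.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import roc_auc_score, f1_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=12, weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "LogReg":  LogisticRegression(max_iter=1000),
    "SVM":     SVC(probability=True),
    "RF":      RandomForestClassifier(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}
models["SoftVoting"] = VotingClassifier(list(models.items()), voting="soft")

for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    print(f"{name:10s}  AUC={roc_auc_score(y_te, proba):.3f}  "
          f"F1={f1_score(y_te, proba >= 0.5):.3f}")

print(models["XGBoost"].feature_importances_)   # per-feature importance scores
```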

면 객체 매칭을 위한 판별모델의 성능 평가 (Evaluation of Classifiers Performance for Areal Features Matching)

  • 김지영; 김정옥; 유기윤; 허용 / 한국측량학회지 / Vol. 31, No. 1 / pp. 49-55 / 2013
  • This study applies classifier performance evaluation methods from the data mining and biometrics fields to the matching of heterogeneous spatial data sets in order to identify the classifier that produces good matching results. To this end, distance values between candidate matching object pairs are computed for each matching criterion, and these distances are normalized with the Min-Max and Tanh methods to produce similarities. Classifiers that combine the resulting similarities into a shape similarity using the CRITIC, Matcher Weighting, and Simple Sum methods were then applied. When each classifier was evaluated with the PR curve and AUC-PR, the classifier using Tanh normalization with the Simple Sum method showed the highest AUC-PR, 0.893. Therefore, for matching heterogeneous spatial data sets, a classifier that computes the similarity for each matching criterion using Tanh normalization and derives the shape similarity with the Simple Sum method appears to be suitable.
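
The score pipeline described in this abstract (distance normalization followed by Simple Sum combination and AUC-PR evaluation) might be sketched as follows; the distances, labels, and the particular Tanh normalization form are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: each row of `dist` holds one candidate pair's distances for
# several matching criteria; values and tanh parameters are made up.
import numpy as np
from sklearn.metrics import average_precision_score  # AUC-PR estimate

dist = np.array([[0.2, 1.5, 0.3],     # candidate pair 1 (true match)
                 [0.9, 3.0, 1.1],     # candidate pair 2 (non-match)
                 [0.1, 0.8, 0.2]])    # candidate pair 3 (true match)
labels = np.array([1, 0, 1])

# Min-Max normalization: distance -> similarity in [0, 1]
minmax_sim = 1.0 - (dist - dist.min(axis=0)) / (dist.max(axis=0) - dist.min(axis=0))

# Tanh (robust) normalization, one common form using per-criterion mean/std
mu, sigma = dist.mean(axis=0), dist.std(axis=0)
tanh_sim = 1.0 - 0.5 * (np.tanh(0.01 * (dist - mu) / sigma) + 1.0)

# Simple Sum combination: unweighted sum of per-criterion similarities -> shape similarity
shape_sim = tanh_sim.sum(axis=1)

print("AUC-PR:", average_precision_score(labels, shape_sim))
```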

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran; Vigneshwari, Srinivasan / ETRI Journal / Vol. 44, No. 4 / pp. 573-587 / 2022
  • The agricultural sector is completely different from other sectors since it relies entirely on various natural and climatic factors. Climate change has many effects on land and agriculture, including lack of annual rainfall, pests, heat waves, changes in sea level, and global ozone/atmospheric CO2 fluctuation, and it also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, preprocessing of the data is carried out through natural language processing techniques, and the dimension-reduced tags are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. A gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. Precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area are the performance metrics used in the analysis. The proposed methodology gives a precision score of 94.40% for agricultural ontology construction, compared with the decision tree (83.94%) and the K-nearest neighbor algorithm (86.89%).
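
The Jaccard measure underlying the JRE mentioned above is simple to state; the toy sketch below computes it over the token sets of two made-up sentences.

```python
# Tiny sketch of the Jaccard similarity between two sentences: size of the
# intersection of their token sets divided by the size of the union.
def jaccard(sentence_a: str, sentence_b: str) -> float:
    a, b = set(sentence_a.lower().split()), set(sentence_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

print(jaccard("heavy rainfall damages paddy soil",
              "heavy rainfall increases soil erosion"))   # 3 shared / 7 total ≈ 0.43
```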

요추 특징점 추출을 위한 영역 분할 모델의 성능 비교 분석 (A Comparative Performance Analysis of Segmentation Models for Lumbar Key-points Extraction)

  • 유승희; 최민호; 장준수 / 대한의용생체공학회:의공학회지 / Vol. 44, No. 5 / pp. 354-361 / 2023
  • Most spinal diseases are diagnosed based on the subjective judgment of a specialist, so numerous studies have been conducted to add objectivity by automating the diagnosis process using deep learning. In this paper, we propose a method that combines segmentation and feature extraction, which are frequently used techniques for diagnosing spinal diseases. Four models, U-Net, U-Net++, DeepLabv3+, and M-Net, were trained and compared using 1,000 X-ray images, and key-points were derived using the Douglas-Peucker algorithm. For evaluation, the Dice Similarity Coefficient (DSC), Intersection over Union (IoU), precision, recall, and area under the precision-recall curve were used as evaluation metrics, and U-Net++ showed the best performance in all metrics, with an average DSC of 0.9724. For the average Euclidean distance between estimated key-points and ground truth, U-Net was the best, followed by U-Net++; however, the difference in average distance was about 0.1 pixels, which is not significant. The results suggest that it is possible to extract key-points based on segmentation and that this can be used to accurately diagnose various spinal diseases, including spondylolisthesis, with consistent criteria.
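
For reference, the DSC and IoU metrics used in this evaluation can be computed directly from binary masks, as in the small illustrative sketch below (the masks are made up).

```python
# Minimal sketch of Dice Similarity Coefficient and IoU between a predicted
# mask and a ground-truth mask, using tiny made-up boolean arrays.
import numpy as np

pred = np.array([[0, 1, 1],
                 [0, 1, 1],
                 [0, 0, 0]], dtype=bool)
gt   = np.array([[0, 1, 1],
                 [0, 1, 0],
                 [0, 0, 0]], dtype=bool)

intersection = np.logical_and(pred, gt).sum()
dsc = 2 * intersection / (pred.sum() + gt.sum())          # Dice Similarity Coefficient
iou = intersection / np.logical_or(pred, gt).sum()        # Intersection over Union
print(f"DSC={dsc:.3f}  IoU={iou:.3f}")                    # DSC=0.857  IoU=0.750
```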

클래스 불균형 문제에서 베이지안 알고리즘의 학습 행위 분석 (Learning Behavior Analysis of Bayesian Algorithm Under Class Imbalance Problems)

  • 황두성 / 전자공학회논문지CI / Vol. 45, No. 6 / pp. 179-186 / 2008
  • This paper analyzes the behavior of the Bayesian algorithm when learning from imbalanced data and compares performance evaluation methods. Assuming a prior data distribution, Bayesian learning was performed on classification problems generated according to the class imbalance ratio and classification complexity. The experimental results were analyzed across imbalance ratios and classification complexities by computing the AUC (Area Under the Curve) for both the ROC (Receiver Operating Characteristic) and PR (Precision-Recall) evaluation methods. In the comparative analysis, the imbalance ratio affected Bayesian learning, as reported in previous studies, and data overlap arising from high classification complexity was confirmed to be a factor that hinders learning performance. The AUC of the PR evaluation showed larger differences in learning performance than the AUC of the ROC evaluation at high classification complexity and high imbalance ratios, whereas for problems with low classification complexity and low imbalance ratios the differences between the two measures were negligible or similar. These results suggest that the AUC of the PR evaluation can help in designing learning models for class imbalance problems and in selecting an optimal learner that takes misclassification costs into account.
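
The ROC-AUC versus PR-AUC gap discussed in this abstract can be reproduced on a synthetic imbalanced problem; the sketch below is an illustration with assumed parameters, not the paper's generated distributions.

```python
# Hedged illustration of the ROC-AUC vs. PR-AUC gap on an imbalanced problem,
# using a Gaussian Naive Bayes classifier and synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score, average_precision_score

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05],  # 5% minority class
                           flip_y=0.05, class_sep=0.8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

score = GaussianNB().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print("ROC AUC:", roc_auc_score(y_te, score))             # typically looks optimistic
print("PR  AUC:", average_precision_score(y_te, score))   # usually much lower here
```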