• Title/Summary/Keyword: Precision-recall


A Multi-Stage Approach to Secure Digital Image Search over Public Cloud using Speeded-Up Robust Features (SURF) Algorithm

  • AL-Omari, Ahmad H.;Otair, Mohammed A.;Alzwahreh, Bayan N.
    • International Journal of Computer Science & Network Security / v.21 no.12 / pp.65-74 / 2021
  • Digital image processing and retrieval have become increasingly popular on the Internet and are attracting growing attention from various multimedia fields. This places additional privacy requirements on efficient image matching techniques in various applications. Several search methods have therefore been developed for cases where confidential images are matched between pairs of security agencies, but most of them are limited by either cost or precision. This study proposes a secure and efficient method that preserves image privacy and confidentiality between two communicating parties. To retrieve an image, a feature vector is extracted from the given query image, and its similarities to the feature vectors of the stored database images are calculated to retrieve the matching images based on an indexing scheme and matching strategy. We used a feature detector for secure content-based image retrieval, the Speeded-Up Robust Features (SURF) algorithm, over a public cloud to extract the features, together with the Honey Encryption algorithm. The purpose of using an encrypted image database is to provide accurate searching through encrypted documents without decryption. Progress in this area helps protect the privacy of sensitive data stored on the cloud. The experimental results (conducted on a well-known image set) show that the proposed methodology achieved a noticeable improvement in precision, recall, F-Measure, and execution time.

A Computer-Aided Diagnosis of Brain Tumors Using a Fine-Tuned YOLO-based Model with Transfer Learning

  • Montalbo, Francis Jesmar P.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.12 / pp.4816-4834 / 2020
  • This paper proposes transfer learning and fine-tuning techniques for a deep learning model to detect three distinct brain tumors from Magnetic Resonance Imaging (MRI) scans. In this work, the recent YOLOv4 model was trained using a collection of 3064 T1-weighted Contrast-Enhanced (CE)-MRI scans that were pre-processed and labeled for the task. The partial 29-layer YOLOv4-Tiny was trained and fine-tuned to work optimally and run efficiently on most platforms with reliable performance. With the help of transfer learning, the model had initial leverage to train faster with pre-trained weights from the COCO dataset, generating a robust set of features required for brain tumor detection. The results yielded the highest mean average precision of 93.14%, with 90.34% precision, 88.58% recall, and an 89.45% F1-Score, outperforming previous versions of the YOLO detection models and other studies that used bounding box detection for the same task, such as Faster R-CNN. The paper concludes that YOLOv4-Tiny can detect brain tumors automatically, efficiently, and at a rapid pace with the help of proper fine-tuning and transfer learning. This work contributes mainly to assisting medical experts in the diagnostic process of brain tumors.
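
As a quick consistency check on the figures reported in this abstract, the F1-Score is the harmonic mean of precision and recall. The sketch below is a minimal illustration and not the paper's evaluation code; the function name `f1_score` is chosen here for the example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (output uses the input's units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall, in percent, as reported in the abstract above.
print(round(f1_score(90.34, 88.58), 2))  # 89.45
```

The computed value agrees with the 89.45% F1-Score stated in the abstract.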

Design and Implementation of the Content-Based Image Retrieval System using Color Features on the World Wide Web (WWW에서 칼라특징을 이용한 내용기반 화상검색 시스템의 설계 및 구현)

  • Choi, Hyun-Sub;Choi, Ki-Ho
    • The Transactions of the Korea Information Processing Society / v.4 no.9 / pp.2315-2332 / 1997
  • In this paper, we implement a content-based image retrieval system that searches image databases on the WWW (World Wide Web) by visual features. The system automatically extracts color features from the input image and then finds the images that contain the most similar color regions. The user can select one of two query methods, which use either a full image or a $4 \times 4$ grid of 16 sketched color regions. Image similarity is calculated with the histogram intersection distance and the histogram Euclidean distance. The experimental results show that the two query types provide precision/recall of 0.84/0.92 and 0.85/0.93 respectively, indicating that the retrieval system achieves high performance and validity.
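
The two similarity measures named in this abstract can be sketched as follows. This is a minimal illustration that assumes each color histogram is a list of bin weights normalized to sum to 1; the function names and the toy histograms are hypothetical, and the paper's actual binning and normalization are not given in the abstract.

```python
import math


def intersection_distance(h1: list[float], h2: list[float]) -> float:
    """1 minus the histogram intersection of two normalized histograms."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return 1.0 - overlap


def euclidean_distance(h1: list[float], h2: list[float]) -> float:
    """Straight-line distance between two histograms of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


# Toy 3-bin histograms (hypothetical values, each sums to 1).
query = [0.5, 0.5, 0.0]
stored = [0.25, 0.25, 0.5]
print(intersection_distance(query, stored))  # 0.5
print(euclidean_distance(query, stored))
```

With both measures, a smaller value means the two images have more similar color distributions, so candidates would be ranked in ascending order of distance.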


Big Data Analysis and Prediction of Traffic in Los Angeles

  • Dauletbak, Dalyapraz;Woo, Jongwook
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.2 / pp.841-854 / 2020
  • This paper explains a method to process, analyze, and predict traffic patterns in Los Angeles County using Big Data and Machine Learning. The dataset comes from a popular navigation platform in the USA, which tracks road information using connected users' devices and also collects reports shared by users through the app. The dataset mainly consists of information about traffic jams and traffic incidents reported by users, such as road closures, hazards, and accidents. The major contribution of this paper is a clear view of how large-scale road traffic data can be stored and processed using the Big Data system Hadoop and its ecosystem (Hive). In addition, the analysis is explained with visuals produced using Business Intelligence, and prediction with a classification machine learning model on the sampled traffic data is presented using Azure ML. The modeling process and the results are interpreted using three metrics: accuracy, precision, and recall.
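
The three metrics named at the end of this abstract follow directly from the counts in a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when classes are imbalanced, since accuracy alone can look high while one class is rarely detected.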

Definition Sentences Recognition Based on Definition Centroid

  • Kim, Kweon-Yang
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.6 / pp.813-818 / 2007
  • This paper is concerned with the problem of recognizing definition sentences. Given a definition question like "Who is the person X?", the task is to retrieve, from a collection of news articles, the definition sentences that capture descriptive information corresponding variously to a person's age, occupation, or some role the person played in an event. To retrieve as many relevant sentences for the definition question as possible, we adopt a centroid-based statistical approach that has been applied to multi-document summarization. To improve precision and recall, the weight measure of centroid words is supplemented with an external knowledge resource such as Wikipedia, and redundant candidate sentences are removed from the candidate definitions. Our approach yields some improvement over the baseline for 20 IT persons who have high document frequency.
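
The general idea of centroid-based ranking can be sketched as below. This is a simplified illustration and not the paper's method: it assumes whitespace tokenization, builds the centroid from average term counts over a few seed sentences, and scores candidates by cosine similarity. All function names and example sentences are hypothetical, and the Wikipedia-based weighting and redundancy removal described in the abstract are omitted.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def build_centroid(sentences: list[str]) -> dict[str, float]:
    """Average term count of each word across the seed sentences."""
    counts: Counter[str] = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return {word: count / len(sentences) for word, count in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(candidates: list[str], centroid: dict[str, float]) -> list[tuple[float, str]]:
    """Candidates sorted by similarity to the centroid, most similar first."""
    scored = [(cosine(dict(Counter(tokenize(c))), centroid), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


seeds = ["x is a software engineer", "x is the founder of a company"]
candidates = ["the weather was cold yesterday", "x is a engineer and founder"]
for score, sentence in rank_candidates(candidates, build_centroid(seeds)):
    print(f"{score:.3f}  {sentence}")
```

Sentences that share more high-weight centroid words score higher, which is why supplementing the centroid weights with an external resource can change the ranking.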

Design of a Large Real-Time Personalized Recommendation System (대용량 개인화 실시간 상품 추천 시스템 설계)

  • Kim Jong-Hee;Shim Jang-Sup;Lee Dong-Ha;Jung Soon-Key
    • Proceedings of the Korea Information Processing Society Conference / 2006.05a / pp.109-112 / 2006
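  • The need for large-scale recommendation systems has been increasing recently, and interest is growing in particular in personalized recommendation system architectures for large Internet shopping malls. In this paper, we design and implement a product recommendation system for Internet shopping malls using k-means clustering and sequential pattern techniques. The personalized recommendation engine is designed to run on a large system, applying data mining techniques while reflecting the batch processing of user information and the hierarchical nature of categories. To evaluate the designed and implemented system, we computed the Predictive Recommend Precision (PRP), Predictive Recommend Recall (PRR), and Predictive Factor One-measure (PF1) using data from a large shopping mall.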


Protein Named Entity Identification Based on Probabilistic Features Derived from GENIA Corpus and Medical Text on the Web

  • Sumathipala, Sagara;Yamada, Koichi;Unehara, Muneyuki;Suzuki, Izumi
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.2 / pp.111-120 / 2015
  • Protein named entity identification is one of the most essential and fundamental prerequisites for extracting information about protein-protein interactions from biomedical literature. In this paper, we explore the use of abstracts of biomedical literature in MEDLINE for protein name identification and present the results of the experiments conducted. We present a robust and effective approach to classifying biomedical named entities into protein and non-protein classes, based on a rich set of features: orthographic, keyword, morphological, and newly introduced Protein-Score features. In experiments on the GENIA corpus using Random Forest, our procedure achieves its highest values of 92.7% precision, 91.7% recall, and 92.2% F-measure for protein identification, while significantly reducing training and testing time.

Compositional rules of Korean auxiliary predicates for sentiment analysis

  • Lee, Kong Joo
    • Journal of Advanced Marine Engineering and Technology / v.37 no.3 / pp.291-299 / 2013
  • Most sentiment analysis systems count the occurrences of sentiment expressions in a text and evaluate the text by summing the polarity values of the extracted expressions. However, the linguistic context of these expressions should be taken into account in order to analyze the sentiment orientation of the text precisely. Korean auxiliary predicates, when attached to a main verb or adjective, affect its meaning in various ways. In this paper, we introduce a new approach that handles Korean auxiliary predicates for sentiment analysis. We classify the auxiliary predicates according to the strength of their impact on sentiment polarity values. We also define compositional rules for auxiliary predicates that update polarity values when the predicates appear together with sentiment expressions. This approach is implemented in a sentiment analysis system that extracts opinions about a specific individual from review documents collected from various web sites. The experimental results show approximately 72.6% precision and 52.7% recall for correctly detecting sentiment expressions in a text.

Semantic Matching Engine for Searching Web Services (웹 서비스 검색을 위한 시맨틱 매칭 엔진)

  • Yang, Seung-Hoon;Lee, Dae-Wook;Kwon, Joon-Ho;Lee, Suk-Ho
    • Proceedings of the Korean Information Science Society Conference / 2006.10c / pp.267-272 / 2006
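  • With the continued growth of the Internet, XML-based web services have emerged as a way of developing web applications; many web services have been developed, and more are expected. Finding the web service a user wants among this rapidly growing number has become an important issue. However, the UDDI registry, the current standard for web service discovery, is keyword-based and therefore limited in search performance. Much recent research has tried to overcome this limitation, but it remains insufficient. This paper therefore proposes a system that improves both recall and precision by adding an extra semantic layer, called the semantic matching engine, to the UDDI registry, so that users can find the web services they want even when the keywords do not match.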


Comparing Korean Spam Document Classification Using Document Classification Algorithms (문서 분류 알고리즘을 이용한 한국어 스팸 문서 분류 성능 비교)

  • Song, Chull-Hwan;Yoo, Seong-Joon
    • Proceedings of the Korean Information Science Society Conference / 2006.10c / pp.222-225 / 2006
  • Korea has many Internet users compared with other countries, and Korean Internet users correspondingly report considerable inconvenience from spam mail. To address this problem, this paper describes research on filtering Korean spam documents using various feature weighting, feature selection, and document classification algorithms. We extract nouns from Korean documents (spam and non-spam) and use them as the input features of each classification algorithm. For feature weighting, instead of the traditional approach, we compute the variance of each feature, apply the Max Value Selection method to select global features, and then apply the traditional feature selection methods MI, IG, and CHI to extract features. The extracted features are then fed to classification algorithms such as Naive Bayes and Support Vector Machine. The Vector Space Model is used in its traditional form. As a result, the combination of the Support Vector Machine classifier, TF-IDF Variance Weighting (Combined Max Value Selection), and CHI feature selection achieved Recall of 99.4%, Precision of 97.4%, and F-Measure of 98.39%.
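
CHI feature selection, one of the methods named in this abstract, scores a term by how strongly its presence depends on the class. The sketch below uses the standard chi-square statistic for a 2x2 term/class contingency table; the function name and the document counts are hypothetical, and the paper's variance weighting and Max Value Selection steps are not reproduced here.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: spam documents containing the term
    b: non-spam documents containing the term
    c: spam documents without the term
    d: non-spam documents without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


# Hypothetical counts: the term appears mostly in spam documents.
print(chi_square(40, 10, 10, 40))  # 36.0
# A term spread evenly across both classes carries no signal.
print(chi_square(25, 25, 25, 25))  # 0.0
```

Terms would then be ranked by this score and only the highest-scoring ones kept as input features for the classifier.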
