• Title/Summary/Keyword: Precision-recall


Comparison and Evaluation of Web-based Image Search Engines (이미지정보 탐색을 위한 웹 검색엔진의 비교 평가)

  • Kim, Hyo-Jung
    • Journal of Information Management
    • /
    • v.31 no.4
    • /
    • pp.50-70
    • /
    • 2000
  • As internet resources have grown to include text, images, and sound, a variety of Web-based image search engines have been developed accordingly. This diversity of multimedia content has made searching for and retrieving relevant information difficult. The purpose of this study is to compare and evaluate the features and performance of existing image search engines in order to help users select an appropriate engine. The study selected AV Photo Finder, Lycos MultiMedia, Amazing Picture Machine, Image Surfer, WebSeek, and Ditto for comparison and evaluation because of their popularity among users of image search engines. The methodology was to analyze the related literature and establish criteria for evaluating image search engines. The study investigated the characteristics, indexing methods, search capabilities, screen display, and user interfaces of the different engines in order to compare their performance. Finally, the study measured relative recall and precision ratios to evaluate retrieval effectiveness under an experimental setup. The results of the comparative analysis of search performance are as follows: AV Photo Finder ranked highest among the image search engines; Ditto and WebSeek also showed comparatively high precision ratios; Lycos MultiMedia and Image Surfer followed; and Amazing Picture Machine ranked lowest.

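The relative recall and precision ratios used in the study can be computed from pooled relevance judgments. Below is a minimal sketch assuming invented result lists, not the study's data; since the true set of relevant images on the Web is unknowable, recall is measured relative to the pool of relevant items returned by all engines.

```python
# Minimal sketch: precision and relative recall for comparing search engines.
# The result lists and judgments below are illustrative, not the study's data.

def precision(retrieved, relevant):
    """Fraction of retrieved items judged relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for doc in retrieved if doc in relevant) / len(retrieved)

def relative_recall(retrieved, pooled_relevant):
    """Recall against the pool of relevant items found by all engines."""
    if not pooled_relevant:
        return 0.0
    return sum(1 for doc in retrieved if doc in pooled_relevant) / len(pooled_relevant)

# Hypothetical judgments for one query across two engines.
engine_a = ["img1", "img2", "img3", "img4"]
engine_b = ["img3", "img5", "img6"]
relevant = {"img1", "img3", "img5"}                  # items judged relevant
pool = relevant & (set(engine_a) | set(engine_b))    # pooled relevant items

for name, results in [("A", engine_a), ("B", engine_b)]:
    print(name, precision(results, relevant), relative_recall(results, pool))
```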

Machine-Learning Based Biomedical Term Recognition (기계학습에 기반한 생의학분야 전문용어의 자동인식)

  • Oh Jong-Hoon;Choi Key-Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.8
    • /
    • pp.718-729
    • /
    • 2006
  • There has been increasing interest in automatic term recognition (ATR), which recognizes technical terms in domain-specific texts. ATR is composed of 'term extraction', which extracts candidate technical terms, and 'term selection', which decides whether the terms in the list derived from 'term extraction' are technical terms or not. 'Term selection' is the process of ranking a term list according to features of technical terms and of finding the boundary between technical terms and general terms. Previous work used only statistical features of terms for 'term selection'. However, statistical features alone have limitations for effectively selecting technical terms from a term list. The objective of this paper is to find effective features for 'term selection' by considering various aspects of technical terms. To solve the ranking problem, we derive various features of technical terms and combine them using machine-learning algorithms. To solve the boundary-finding problem, we define it as a binary classification problem that classifies each term in a term list as a technical term or a general term. Experiments show that our method records 78-86% precision and 87-90% recall in boundary finding, and 89-92% 11-point precision in ranking. Moreover, our method outperforms previous work by up to 26%.
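
The 11-point precision reported above is interpolated average precision over the 11 standard recall levels. A minimal sketch, assuming a toy ranking and gold labels rather than the paper's data:

```python
# Sketch of 11-point interpolated average precision for a ranked term list.

def eleven_point_ap(ranked, gold):
    """Average interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    total_relevant = sum(1 for t in ranked if t in gold)
    hits, points = 0, []                      # (recall, precision) per rank
    for i, term in enumerate(ranked, start=1):
        if term in gold:
            hits += 1
        points.append((hits / total_relevant, hits / i))
    interpolated = []
    for level in [i / 10 for i in range(11)]:
        # interpolated precision: best precision at any recall >= level
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates) if candidates else 0.0)
    return sum(interpolated) / 11

ranked = ["t1", "t2", "t3", "t4", "t5", "t6"]   # output of 'term selection'
gold = {"t1", "t3", "t4"}                       # true technical terms
print(round(eleven_point_ap(ranked, gold), 3))  # 0.841 for this toy input
```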

Automatic Generation of Information Extraction Rules Through User-interface Agents (사용자 인터페이스 에이전트를 통한 정보추출 규칙의 자동 생성)

  • 김용기;양재영;최중민
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.4
    • /
    • pp.447-456
    • /
    • 2004
  • Information extraction is the process of recognizing and fetching particular information fragments from a document. In order to extract information uniformly from many heterogeneous information sources, it is necessary to produce information extraction rules, called a wrapper, for each source. Previous methods of information extraction can be categorized into manual and automatic wrapper generation. In the manual method, since the wrapper is generated by a human expert who analyzes documents and writes rules, the precision of the wrapper is very high, but it has problems with scalability and efficiency. In the automatic method, an agent program analyzes a set of example documents and produces a wrapper through learning. Although very scalable, this method has difficulty in generating correct rules, and the generated rules are sometimes unreliable. This paper combines the manual and automatic methods by proposing a new method of learning information extraction rules. We adopt a supervised learning scheme in which a user-interface agent obtains information from the user about what to extract from a document, and XML-based information extraction rules are then generated through learning from these inputs. The interface agent is used not only to generate new extraction rules but also to modify and extend existing ones to improve the precision and recall of the extraction system. A series of experiments testing the system produced very promising results. We hope the system can be applied to practical systems such as information-mediator agents.
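
The abstract does not reproduce the paper's XML rule schema, so the sketch below invents a tiny format (one regex pattern per field) purely to illustrate how learned, XML-encoded wrapper rules can be applied to a document:

```python
# Illustrative only: the rule schema here is invented, not the paper's.
import re
import xml.etree.ElementTree as ET

RULES_XML = r"""
<rules>
  <rule field="price"  pattern="Price:\s*(\$[\d.]+)"/>
  <rule field="author" pattern="By\s+([A-Z][a-z]+ [A-Z][a-z]+)"/>
</rules>
"""

def extract(document, rules_xml):
    """Apply each XML-declared regex rule and collect the named fields."""
    results = {}
    for rule in ET.fromstring(rules_xml).findall("rule"):
        match = re.search(rule.get("pattern"), document)
        if match:
            results[rule.get("field")] = match.group(1)
    return results

doc = "By John Smith. Price: $19.99 at the store."
print(extract(doc, RULES_XML))   # {'price': '$19.99', 'author': 'John Smith'}
```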

A Study on the Development of Emotional Content through Natural Language Processing Deep Learning Model Emotion Analysis (자연어 처리 딥러닝 모델 감정분석을 통한 감성 콘텐츠 개발 연구)

  • Hyun-Soo Lee;Min-Ha Kim;Ji-won Seo;Jung-Yi Kim
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.687-692
    • /
    • 2023
  • We analyze the accuracy of emotion analysis by a natural language processing deep learning model and propose its use in developing emotional content. After outlining the GPT-3 model, we classified about 6,000 dialogue samples provided by AIHub into 9 emotion categories: 'joy', 'sadness', 'fear', 'anger', 'disgust', 'surprise', 'interest', 'boredom', and 'pain'. Performance was evaluated using accuracy, precision, recall, and F1-score, the standard evaluation indices for natural language processing models. In the emotion analysis, accuracy was over 91%; for precision, 'fear' and 'pain' showed low values. Recall was low for negative emotions, and for 'disgust' in particular, errors appeared due to the lack of data. Previous studies used emotion analysis mainly for polarity analysis divided into positive, negative, and neutral, which by its nature limited its use to the feedback stage. We expand emotion analysis into 9 categories and propose its use in the development of emotional content from the planning stage onward. More accurate results are expected if emotion analysis is performed after collecting more diverse daily conversations in follow-up research.
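
The reported metrics can be reproduced per category with scikit-learn's classification report. A sketch with toy labels, not the AIHub dialogue data:

```python
# Sketch of the evaluation step: accuracy plus per-class precision/recall/F1
# over the nine emotion categories. Labels below are toy values.
from sklearn.metrics import accuracy_score, classification_report

LABELS = ["joy", "sadness", "fear", "anger", "disgust",
          "surprise", "interest", "boredom", "pain"]

y_true = ["joy", "fear", "pain", "anger", "joy", "disgust"]
y_pred = ["joy", "fear", "joy",  "anger", "joy", "sadness"]

print("accuracy:", accuracy_score(y_true, y_pred))
# zero_division=0 silences warnings for classes absent from this tiny sample
print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))
```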

Automatic Target Recognition Study using Knowledge Graph and Deep Learning Models for Text and Image data (지식 그래프와 딥러닝 모델 기반 텍스트와 이미지 데이터를 활용한 자동 표적 인식 방법 연구)

  • Kim, Jongmo;Lee, Jeongbin;Jeon, Hocheol;Sohn, Mye
    • Journal of Internet Computing and Services
    • /
    • v.23 no.5
    • /
    • pp.145-154
    • /
    • 2022
  • Automatic Target Recognition (ATR) technology is emerging as a core technology of Future Combat Systems (FCS). Conventional ATR is performed on IMINT (image intelligence) collected from SAR sensors, using various image-based deep learning models. However, although the development of IT and sensing technology has expanded ATR-related data to HUMINT (human intelligence) and SIGINT (signal intelligence), ATR still relies only on image-oriented IMINT data. In complex and diversified battlefield situations, it is difficult to guarantee high ATR accuracy and generalization performance with image data alone. Therefore, in this paper we propose a knowledge-graph-based ATR method that can use image and text data simultaneously. The main idea is to convert ATR images and text into graphs according to the characteristics of each data type, align them to a knowledge graph, and connect the heterogeneous ATR data through the knowledge graph. To convert an ATR image into a graph, an object-tag graph whose nodes are object tags is generated from the image using a pre-trained image object recognition model and the vocabulary of the knowledge graph. The ATR text, in turn, uses a pre-trained language model, TF-IDF, a co-occurrence word graph, and the vocabulary of the knowledge graph to generate a word graph whose nodes are the key vocabulary of the ATR domain. The two generated graphs are connected to the knowledge graph using an entity alignment model to improve ATR performance on images and texts. To demonstrate the superiority of the proposed method, 227 web documents and 61,714 RDF triples from dbpedia were collected, and comparative experiments were performed on precision, recall, and F1-score from an entity alignment perspective.
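
The text-side graph construction (key vocabulary by TF-IDF, edges from co-occurrence) can be sketched as below on an invented corpus; the pre-trained language model and knowledge-graph alignment steps are omitted:

```python
# Sketch: select key vocabulary by TF-IDF, then link words co-occurring in
# the same document. The corpus is illustrative, not ATR data.
from itertools import combinations
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "radar detects the armored vehicle near the bridge",
    "the armored vehicle crossed the bridge at night",
    "infrared sensor tracked the vehicle",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(corpus).toarray()
vocab = vec.get_feature_names_out()

# nodes: top-scoring TF-IDF terms per document
keywords = set()
for row in tfidf:
    top = row.argsort()[-3:]                  # 3 highest-scoring terms
    keywords.update(vocab[i] for i in top if row[i] > 0)

# edges: keyword pairs that co-occur within a document
edges = set()
for doc in corpus:
    present = sorted(w for w in keywords if w in doc.split())
    edges.update(combinations(present, 2))

print(sorted(keywords))
print(sorted(edges))
```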

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Maintenance and failure prevention through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data, and handling it requires considering the characteristics of both multidimensional data and time series data. With multidimensional data, correlations between variables must be considered; existing probability-based, linear, and distance-based methods degrade due to the curse of dimensionality. In addition, time series data is usually preprocessed with sliding-window techniques and time series decomposition for autocorrelation analysis. These techniques increase the dimensionality of the data, so they need to be supplemented. Anomaly detection is an old research field in which statistical methods and regression analysis were used early on; currently, there are active studies applying machine learning and artificial neural networks. Statistically based methods are difficult to apply when data is non-homogeneous and do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and detect abnormality by comparing predicted and actual values; their performance drops when the model is not solid or the data contains noise or outliers, so they are restricted to training data free of noise and outliers. An autoencoder, an artificial neural network trained to reproduce its input as closely as possible, has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning: it can be applied to data that does not satisfy probability-distribution or linearity assumptions, and it can be trained without labeled data. However, it remains limited in identifying local outliers in multidimensional data, and the dimensionality of the data is greatly increased by the characteristics of time series data. In this study, we propose a Conditional Multimodal Autoencoder (CMAE) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to improve the identification of local outliers in multidimensional data. Multimodal models are commonly used to learn different types of inputs, such as voice and images; the different modalities share the autoencoder's bottleneck and thereby learn correlations. In addition, a Conditional Autoencoder (CAE) was used to learn the characteristics of time series data effectively without increasing the dimensionality of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of all three autoencoders was confirmed for 41 variables. Reconstruction performance differs by variable; reconstruction works well for the Memory, Disk, and Network modalities, whose loss values are small in all three models.
The Process modality did not show a significant difference across the three models, and the CPU modality showed excellent performance in CMAE. ROC curves were prepared to evaluate the anomaly detection performance of the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance followed the order CMAE, MAE, UAE. In particular, the recall of CMAE was 0.9828, confirming that it detects almost all abnormalities. The accuracy of the model improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an additional advantage beyond performance improvement: techniques such as time series decomposition and sliding windows require managing extra procedures, and their dimensional increase can slow inference. The proposed model is easy to apply to practical tasks in terms of inference speed and model management.
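
A minimal sketch of the conditional part in PyTorch: the encoder and decoder both receive a cyclic time-of-day encoding as the condition, so periodicity can be learned without widening the input window. The layer sizes and the sin/cos hour encoding are assumptions, and the multimodal split by resource type (CPU, Memory, Disk, Network) is omitted for brevity:

```python
# Sketch of a conditional autoencoder with time as the condition.
import torch
import torch.nn as nn

class ConditionalAE(nn.Module):
    def __init__(self, n_features=41, cond_dim=2, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features + cond_dim, 32), nn.ReLU(),
            nn.Linear(32, bottleneck), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck + cond_dim, 32), nn.ReLU(),
            nn.Linear(32, n_features),
        )

    def forward(self, x, cond):
        z = self.encoder(torch.cat([x, cond], dim=1))
        return self.decoder(torch.cat([z, cond], dim=1))

# condition: sin/cos of the hour, a common cyclic time encoding (assumed)
hours = torch.randint(0, 24, (16,)).float()
cond = torch.stack([torch.sin(2 * torch.pi * hours / 24),
                    torch.cos(2 * torch.pi * hours / 24)], dim=1)
x = torch.randn(16, 41)                       # 41 monitoring variables

model = ConditionalAE()
loss = nn.functional.mse_loss(model(x, cond), x)   # reconstruction error
print(loss.item())
```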

A Hybrid Link Quality Assessment for IEEE802.15.4 based Large-scale Multi-hop Wireless Sensor Networks (IEEE802.15.4 기반 대규모 멀티 홉 무선센서네트워크를 위한 하이브리드 링크 품질 평가 방법)

  • Lee, Sang-Shin;Kim, Joong-Hwan;Kim, Sang-Cheol
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.4
    • /
    • pp.35-42
    • /
    • 2011
  • Link quality assessment is a crucial part of sensor network formation for stably operating large-scale wireless sensor networks (WSNs). The stability of a path consisting of several nodes depends strongly on the link quality of every pair of consecutive nodes, so it is very important to assess link quality when building a routing path. In this paper, we present a link quality assessment method, the Hybrid Link Quality Metric (HQLM), which uses both the LQI and the RSSI reported by the RF chip of the sensor nodes to minimize the setup time and energy consumption of network formation. The HQLM not only reduces time and energy consumption but also lets LQI and RSSI complement each other. To evaluate the validity and efficiency of the proposed method, we measure the PDR (Packet Delivery Rate) by exchanging multiple messages and then compare the PDR with the HQLM result. From the experiments, we conclude that the HQLM performs better than either an LQI- or RSSI-based metric in terms of recall, precision, and matching of link quality.
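
The abstract does not give HQLM's exact combination rule, so the sketch below simply normalizes LQI and RSSI to [0, 1] and averages them to convey the idea of a hybrid metric; the value ranges and the equal weighting are assumptions:

```python
# Illustrative hybrid of LQI and RSSI; not the paper's exact HQLM formula.

def hybrid_link_quality(lqi, rssi_dbm,
                        lqi_range=(50, 110),         # assumed usable LQI span
                        rssi_range=(-90.0, -30.0)):  # assumed usable RSSI span
    def normalize(value, lo, hi):
        return min(max((value - lo) / (hi - lo), 0.0), 1.0)
    lqi_score = normalize(lqi, *lqi_range)
    rssi_score = normalize(rssi_dbm, *rssi_range)
    return (lqi_score + rssi_score) / 2              # assumed equal weighting

# accept a link only if the hybrid score clears a threshold
for lqi, rssi in [(105, -45.0), (70, -88.0)]:
    score = hybrid_link_quality(lqi, rssi)
    print(lqi, rssi, round(score, 2), "good" if score >= 0.5 else "poor")
```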

On-Road Car Detection System Using VD-GMM 2.0 (차량검출 GMM 2.0을 적용한 도로 위의 차량 검출 시스템 구축)

  • Lee, Okmin;Won, Insu;Lee, Sangmin;Kwon, Jangwoo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.40 no.11
    • /
    • pp.2291-2297
    • /
    • 2015
  • This paper presents a vehicle detection system that takes video of moving vehicles as input. The input video is constrained: it must come from a fixed camera looking obliquely down at the road from above. Road detection is required so that only the road area of the input image is used. We first present experimental results and the limitations of the motion history image extraction method, the SIFT (Scale-Invariant Feature Transform) algorithm, and histogram analysis for detecting vehicles. To solve these problems, we propose an adapted Gaussian Mixture Model (GMM), the Vehicle Detection GMM (VDGMM). In addition, we optimize VDGMM to detect more vehicles and name the result VDGMM 2.0. In experiments, precision, recall, and F1 were 9%, 53%, and 15% for GMM without road detection, versus 85%, 77%, and 80% for VDGMM 2.0 with road detection.
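
The VDGMM 2.0 modifications themselves are not specified in the abstract; the sketch below shows only the underlying technique, standard OpenCV GMM background subtraction restricted to a road region of interest (the file name and ROI polygon are placeholders):

```python
# GMM background subtraction limited to a road ROI; parameters are examples.
import cv2
import numpy as np

cap = cv2.VideoCapture("road.mp4")            # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=300, varThreshold=25)

ok, frame = cap.read()
assert ok, "could not read the input video"

road_mask = np.zeros(frame.shape[:2], dtype=np.uint8)
roi = np.array([(100, 700), (600, 300), (700, 300), (1200, 700)], dtype=np.int32)
cv2.fillPoly(road_mask, [roi], 255)           # placeholder road polygon

while ok:
    fg = subtractor.apply(frame)              # GMM foreground mask
    fg = cv2.bitwise_and(fg, road_mask)       # keep only the road area
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    vehicles = [c for c in contours if cv2.contourArea(c) > 500]
    print("vehicles in frame:", len(vehicles))
    ok, frame = cap.read()
cap.release()
```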

Network Anomaly Detection Technologies Using Unsupervised Learning AutoEncoders (비지도학습 오토 엔코더를 활용한 네트워크 이상 검출 기술)

  • Kang, Koohong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.4
    • /
    • pp.617-629
    • /
    • 2020
  • To overcome the limitations of rule-based intrusion detection systems caused by changing Internet computing environments, the emergence of new services, and the creativity of attackers, network anomaly detection (NAD) using machine learning and deep learning has received much attention. Most existing machine learning and deep learning technologies for NAD use supervised learning on training data labeled 'normal' and 'attack'. This paper demonstrates the feasibility of an unsupervised-learning AutoEncoder (AE) for NAD on data sets of network traffic collected without labeled responses. To verify the performance of the proposed AE model, we present experimental results in terms of accuracy, precision, recall, F1-score, and ROC AUC on the NSL-KDD training and test data sets. In particular, we model a reference AE through deep analysis of diverse AEs, varying hyper-parameters such as the number of layers and considering regularization and denoising effects. The reference model achieves binary-classification F1-scores of 90.4% and 89% on the KDDTest+ and KDDTest-21 test data sets, using a threshold at the 82nd percentile of the AE reconstruction error on the training data set.
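
The thresholding step is easy to sketch: score each record by its reconstruction error and flag anomalies above the 82nd percentile of the training errors. Gamma-distributed noise stands in here for errors from a trained AE:

```python
# Sketch of percentile thresholding on AE reconstruction errors. With a
# trained model, errors would be np.mean((X - model.predict(X))**2, axis=1).
import numpy as np

rng = np.random.default_rng(0)
train_errors = rng.gamma(2.0, 0.05, size=1000)   # stand-in training errors
threshold = np.percentile(train_errors, 82)      # 82nd-percentile threshold

test_errors = rng.gamma(2.0, 0.05, size=10)
test_errors[[3, 7]] += 0.5                       # two injected anomalies
predictions = (test_errors > threshold).astype(int)   # 1 = attack, 0 = normal
print(round(float(threshold), 4), predictions)
```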

Spam-Mail Filtering System Using Weighted Bayesian Classifier (가중치가 부여된 베이지안 분류자를 이용한 스팸 메일 필터링 시스템)

  • 김현준;정재은;조근식
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.8
    • /
    • pp.1092-1100
    • /
    • 2004
  • E-mail is regarded as one of the most popular methods of exchanging information because of its ease of use and low cost. Meanwhile, the exponential growth of unwanted mail in users' mailboxes has become a major problem; recognizing this, the Korean government established a law to prevent e-mail abuse. In this paper we present a hybrid spam-mail filtering system using a weighted Bayesian classifier, which extends the naive Bayesian classifier with preprocessing and intelligent agents. The system classifies spam mails automatically from training data, without manual definition of message rules. In particular, we improve filtering efficiency by imposing weights on certain features extracted from spam mails. Finally, we compare the naive Bayesian classifier against four weighted configurations: weighting the e-mail header, weighting HTML tags, weighting hyperlinks, and combining all three. Compared with the naive Bayesian classifier, the proposed system showed 5.7% lower precision, while its recall and F-measure increased by 33.3% and 31.2%, respectively.
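
A minimal sketch of the weighting idea: a naive Bayes log-odds score in which selected features (for example, tokens from headers, HTML tags, or hyperlinks) have their contribution multiplied by a weight. The counts and the weight value are toy numbers, not the paper's:

```python
# Weighted naive Bayes spam score; training statistics are illustrative.
import math

# token -> (count in spam, count in ham)
counts = {"free": (40, 5), "meeting": (2, 30), "href": (25, 10)}
n_spam, n_ham = 100, 100
weights = {"href": 2.0}                      # extra weight on hyperlink tokens

def spam_score(tokens):
    """Weighted log-odds of spam versus ham under a naive Bayes model."""
    score = math.log(n_spam / n_ham)         # prior term (0 here)
    for tok in tokens:
        spam_c, ham_c = counts.get(tok, (0, 0))
        p_spam = (spam_c + 1) / (n_spam + 2)  # Laplace smoothing
        p_ham = (ham_c + 1) / (n_ham + 2)
        score += weights.get(tok, 1.0) * math.log(p_spam / p_ham)
    return score

mail = ["free", "href"]
print("spam" if spam_score(mail) > 0 else "ham", round(spam_score(mail), 2))
```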