• Title/Summary/Keyword: Precision-recall

Search Result 702, Processing Time 0.031 seconds

Competitor Extraction based on Machine Learning Methods (기계학습 기반 경쟁자 자동추출 방법)

  • Lee, Chung-Hee;Kim, Hyun-Jin;Ryu, Pum-Mo;Kim, Hyun-Ki;Seo, Young-Hoon
    • Annual Conference on Human and Language Technology
    • /
    • 2012.10a
    • /
    • pp.107-112
    • /
    • 2012
  • 본 논문은 일반 텍스트에 나타나는 경쟁 관계에 있는 고유명사들을 경쟁자로 자동 추출하는 방법에 대한 것으로, 규칙 기반 방법과 기계 학습 기반 방법을 모두 제안하고 비교하였다. 제안한 시스템은 뉴스 기사를 대상으로 하였고, 문장에 경쟁관계를 나타내는 명확한 정보가 있는 경우에만 추출하는 것을 목표로 하였다. 규칙기반 경쟁어 추출 시스템은 2개의 고유명사가 경쟁관계임을 나타내는 단서단어에 기반해서 경쟁어를 추출하는 시스템이며, 경쟁표현 단서단어는 620개가 수집되어 사용됐다. 기계학습 기반 경쟁어 추출시스템은 경쟁어 추출을 경쟁어 후보에 대한 경쟁여부의 바이너리 분류 문제로 접근하였다. 분류 알고리즘은 Support Vector Machines을 사용하였고, 경쟁어 주변 문맥 정보를 대표할 수 있는 언어 독립적 5개 자질에 기반해서 모델을 학습하였다. 성능평가를 위해서 이슈화되고 있는 핫키워드 54개에 대해서 623개의 경쟁어를 뉴스 기사로부터 수집해서 평가셋을 구축하였다. 비교 평가를 위해서 기준시스템으로 연관어에 기반해서 경쟁어를 추출하는 시스템을 구현하였고, Recall/Precision/F1 성능으로 0.119/0.214/0.153을 얻었다. 제안 시스템의 실험 결과로 규칙기반 시스템은 0.793/0.207/0.328 성능을 보였고, 기계 학습기반 시스템은 0.578/0.730/0.645 성능을 보였다. Recall 성능은 규칙기반 시스템이 0.793으로 가장 좋았고, 기준시스템에 비해서 67.4%의 성능 향상이 있었다. Precision과 F1 성능은 기계학습기반 시스템이 0.730과 0.645로 가장 좋았고, 기준시스템에 비해서 각각 61.6%, 49.2%의 성능향상이 있었다. 기준시스템에 비해서 제안한 시스템이 Recall, Precision, F1 성능이 모두 대폭적으로 향상되었으므로 제안한 방법이 효과적임을 알 수 있다.

  • PDF

An Experimental Study Investigating the Retrieval Effectiveness of a Video Retrieval System Using Tag Query Expansion (태그 질의 확장 기능에 기반한 비디오 검색 시스템의 효율성에 대한 실험적 연구)

  • Kim, Hyun-Hee
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.44 no.4
    • /
    • pp.75-94
    • /
    • 2010
  • This study designed a pilot system in which queries can be expanded through a tag ontology where equivalent, synonymous, or related tags are bound together, in order to improve the retrieval effectiveness of videos. We evaluated the proposed pilot system by comparing it to a tag-based system without tag control, in terms of recall and precision rates. Our study results showed that the mean recall rate in the structured folksonomy-based system was statistically higher than that in the tag-based system. On the other hand, the mean precision rate in the structured folksonomy-based system was not statistically higher than that in the tag-based system. The result of this study can be utilized as a guide on how to effectively use tags as social metadata of digital video libraries.

Clustering-Based Recommendation Using Users' Preference (사용자 선호도를 사용한 군집 기반 추천 시스템)

  • Kim, Younghyun;Shin, Won-Yong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.2
    • /
    • pp.277-284
    • /
    • 2017
  • In a flood of information, most users will want to get a proper recommendation. If a recommender system fails to give appropriate contents, then quality of experience (QoE) will be drastically decreased. In this paper, we propose a recommender system based on the intra-cluster users' item preference for improving recommendation accuracy indices such as precision, recall, and F1 score. To this end, first, users are divided into several clusters based on the actual rating data and Pearson correlation coefficient (PCC). Afterwards, we give each item an advantage/disadvantage according to the preference tendency by users within the same cluster. Specifically, an item will be received an advantage/disadvantage when the item which has been averagely rated by other users within the same cluster is above/below a predefined threshold. The proposed algorithm shows a statistically significant performance improvement over the item-based collaborative filtering algorithm with no clustering in terms of recommendation accuracy indices such as precision, recall, and F1 score.

A Hybrid Information Retrieval Model Using Metadata and Text (메타데이타와 텍스트 정보의 통합검색 모델)

  • Yoo, Jeong-Mok;Myaeng, Sung-Hyon;Kim, Sung-Soo;Lee, Mann-Ho
    • Journal of KIISE:Databases
    • /
    • v.34 no.3
    • /
    • pp.232-243
    • /
    • 2007
  • Metadata IR model has high precision and low recall because the query in Metadata IR model is strict that is, the query can express user information need exactly, while Full-text IR model has low precision and high recall because the query in Full-text IR model is a kind of simple keyword query which expresses user information need roughly. If user can translate one's information need into structured query well, the retrieval result will be improved. However, it is little possible to make relevant query without understanding characteristics of metadata. Unfortunately, most users do not interested in metadata, then they cannot construct well-made structured query. Amount of information contained in metadata is less than text information. In this paper, we suggest hybrid IR model using metadata and text which can provide users with lots of relevant documents by retrieving from metadata field and text field complementarily.

Implementation of CNN Model for Classification of Sitting Posture Based on Multiple Pressure Distribution (다중 압력분포 기반의 착석 자세 분류를 위한 CNN 모델 구현)

  • Seo, Ji-Yun;Noh, Yun-Hong;Jeong, Do-Un
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.21 no.2
    • /
    • pp.73-78
    • /
    • 2020
  • Musculoskeletal disease is often caused by sitting down for long period's time or by bad posture habits. In order to prevent musculoskeletal disease in daily life, it is the most important to correct the bad sitting posture to the right one through real-time monitoring. In this study, to detect the sitting information of user's without any constraints, we propose posture measurement system based on multi-channel pressure sensor and CNN model for classifying sitting posture types. The proposed CNN model can analyze 5 types of sitting postures based on sitting posture information. For the performance assessment of posture classification CNN model through field test, the accuracy, recall, precision, and F1 of the classification results were checked with 10 subjects. As the experiment results, 99.84% of accuracy, 99.6% of recall, 99.6% of precision, and 99.6% of F1 were verified.

Arrhythmia Classification using Hybrid Combination Model of CNN-LSTM (합성곱-장단기 기억 신경망의 하이브리드 결합 모델을 이용한 부정맥 분류)

  • Cho, Ik-Sung;Kwon, Hyeog-Soong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.1
    • /
    • pp.76-84
    • /
    • 2022
  • Arrhythmia is a condition in which the heart beats abnormally or irregularly, early detection is very important because it can cause dangerous situations such as fainting or sudden cardiac death. However, performance degradation occurs due to personalized differences in ECG signals. In this paper, we propose arrhythmia classification using hybrid combination model of CNN-LSTM. For this purpose, the R wave is detected from noise removed signal and a single bit segment was extracted. It consisted of eight convolutional layers to extract the features of the arrhythmia in detail, used them as the input of the LSTM. The weights were learned through deep learning and the model was evaluated by the verification data. The performance was compared in terms of the accuracy, precision, recall, F1 score through MIT-BIH arrhythmia database. The achieved scores indicate 92.3%, 90.98%, 92.20%, 90.72% in terms of the accuracy, precision, recall, F1 score, respectively.

Analyzing the correlation of Spam Recall and Thesaurus

  • Kang, Sin-Jae;Kim, Jong-Wan
    • Proceedings of the Korea Society of Information Technology Applications Conference
    • /
    • 2005.11a
    • /
    • pp.21-25
    • /
    • 2005
  • In this paper, we constructed a two-phase spam-mail filtering system based on the lexical and conceptual information. There are two kinds of information that can distinguish the spam mail from the legitimate mail. The definite information is the mail sender's information, URL, a certain spam list, and the less definite information is the word list and concept codes extracted from the mail body. We first classified the spam mail by using the definite information, and then used the less definite information. We used the lexical information and concept codes contained in the email body for SVM learning in the $2^{nd}$ phase. According to our results the spam precision was increased if more lexical information was used as features, and the spam recall was increased when the concept codes were included in features as well.

  • PDF

An experiment to enhance subject access in korean online public access catalog (온라인 열람목록의 주제탐색 강화를 위한 실험적 연구)

  • 장혜란;홍지윤
    • Journal of Korean Library and Information Science Society
    • /
    • v.25
    • /
    • pp.83-107
    • /
    • 1996
  • The purpose of this study is to experiment online public access catalog enhancements to improve its subject access capability. Three catalog databases, enhanced with title keywords, controlled vocabulary, and content words with controlled vocabulary respectively, were implemented. 18 searchers performed 2 subject searshes against 3 different catalog databases. And the transaction logs are analyzed. The results of the study can be summarized as follows : Controlled vocabulary catalog database achieved 41.8% recall ratio in average ; the addition of table of contents words to the controlled vocabulary is an effective technique with increasing recall ration upto 55% without decreasing precision ; and the database enhanced with title keywords shows 31.7% recall ratio in average. Of the three kinds of catalog databases, only the catalog with contents words produced 2 unique relevant documents. The results indicate that both user training and system development is required to have better search performance in online public access catalog.

  • PDF

Hybrid Approach to Sentiment Analysis based on Syntactic Analysis and Machine Learning (구문분석과 기계학습 기반 하이브리드 텍스트 논조 자동분석)

  • Hong, Mun-Pyo;Shin, Mi-Young;Park, Shin-Hye;Lee, Hyung-Min
    • Language and Information
    • /
    • v.14 no.2
    • /
    • pp.159-181
    • /
    • 2010
  • This paper presents a hybrid approach to the sentiment analysis of online texts. The sentiment of a text refers to the feelings that the author of a text has towards a certain topic. Many existing approaches employ either a pattern-based approach or a machine learning based approach. The former shows relatively high precision in classifying the sentiments, but suffers from the data sparseness problem, i.e. the lack of patterns. The latter approach shows relatively lower precision, but 100% recall. The approach presented in the current work adopts the merits of both approaches. It combines the pattern-based approach with the machine learning based approach, so that the relatively high precision and high recall can be maintained. Our experiment shows that the hybrid approach improves the F-measure score for more than 50% in comparison with the pattern-based approach and for around 1% comparing with the machine learning based approach. The numerical improvement from the machine learning based approach might not seem to be quite encouraging, but the fact that in the current approach not only the sentiment or the polarity information of sentences but also the additional information such as target of sentiments can be classified makes the current approach promising.

  • PDF

Research on Chinese Microblog Sentiment Classification Based on TextCNN-BiLSTM Model

  • Haiqin Tang;Ruirui Zhang
    • Journal of Information Processing Systems
    • /
    • v.19 no.6
    • /
    • pp.842-857
    • /
    • 2023
  • Currently, most sentiment classification models on microblogging platforms analyze sentence parts of speech and emoticons without comprehending users' emotional inclinations and grasping moral nuances. This study proposes a hybrid sentiment analysis model. Given the distinct nature of microblog comments, the model employs a combined stop-word list and word2vec for word vectorization. To mitigate local information loss, the TextCNN model, devoid of pooling layers, is employed for local feature extraction, while BiLSTM is utilized for contextual feature extraction in deep learning. Subsequently, microblog comment sentiments are categorized using a classification layer. Given the binary classification task at the output layer and the numerous hidden layers within BiLSTM, the Tanh activation function is adopted in this model. Experimental findings demonstrate that the enhanced TextCNN-BiLSTM model attains a precision of 94.75%. This represents a 1.21%, 1.25%, and 1.25% enhancement in precision, recall, and F1 values, respectively, in comparison to the individual deep learning models TextCNN. Furthermore, it outperforms BiLSTM by 0.78%, 0.9%, and 0.9% in precision, recall, and F1 values.