• Title/Summary/Keyword: Precision-recall

Audio Segmentation and Classification Using Support Vector Machine and Fuzzy C-Means Clustering Techniques (서포트 벡터 머신과 퍼지 클러스터링 기법을 이용한 오디오 분할 및 분류)

  • Nguyen, Ngoc;Kang, Myeong-Su;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions: Part B / v.19B no.1 / pp.19-26 / 2012
  • The rapid increase of information imposes new demands on content management. Automatic audio segmentation and classification aims to meet the rising need for efficient content management. For this reason, this paper proposes a high-accuracy algorithm that segments audio signals and classifies them into classes such as speech, music, silence, and environmental sounds. The proposed algorithm uses a support vector machine (SVM) to detect audio-cuts, which are boundaries between different kinds of sounds, using the parameter sequence. We then extract feature vectors composed of statistical data and use them as input to a fuzzy c-means (FCM) classifier that partitions audio segments into different classes. To evaluate the proposed SVM-FCM based algorithm, we use precision and recall rates for segmentation and classification accuracy for classification. Furthermore, we compare the proposed algorithm with other methods, including binary and FCM classifiers, in terms of segmentation performance. Experimental results show that the proposed algorithm outperforms the other methods in both precision and recall rates.

Design and Implementation of Web Directory Engine Using Dynamic Category Hierarchy (동적분류에 의한 주제별 웹 검색엔진의 설계 및 구현)

  • Choi Bum-Ghi;Park Sun;Park Tae-Su;Song Jae-Won;Lee Ju-Hong
    • Journal of Internet Computing and Services / v.7 no.2 / pp.71-80 / 2006
  • Web search engines use two main methods: directory searching and keyword searching. Keyword searching shows a high recall rate but tends to return too many results for users to find the pages they want. Directory searching also makes it difficult to find the desired pages when users select an improper category without knowing the exact one; that is, it shows high precision but low recall. We designed and implemented a new web search engine to resolve the problems of the directory search method. It regards a category as a fuzzy set that contains keywords and calculates the degree of inclusion between categories. The merit of this method is that it enhances the recall rate of directory searching by expanding subcategories on the basis of similarity.
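The abstract above does not give its formula for the degree of inclusion between categories. As a minimal sketch, assuming each category is a mapping from keyword to membership grade and using a common fuzzy subsethood measure (sum of pairwise minima divided by the total membership of the first set), the idea could look like this. The function name, the example categories, and their grades are all hypothetical.

```python
def inclusion_degree(a, b):
    """Degree to which fuzzy set `a` is included in fuzzy set `b`.

    Both sets map keyword -> membership grade in [0, 1].
    Assumed measure: sum(min(a, b)) / sum(a); not taken from the paper.
    """
    total = sum(a.values())
    if total == 0:
        return 0.0
    overlap = sum(min(grade, b.get(keyword, 0.0)) for keyword, grade in a.items())
    return overlap / total


# Hypothetical categories with keyword membership grades.
sports = {"soccer": 1.0, "league": 0.5}
ball_games = {"soccer": 0.8, "basketball": 0.9, "league": 0.5}

sports_in_ball_games = inclusion_degree(sports, ball_games)
ball_games_in_sports = inclusion_degree(ball_games, sports)
print(sports_in_ball_games, ball_games_in_sports)
```

The measure is asymmetric, which is what allows a narrow category to be treated as a subcategory of a broader one and not the other way around.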

Usefulness of RDF/OWL Format in Pediatric and Oncologic Nuclear Medicine Imaging Reports (소아 및 종양 핵의학 영상판독에서 RDF/OWL 데이터의 유용성)

  • Hwang, Kyung Hoon;Lee, Haejun;Koh, Geon;Choi, Duckjoo;Sun, Yong Han
    • Journal of Biomedical Engineering Research / v.36 no.4 / pp.128-134 / 2015
  • Recently, the structured data format RDF/OWL has played an increasingly vital role in the semantic web. We converted free-text pediatric and oncologic nuclear medicine imaging reports into RDF/OWL format and evaluated their usefulness by comparing SPARQL query results with results that physicians retrieved manually from the free-text reports. SPARQL queries showed 95% recall for simple queries and 91% recall for dedicated queries. In total, SPARQL queries achieved 93% recall (51 of 55 lesions) and 100% precision for 20 clinical query items. All results missed by the SPARQL queries involved some inference. Nuclear medicine imaging reports in RDF/OWL format were very useful for retrieving simple and dedicated query results using SPARQL. Further study using a larger number of cases and knowledge for inference is warranted.
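The totals reported above can be checked with the standard definitions of precision and recall. This is a minimal sketch, assuming that the 51 retrieved lesions are true positives, the 4 missed lesions are false negatives, and that 100% precision means no false positives; the function name is hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts, guarding against empty sets."""
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# 51 of 55 lesions retrieved, none retrieved incorrectly (assumed reading).
precision, recall = precision_recall(51, 0, 4)
print(f"precision={precision:.1%} recall={recall:.1%}")
# prints "precision=100.0% recall=92.7%"
```

The computed recall of about 92.7% matches the 93% reported in the abstract after rounding.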

Image Retrieval Using Distance Histogram of Clustered Color Region (색상분할영역에서 거리히스토그램을 이용한 영상검색)

  • 장정동;이태홍
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.7B / pp.968-974 / 2001
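  • With the recent development of information and communication technology and the rapid growth of image media, the need for efficient image management and retrieval has made content-based image retrieval a core technology. Color histograms are widely used to represent image features in content-based image retrieval, but considering color alone has many drawbacks. In this paper, we first partition the image into regions using a sequential clustering technique, then take the mean color of each region and the histogram of distances from the region's center point as image features, and compare them; in this way we propose a method that considers color and spatial information together. The proposed method uses only 18 features, requiring far less storage than other methods, while improving retrieval efficiency by more than 8.5%. In precision versus recall, the proposed method was also confirmed to be superior at most recall values for each query image, and visually good retrieval results were obtained.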

Content-based Image Retrieval Considering Color and Spatial Information (색상-공간정보를 고려한 내용기반 영상검색)

  • 장정동;이태홍
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.3B / pp.315-322 / 2001
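  • With the recent development of information and communication technology and the rapid growth of image media, content-based image retrieval has emerged as a core technology for efficient image management and retrieval. Color histograms are widely used to represent image features in content-based image retrieval, but considering color alone has many drawbacks. In this paper, we introduce a sequential clustering technique so that color and spatial information are considered together as image features, and we propose the mean color, the variance, and the size of each segmented region as the feature vector. The proposed method uses only 18 features, requiring far less storage than other methods, while improving retrieval efficiency by more than 8.8%. In precision versus recall, the proposed method was also confirmed to be superior at most recall values for each query image, and visually good retrieval results were obtained.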

Text filtering by Boosting Linear Perceptrons

  • O, Jang-Min;Zhang, Byoung-Tak
    • Journal of the Korean Institute of Intelligent Systems / v.10 no.4 / pp.374-378 / 2000
  • In information retrieval, a lack of positive examples is a main cause of poor performance. In this case, most learning algorithms may fail to capture the characteristics of the data, leading to low recall. To solve the problem of unbalanced data, we propose a boosting method that uses linear perceptrons as weak learners. The perceptrons are trained on local data sets. The proposed algorithm is applied to a text filtering problem for which only a small portion of positive examples is available. In an experiment on the category crude of the Reuters-21578 document set, the boosting method achieved a recall of 80.8%, which is a 37.2% improvement over a multilayer perceptron with comparable precision.

Development of Deep Learning-Based Damage Detection Prototype for Concrete Bridge Condition Evaluation (콘크리트 교량 상태평가를 위한 딥러닝 기반 손상 탐지 프로토타입 개발)

  • Nam, Woo-Suk;Jung, Hyunjun;Park, Kyung-Han;Kim, Cheol-Min;Kim, Gyu-Seon
    • KSCE Journal of Civil and Environmental Engineering Research / v.42 no.1 / pp.107-116 / 2022
  • Recently, research has been actively conducted on technology for inspecting facilities that are inaccessible to humans through image-based analysis. This research studied the conditions of deep learning-based imaging data on bridges and developed a prototype evaluation program for bridges. To develop the deep learning-based bridge damage detection prototype, Mask-RCNN was applied as the semantic segmentation model, which enables damage detection and quantification, and 5,140 training data items (including open data) were constructed with labeling suited to the damage types. In the model performance verification, the precision and recall analysis for concrete cracks, delamination/spalling, rebar exposure, and paint stripping showed a precision of 95.2% and a recall of 93.8%. A second performance verification was performed on on-site data of cracked concrete using the damage rate of bridge members.

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors because it relies entirely on various natural and climatic factors. Climate change affects land and agriculture in many ways, including lack of annual rainfall, pests, heat waves, changes in sea level, and global ozone/atmospheric CO2 fluctuation. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed with natural language processing techniques, and the tags whose dimensions are reduced are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. Precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area are the performance metrics used in the analysis. The proposed methodology achieves a precision of 94.40%, compared with 83.94% for the decision tree and 86.89% for the K-nearest neighbor algorithm, for agricultural ontology construction.
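The abstract above does not spell out how the JRE computes sentence similarity. As a rough illustration of the underlying idea only, the plain Jaccard index over word sets (size of the intersection divided by size of the union) can be sketched as follows. The whitespace tokenization, the function name, and the example sentences are assumptions, not details from the paper.

```python
def jaccard_similarity(sentence_a, sentence_b):
    """Jaccard index between the lowercase word sets of two sentences."""
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # convention chosen here for two empty sentences
    return len(tokens_a & tokens_b) / len(union)


# Hypothetical sentences: 2 shared words out of 6 distinct words.
score = jaccard_similarity("rice needs heavy rainfall",
                           "wheat needs moderate rainfall")
print(round(score, 3))  # prints 0.333
```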

Evaluation of Multi-classification Model Performance for Algal Bloom Prediction Using CatBoost (머신러닝 CatBoost 다중 분류 알고리즘을 이용한 조류 발생 예측 모형 성능 평가 연구)

  • Juneoh Kim;Jungsu Park
    • Journal of Korean Society on Water Environment / v.39 no.1 / pp.1-8 / 2023
  • Monitoring and prediction of water quality are essential for effective river pollution prevention and water quality management. In this study, a multi-classification model was developed to predict the chlorophyll-a (Chl-a) level in rivers. The model was developed using CatBoost, a novel ensemble machine learning algorithm, and hourly field monitoring data collected from January 1 to December 31, 2015. For model development, Chl-a was classified into class 1 (Chl-a≤10 ㎍/L), class 2 (10<Chl-a≤50 ㎍/L), and class 3 (Chl-a>50 ㎍/L), for which the numbers of observations used for model training were 27,192, 11,031, and 511, respectively. The macro averages of precision, recall, and F1-score for the three classes were 0.58, 0.58, and 0.58, respectively, while the weighted averages were 0.89, 0.90, and 0.89, respectively. The model showed relatively poor performance for class 3, where the number of observations was much smaller than in the other two classes. The imbalance of data distribution among the three classes was resolved using the synthetic minority over-sampling technique (SMOTE) algorithm, after which the training data were evenly distributed, with 26,868 observations for each class. After SMOTE application, the macro averages of precision, recall, and F1-score for the three classes were 0.58, 0.70, and 0.59, respectively, while the weighted averages were 0.88, 0.84, and 0.86.
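The gap between the macro and weighted averages above follows from how each one treats class size. The sketch below shows the two averaging rules. The per-class recall values are hypothetical (the abstract does not report them), and using the training counts 27,192, 11,031, and 511 as weights is an assumption made only for illustration, since weighted averages are normally computed with the class counts of the evaluation set.

```python
def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean weighted by the number of observations in each class."""
    total = sum(supports)
    return sum(score * n for score, n in zip(scores, supports)) / total


# Hypothetical per-class recall; class 3 is the rare, poorly predicted class.
per_class_recall = [0.95, 0.80, 0.05]
# Class sizes from the abstract (training counts, used here as assumed weights).
supports = [27192, 11031, 511]

macro = macro_average(per_class_recall)
weighted = weighted_average(per_class_recall, supports)
print(f"macro={macro:.2f} weighted={weighted:.2f}")
```

With these made-up values the macro average is pulled down by the rare class while the weighted average is dominated by the two large classes, the same pattern the abstract reports.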

Differences in Nutrient Intakes Analysed by Using Food Frequency and Recall Method (빈도법과 회상법에 의한 영양소 섭취 평가의 차이)

  • 김영옥
    • Journal of the Korean Society of Food Science and Nutrition / v.24 no.6 / pp.887-891 / 1995
  • Nutrient intake data collected from 538 middle school students by both the 24-hour recall method and the food frequency method were analysed to investigate measurement errors occurring with these methods. Measurement errors were assessed in terms of both the difference in average intake and the consistency between the two sources of data. The Wilcoxon signed-rank test was used to test the difference between the two average intakes, and Spearman's rank-order correlation coefficient was used to test consistency. As a result, the average intake estimated by the food frequency method tended to be higher than that from the 24-hour recall method. The degree of overestimation varied from one nutrient to another. For instance, carotene showed not only the most significant difference in average intake but also the greatest inconsistency between the two sets of data. This may imply that the validity of nutrient intake derived from different dietary survey methods varies from one nutrient to another; therefore, the dietary survey method has to be selected more cautiously for certain nutrients.
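The consistency check described above relies on Spearman's rank-order correlation, which is the Pearson correlation computed on ranks. This is a minimal pure-Python sketch, assuming tied values receive their average rank; the intake figures are hypothetical and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rank-order correlation coefficient."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical intakes of one nutrient for four students under each method.
recall_24h = [50, 60, 70, 80]
food_frequency = [65, 62, 90, 95]
rho = spearman(recall_24h, food_frequency)
print(round(rho, 2))  # prints 0.8
```

A value near 1 means the two methods order the students similarly even when the food frequency method reports higher absolute intakes, which is why consistency is tested separately from the difference in averages.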
