• Title/Summary/Keyword: Precision-recall

Automatic Generation of Code-clone Reference Corpus (코드클론 표본 집합체 자동 생성기)

  • Lee, Hyo-Sub;Doh, Kyung-Goo
    • Journal of Software Assessment and Valuation / v.7 no.1 / pp.29-39 / 2011
  • To evaluate the quality of a clone detection tool, we need to know how many clones it misses. This requires a standard code-clone reference corpus built for a carefully chosen set of sample source code. The reference corpus available so far was built by manually collecting clones from the results of various existing tools. This paper presents a tree-pattern-based clone detection tool that can be used to generate a reference corpus automatically. We compare our tool with CloneDR for precision and with Bellon's reference corpus for recall. Our tool finds no false positives and detects 2 to 3 times more clones than CloneDR. Compared with Bellon's reference corpus, our tool shows a recall of 93% to 100% and detects far more clones.
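Recall against a reference corpus reduces to counting how many reference clone pairs a tool also reports. The sketch below assumes clone pairs can be compared by exact equality of hypothetical `(file, first_line, last_line)` fragments; real benchmarks such as Bellon's typically use overlap-based matching criteria instead, so treat this as a simplified illustration, not the paper's procedure.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical fragment identity: (file path, first line, last line).
Fragment = Tuple[str, int, int]
# An unordered pair of fragments; frozenset makes (a, b) equal to (b, a).
ClonePair = FrozenSet[Fragment]


def pair(a: Fragment, b: Fragment) -> ClonePair:
    """Build an order-independent clone pair."""
    return frozenset((a, b))


def precision(detected: Set[ClonePair], true_clones: Set[ClonePair]) -> float:
    """Share of detected pairs that are confirmed clones (1.0 if nothing was detected)."""
    if not detected:
        return 1.0
    return len(detected & true_clones) / len(detected)


def recall(detected: Set[ClonePair], reference: Set[ClonePair]) -> float:
    """Share of reference-corpus pairs that the tool also reports (1.0 for an empty corpus)."""
    if not reference:
        return 1.0
    return len(detected & reference) / len(reference)
```

Using frozensets keeps the pair order from mattering, so a tool that reports (B, A) still matches a reference entry (A, B).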

Within- and Between-Individual Variation in Nutrient Intakes Assessed by Recall and Record Methods among College Women (회상법과 기록법으로 측정한 여대생의 영양소 섭취량에서의 개인내 변이와 개인간 변이)

  • 오세영
    • Journal of Nutrition and Health / v.29 no.9 / pp.1028-1034 / 1996
  • This study examined within- and between-individual variation in nutrient intakes in order to estimate the degree of precision in dietary assessment among 59 female volunteers aged 21-23 years. Self-recorded 7-day dietary recalls and records were collected during a 3-month period. There was little difference in within- and between-individual variation between the recall and record methods. Within-to-between-individual variation ratios were > 2.0 for most of the nutrients examined, and were higher for niacin, vitamin A, and vitamin C (> 2.5) in the recalls and for calcium, iron, vitamin A, and vitamin C (> 3.0) in the records. With 7-day dietary data, observed nutrient intakes were estimated to be within 26-107% of the subjects' true (usual) intakes, with vitamin C showing the highest value and energy the lowest. Correlation coefficients between observed and true nutrient intakes were 0.73-0.81 for the recalls and 0.68-0.77 for the records. To estimate intakes with 20% precision, 12-13 days of dietary data were required for energy, 46 for calcium, 71-72 for vitamin A, and 199-200 for vitamin C. Attenuation factors ranged from 0.73 to 0.81 for the recalls and from 0.68 to 0.77 for the records. These results imply that the commonly used 1-day or 3-day dietary studies may not be appropriate for assessing individuals' nutrient intakes. Further research on methodological issues in assessing the Korean diet is needed for a better understanding of the relationship between diet and health in Koreans.
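The precision and days-required figures above follow the usual variance-component relations for dietary data. As a sketch, assuming the standard formulas D0 = z·CVw/√n and n = (z·CVw/D0)² (the abstract does not state which formulas the author used), they can be computed as follows; the CV values in the test are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% standard normal quantile


def precision_limit(cv_within: float, n_days: int, z: float = Z_95) -> float:
    """D0 = z * CVw / sqrt(n): the percent of the true mean within which an
    n-day observed mean falls, given the within-subject CV (in %)."""
    return z * cv_within / math.sqrt(n_days)


def days_needed(cv_within: float, target_pct: float, z: float = Z_95) -> int:
    """n = (z * CVw / D0)^2, rounded up to whole days."""
    return math.ceil((z * cv_within / target_pct) ** 2)


def observed_true_correlation(variance_ratio: float, n_days: int) -> float:
    """r = sqrt(n / (n + Sw^2 / Sb^2)); variance_ratio is Sw^2 / Sb^2 (variances, not SDs)."""
    return math.sqrt(n_days / (n_days + variance_ratio))
```

Because z and CVw cancel, the two relations give n = 7 × (D7 / 20)² for a 20% target. With the abstract's 7-day limits of 26% (energy) and 107% (vitamin C), this yields about 11.8 and 200 days, close to the reported 12-13 and 199-200 days.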

Automated Segmentation of the Lateral Ventricle Based on Graph Cuts Algorithm and Morphological Operations

  • Park, Seongbeom;Yoon, Uicheul
    • Journal of Biomedical Engineering Research / v.38 no.2 / pp.82-88 / 2017
  • Enlargement of the lateral ventricles has been identified as a surrogate marker of neurological disorders. Quantitative measurement of the lateral ventricles from MRI would enable earlier and more accurate clinical diagnosis and monitoring of disease progression. Objective quantification requires an automated or semi-automated segmentation method, but the lateral ventricles are difficult to delineate because of the insufficient contrast and brightness of structural images. In this study, we proposed a fully automated lateral ventricle segmentation method based on a graph cuts algorithm combined with atlas-based segmentation and connected component labeling. First, the initial seeds for graph cuts were defined by atlas-based segmentation (ATS). They were then adjusted using partial volume images to provide accurate a priori information to the graph cuts. The graph cuts algorithm finds the global minimum of an energy function on a graph using a minimum-cut/maximum-flow algorithm. In addition, connected component labeling was used to remove false ventricle regions. The proposed method was validated against well-known tools using the Dice similarity index, recall, and precision. It achieved a significantly higher Dice similarity index ($0.860 \pm 0.036$, p < 0.001) and recall ($0.833 \pm 0.037$, p < 0.001) than the other tools. Therefore, the proposed method yielded robust and reliable segmentation results.
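The Dice similarity index, recall, and precision used for validation are all ratios of overlap counts between a predicted mask and a reference mask. A minimal sketch, assuming 2-D binary masks stored as nested lists (3-D volumes work the same way by summing the counts over slices):

```python
from typing import List, Tuple

# A 2-D binary mask: 1 = ventricle, 0 = background.
Mask = List[List[int]]


def overlap_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Count true-positive, false-positive and false-negative pixels."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    return tp, fp, fn


def dice(pred: Mask, truth: Mask) -> float:
    """Dice similarity index 2|A ∩ B| / (|A| + |B|) = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = overlap_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def precision(pred: Mask, truth: Mask) -> float:
    """Fraction of predicted ventricle pixels that are truly ventricle."""
    tp, fp, _ = overlap_counts(pred, truth)
    return 1.0 if tp + fp == 0 else tp / (tp + fp)


def recall(pred: Mask, truth: Mask) -> float:
    """Fraction of true ventricle pixels that the segmentation found."""
    tp, _, fn = overlap_counts(pred, truth)
    return 1.0 if tp + fn == 0 else tp / (tp + fn)
```

Two empty masks are scored as a perfect match here; that convention is a choice of this sketch.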

A Study of an Image Retrieval Method using Binary Subimage (이진 부분영상을 이용한 영상 검색 기법에 관한 연구)

  • 정순영;최민규;남재열
    • Journal of the Institute of Convergence Signal Processing / v.2 no.1 / pp.28-37 / 2001
  • This paper proposes an image retrieval method that combines shape information from 2-dimensional color histograms with color information from HSI color histograms. In addition, the proposed method can find location information in an image by comparing the similarity among subimages. The method applies this location information to the shape and color information and can retrieve region information that is hard to distinguish in the binary image. Simulation results show that it performs very well on the precision/recall graph compared with a conventional method that uses color histograms. In particular, the proposed method was found to handle rotations and translations of objects in an image well.
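A precision/recall graph for retrieval is built by walking down the ranked result list and recording both measures at each cutoff. The sketch below assumes string image IDs and a known set of relevant images per query; the IDs in the test are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall_points(ranked: List[str], relevant: Set[str]) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each rank cutoff k = 1..len(ranked).

    ranked   -- image IDs in the order the retrieval system returned them
    relevant -- IDs of the images judged relevant to the query
    """
    if not relevant:
        raise ValueError("need at least one relevant image")
    points = []
    hits = 0
    for k, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / k))
    return points
```

Plotting these points (recall on the x-axis) gives one query's curve; a method whose curve lies higher keeps more relevant images near the top of the ranking.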

Construction of an Internet of Things Industry Chain Classification Model Based on IRFA and Text Analysis

  • Zhimin Wang
    • Journal of Information Processing Systems / v.20 no.2 / pp.215-225 / 2024
  • With the rapid development of Internet of Things (IoT) and big data technology, a large amount of data is generated during the operation of related industries. Classifying this data accurately has become central to research on data mining and processing in the IoT industry chain. This study constructs a classification model for the IoT industry chain based on an improved random forest algorithm (IRFA) and text analysis, aiming to classify IoT industry chain big data efficiently and accurately by improving on the traditional algorithm. The accuracy, precision, recall, and AUC of the traditional random forest algorithm and the algorithm used in this paper are compared on different datasets. The experimental results show that the model performs better across datasets: its accuracy and recall on four datasets are better than those of the traditional algorithm, its accuracy on two datasets, P-I Diabetes and Loan Default, is better than that of the random forest model, and its final classification results are better. With this model, the massive data generated in the IoT industry chain can be classified accurately, supporting further research on data mining and processing technology for the IoT industry chain.
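Of the four metrics compared, AUC is the least obvious to compute: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (the Mann-Whitney formulation), with ties counting one half. A minimal sketch, assuming binary labels and a real-valued score per example (for a random forest, for instance, the fraction of trees voting positive); the labels and scores in the test are hypothetical:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels -- 1 for the positive class, 0 for the negative class
    scores -- classifier scores; higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(P·N) and is meant for clarity; a sort-based rank computation gives the same value faster on large datasets.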

Query Expansion Using Augmented Terms in an Extended Boolean Model

  • Nguyen, Tuan-Quang;Heo, Jun-Seok;Lee, Jung-Hoon;Kim, Yi-Reun;Whang, Kyu-Young
    • Journal of Computing Science and Engineering / v.2 no.1 / pp.26-43 / 2008
  • We propose a new query expansion method in the extended Boolean model that improves precision without degrading recall. To improve precision, our method promotes the ranks of documents containing more query terms, since users typically prefer such documents. The proposed method consists of three steps: (1) expanding the query by adding new terms related to each query term, (2) further expanding the query by adding augmented terms, which are conjunctions of the terms, and (3) assigning a weight to each term so that augmented terms have higher weights than the other terms. We conduct extensive experiments to show the effectiveness of the proposed method. The experimental results show that the proposed method improves precision by up to 102% on the TREC-6 data compared with the existing thesaurus-based query expansion method proposed by Kwon et al.
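The effect of augmented terms can be illustrated with the classic weighted p-norm extended Boolean model. In this sketch, an expanded query is an OR over single terms and augmented terms, each augmented term's document weight is the p-norm AND of its component terms, and augmented terms get a larger query weight. The choice of p = 2, the weight of 2.0, and the `score` function are assumptions for illustration, not the paper's exact weighting scheme.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[float, float]  # (query weight > 0, document term weight in [0, 1])


def p_or(pairs: Sequence[Pair], p: float) -> float:
    """Weighted p-norm OR: ((sum q^p d^p) / (sum q^p))^(1/p)."""
    num = sum((q ** p) * (d ** p) for q, d in pairs)
    den = sum(q ** p for q, _ in pairs)
    return (num / den) ** (1.0 / p)


def p_and(pairs: Sequence[Pair], p: float) -> float:
    """Weighted p-norm AND: 1 - ((sum q^p (1-d)^p) / (sum q^p))^(1/p)."""
    num = sum((q ** p) * ((1.0 - d) ** p) for q, d in pairs)
    den = sum(q ** p for q, _ in pairs)
    return 1.0 - (num / den) ** (1.0 / p)


def score(doc: Dict[str, float],
          terms: Dict[str, float],
          augmented: List[Tuple[Tuple[str, ...], float]],
          p: float = 2.0) -> float:
    """Hypothetical scorer: OR together single terms and augmented terms.

    doc       -- term -> document term weight in [0, 1]
    terms     -- original and expansion terms -> query weight
    augmented -- (terms joined by AND, query weight)
    """
    pairs = [(w, doc.get(t, 0.0)) for t, w in terms.items()]
    for conj_terms, w in augmented:
        # An augmented term's document weight is the p-norm AND of its parts.
        conj_weight = p_and([(1.0, doc.get(t, 0.0)) for t in conj_terms], p)
        pairs.append((w, conj_weight))
    return p_or(pairs, p)
```

A document containing both query terms scores about 0.80 against 0.47 for a document containing only one; without the augmented term the gap shrinks to 0.80 against 0.71, which is the rank promotion the abstract describes.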

Development of Evaluation Metrics that Consider Data Imbalance between Classes in Facies Classification (지도학습 기반 암상 분류 시 클래스 간 자료 불균형을 고려한 평가지표 개발)

  • Kim, Dowan;Choi, Junhwan;Byun, Joongmoo
    • Geophysics and Geophysical Exploration / v.23 no.3 / pp.131-140 / 2020
  • In training a classification model using machine learning, acquiring training data is a very important stage, because the amount and quality of the training data greatly influence model performance. However, when the cost of obtaining data is so high that ideal training data are difficult to build, the number of samples acquired for each class may differ greatly, and a serious data-imbalance problem can occur. If such a problem occurs in the training data, not all classes are trained equally, and classes containing relatively few data will have significantly lower recall. The reliability of evaluation indices such as accuracy and precision is also reduced. Therefore, this study sought to overcome the problem of data imbalance in two stages. First, we introduced weighted accuracy and weighted precision as new evaluation indices that account for the data-imbalance ratio by modifying the conventional measures of accuracy and precision. Next, oversampling was performed to balance weighted precision and recall among classes. We verified the algorithm by applying it to facies classification. As a result, the imbalance between majority and minority classes was greatly mitigated, and the boundaries between classes could be identified more clearly.
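The abstract does not say which oversampling scheme was used. The sketch below shows plain random oversampling, which duplicates randomly chosen minority-class samples until every class matches the largest one, together with per-class recall, the measure that exposes the minority-class problem. The facies labels in the test are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(samples: Sequence[T], labels: Sequence[Hashable],
                      seed: int = 0) -> Tuple[List[T], List[Hashable]]:
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[T]] = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x: List[T] = []
    out_y: List[Hashable] = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def per_class_recall(y_true: Sequence[Hashable],
                     y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Recall of each class: correct predictions of that class / its true count."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / total[c] for c in total}
```

Oversampling should be applied only to the training split so that evaluation data stay untouched. The mean of the per-class recalls is the common "balanced accuracy"; it is an imbalance-aware measure but not necessarily the weighted accuracy defined in the paper.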

Character Detection and Recognition of Steel Materials in Construction Drawings using YOLOv4-based Small Object Detection Techniques (YOLOv4 기반의 소형 물체탐지기법을 이용한 건설도면 내 철강 자재 문자 검출 및 인식기법)

  • Sim, Ji-Woo;Woo, Hee-Jo;Kim, Yoonhwan;Kim, Eung-Tae
    • Journal of Broadcast Engineering / v.27 no.3 / pp.391-401 / 2022
  • As deep learning-based object detection and recognition research has developed recently, its scope of application to industry and daily life is expanding. However, deep learning-based systems for construction are still much less studied. Material quantities in construction are still calculated manually, which takes a lot of time and makes accurate tallying difficult, so transactions based on incorrect quantity calculations occur in practice. A fast and accurate automatic drawing recognition system is required to solve this problem. Therefore, we propose an AI-based automatic drawing recognition and quantity tallying system that detects and recognizes steel materials in construction drawings. To detect steel materials in construction drawings accurately, we propose data augmentation techniques and spatial attention modules that improve small object detection performance based on YOLOv4. The text in each detected steel material area is recognized, and the number of steel materials is tallied based on the predicted characters. Experimental results show that the proposed method increases accuracy and precision by 1.8% and 16%, respectively, compared with the conventional YOLOv4. The proposed method achieved a precision of 0.938 and a recall of 1, with an average precision of 99.4% at AP0.5 and 67% at AP0.5:0.95. Character recognition accuracy reached 99.9% after training on a suitable dataset containing the fonts used in construction drawings, compared with 75.6% using the existing dataset. The average time required per image was 0.013 seconds for detection, 0.65 seconds for character recognition, and 0.16 seconds for tallying, for a total of 0.84 seconds.
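Detection precision and recall depend on when a predicted box counts as a hit, which is usually decided by intersection over union (IoU) against a ground-truth box. A minimal sketch, assuming axis-aligned `(x1, y1, x2, y2)` boxes and greedy matching of predictions in descending confidence order at one IoU threshold; the box coordinates in the test are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_at(preds: List[Tuple[Box, float]], truths: List[Box],
                        threshold: float = 0.5) -> Tuple[float, float]:
    """Greedily match predictions (highest confidence first) to the unmatched
    ground-truth box with the highest IoU, if that IoU is at least `threshold`."""
    matched: Set[int] = set()
    tp = 0
    for box, _conf in sorted(preds, key=lambda pc: pc[1], reverse=True):
        best: Optional[int] = None
        best_iou = threshold
        for i, gt in enumerate(truths):
            if i in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best, best_iou = i, overlap
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(preds) if preds else 1.0
    recall = tp / len(truths) if truths else 1.0
    return precision, recall
```

AP0.5 summarizes the precision-recall curve obtained by sweeping the confidence cutoff with `threshold=0.5`; AP0.5:0.95, as in the COCO convention, averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05, which is why it is much stricter.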

Analysis Study on the Detection and Classification of COVID-19 in Chest X-ray Images using Artificial Intelligence (인공지능을 활용한 흉부 엑스선 영상의 코로나19 검출 및 분류에 대한 분석 연구)

  • Yoon, Myeong-Seong;Kwon, Chae-Rim;Kim, Sung-Min;Kim, Su-In;Jo, Sung-Jun;Choi, Yu-Chan;Kim, Sang-Hyun
    • Journal of the Korean Society of Radiology / v.16 no.5 / pp.661-672 / 2022
  • After the outbreak of SARS-CoV-2, the virus that causes COVID-19, the rapidly rising number of infections and deaths as it spread around the world caused a shortage of medical resources. As a way to address this problem, chest X-ray diagnosis using artificial intelligence (AI) has received attention as a primary diagnostic method. The purpose of this study is to comprehensively analyze the detection of COVID-19 via AI. To this end, 292 studies were collected through a series of classification steps. From these data, performance measurement information, including accuracy, precision, area under the curve (AUC), sensitivity, specificity, F1-score, recall, K-fold, architecture, and class, was analyzed. The average accuracy, precision, AUC, sensitivity, and specificity were 95.2%, 94.81%, 94.01%, 93.5%, and 93.92%, respectively. The performance measures gradually increased year over year. We also studied the rate of change according to the number of classes and the amount of image data, the usage ratio of each architecture, and K-fold validation. Although AI-based diagnosis of COVID-19 currently has several problems that prevent its independent use, it is expected to be sufficient for use as an assistant to doctors.
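Most of the metrics analyzed in this review come from the same four confusion-matrix counts; sensitivity and recall are in fact the same quantity. A minimal sketch, assuming a binary task with COVID-19 positive as the positive class; the counts in the test are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Binary confusion-matrix counts, with COVID-19 positive as the positive class."""
    tp: int
    fp: int
    tn: int
    fn: int


def _ratio(num: int, den: int) -> float:
    # Report 0.0 instead of failing when a denominator is empty.
    return num / den if den else 0.0


def summary(c: Confusion) -> Dict[str, float]:
    """Accuracy, precision, sensitivity (= recall), specificity and F1."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # identical to recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }
```

Writing F1 as 2TP / (2TP + FP + FN) is algebraically equal to the harmonic mean of precision and recall but avoids dividing by zero when both are zero.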

A Study on Orthogonal Image Detection Precision Improvement Using Data of Dead Pine Trees Extracted by Period Based on U-Net model (U-Net 모델에 기반한 기간별 추출 소나무 고사목 데이터를 이용한 정사영상 탐지 정밀도 향상 연구)

  • Kim, Sung Hun;Kwon, Ki Wook;Kim, Jun Hyun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.40 no.4 / pp.251-260 / 2022
  • Although the number of trees affected by pine wilt disease is decreasing, the affected area is expanding across the country. With recent advances in deep learning, the technology is being rapidly applied to detecting pine wilt nematode damage and dead trees. The purpose of this study is to acquire deep learning training data efficiently and to obtain accurate ground-truth values in order to further improve the detection ability of U-Net models through training. To this end, a filtering method that applies a deep learning algorithm step by step is used to minimize ambiguity in the basis of the deep learning model's analysis, enabling efficient analysis and judgment. In detecting dead pine trees killed by pine wilt nematode, the U-Net model trained with ground-truth values analyzed by period showed a recall 0.5%p lower than the U-Net model trained with the previously provided ground-truth values, while its precision was 7.6%p higher and its F1 score 4.1%p higher. Applying various filtering techniques may further increase the precision of wilt detection, and drone surveillance combining drone orthoimages with artificial intelligence could be used in pine wilt nematode disaster prevention projects.