• Title/Summary/Keyword: Confusion Matrix

Using Naïve Bayes Classifier and Confusion Matrix Spelling Correction in OCR (나이브 베이즈 분류기와 혼동 행렬을 이용한 OCR에서의 철자 교정)

  • Noh, Kyung-Mok;Kim, Chang-Hyun;Cheon, Min-Ah;Kim, Jae-Hoon
    • Annual Conference on Human and Language Technology / 2016.10a / pp.310-312 / 2016
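  • To reduce OCR (Optical Character Recognition) errors, this paper proposes a spelling correction system that uses a confusion matrix of correction word pairs and a naïve Bayes classifier. The system corrects only spelling errors in Hangul. The corpora used in the experiments are a Korean raw corpus, an OCR output corpus, and an OCR ground-truth corpus. From the Korean raw corpus, we built a grapheme-level language model and a prefix corpus for retrieving correction candidates. From the OCR output corpus and the OCR ground-truth corpus, we extracted correction word pairs, decomposed them into graphemes to build a confusion matrix, and used that matrix to build an error model. Correction candidates are found with the prefix corpus, and the naïve Bayes classifier presents the n candidates with the highest probability. A correction was judged successful if the correct word was among the n candidates. For an OCR system with a recognition rate of about 97.73%, presenting three correction candidates raised the recognition rate by about 0.28 percentage points, to 98.01%. This result covers Hangul errors only; we expect better results if special characters, digits, and similar symbols are also handled in future work.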
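
The pipeline described in this abstract (a confusion-matrix error model combined with a prior over candidates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `build_error_model` and `rank_candidates`, the add-k smoothing, and the restriction to equal-length substitutions are all assumptions, and Latin letters stand in for Hangul graphemes.

```python
import math
from collections import Counter


def build_error_model(pairs, smoothing=1.0):
    """Build P(observed symbol | true symbol) from aligned (truth, observed) pairs.

    Assumption: each pair has equal length, so symbols align position by position.
    """
    counts = Counter()   # (true, observed) -> count: the confusion matrix
    totals = Counter()   # true -> count
    alphabet = set()
    for truth, observed in pairs:
        for t, o in zip(truth, observed):
            counts[(t, o)] += 1
            totals[t] += 1
            alphabet.update((t, o))
    size = len(alphabet)

    def prob(t, o):
        # Add-k smoothing keeps unseen confusions at a small nonzero probability.
        return (counts[(t, o)] + smoothing) / (totals[t] + smoothing * size)

    return prob


def rank_candidates(observed, candidates, prior, error_prob, n=3):
    """Return up to n candidates ranked by log P(c) + sum_i log P(o_i | c_i)."""
    scored = []
    for cand in candidates:
        if len(cand) != len(observed):
            continue  # this sketch only models substitutions
        score = math.log(prior[cand])
        score += sum(math.log(error_prob(t, o)) for t, o in zip(cand, observed))
        scored.append((score, cand))
    scored.sort(reverse=True)
    return [cand for _, cand in scored[:n]]


# Hypothetical toy data: the recognizer sometimes reads "b" as "d".
pairs = [("abc", "abc"), ("abc", "adc"), ("abd", "abd")]
error_prob = build_error_model(pairs)
prior = {"abc": 0.5, "abd": 0.5}
top = rank_candidates("adc", ["abc", "abd"], prior, error_prob, n=3)
```

In this toy setup the observed string "adc" is ranked closer to "abc" than to "abd", because the confusion counts make "b read as d" more likely than "d read as c".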

A Study of Active Pulse Classification Algorithm using Multi-label Convolutional Neural Networks (다중 레이블 콘볼루션 신경회로망을 이용한 능동펄스 식별 알고리즘 연구)

  • Kim, Guenhwan;Lee, Seokjin;Lee, Kyunkyung;Lee, Donghwa
    • Journal of Korea Society of Industrial Information Systems / v.25 no.4 / pp.29-38 / 2020
  • In this research, we propose an active pulse classification algorithm using multi-label convolutional neural networks for active sonar systems. Unlike existing single-label algorithms, which require several neural network structures, the proposed algorithm can acquire all the information about an active pulse at once and also simplifies the learning process. To verify the proposed algorithm, the neural network was trained on sea experimental data. The analysis confirmed that the proposed algorithm converged, and analysis of the confusion matrix confirmed that it has excellent active pulse classification performance.

Optimal threshold using the correlation coefficient for the confusion matrix (혼동행렬의 상관계수를 이용한 최적분류점)

  • Hong, Chong Sun;Oh, Se Hyeon;Choi, Ye Won
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.77-91 / 2022
  • Optimal threshold estimation is considered in order to discriminate a mixture distribution in fields such as biostatistics and credit evaluation. Various well-known accuracy measures examine discriminant power. Recently, the Matthews correlation coefficient and the F1 statistic were studied for estimating optimal thresholds. In this study, we explore whether these accuracy measures are appropriate for finding the optimal threshold that discriminates the mixture distribution. We find that some accuracy measures that depend on the sample size are not appropriate when the two sample sizes differ greatly. Moreover, an alternative method for finding the optimal threshold is proposed using the correlation coefficient that defines the ratio of the confusion matrix, and the usefulness and utility of this method are also discussed.
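
For reference, the two measures named in this abstract can be computed from the four confusion-matrix counts and swept over candidate thresholds as in the sketch below. It uses the standard definitions of the Matthews correlation coefficient and F1; the helper names, the "score >= threshold means positive" rule, and the toy data are assumptions, and the alternative correlation-based method proposed in the paper is not reproduced here.

```python
import math


def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, fn, tn), predicting positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1(tp, fp, fn, tn):
    """F1 statistic; note that it ignores the true-negative count."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, labels, metric):
    """Pick the observed score that maximizes the metric (first one on ties)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: metric(*confusion_counts(scores, labels, t)))


# Hypothetical, perfectly separable toy scores.
scores = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
threshold_mcc = best_threshold(scores, labels, mcc)
threshold_f1 = best_threshold(scores, labels, f1)
```

Because F1 never uses the true-negative count while MCC uses all four cells, the two measures can react differently when one class is much larger than the other, which is the kind of sample-size dependence the abstract mentions.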

Permutation Algorithm for fast Hadamard Transform (고속하다마드 변환을 위한 치환기법)

  • Nam, Ji-Tak;Park, Jin-Bae;Choi, Yun-Ho;Joo, Young-Hoon
    • Proceedings of the KIEE Conference / 1997.07b / pp.616-619 / 1997
  • The spectrum-recovery scheme in Hadamard transform spectroscopy is commonly implemented with a fast Hadamard transform (FHT). When the Hadamard or simplex matrix corresponding to the mask does not have the same ordering as the Hadamard matrix corresponding to the FHT, a modification is required. When the two Hadamard matrices are in the same equivalence class, this modification can be implemented as a permutation scheme. This paper investigates permutation schemes for this application. It aims to relieve the confusion about the applicability of existing techniques, reveals a new, more efficient method, and leads to an extension that allows a permutation scheme to be applied to any Hadamard or simplex matrix in the appropriate equivalence class.
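
As background for the FHT mentioned in this abstract, a textbook in-place butterfly for the naturally ordered (Sylvester) Hadamard matrix is sketched below. The function name `fwht` is an assumption, and the permutation schemes that the paper investigates are not reproduced; the sketch only shows the transform whose fixed row ordering makes such permutations necessary.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Sylvester) order.

    The input length must be a power of two. The cost is O(n log n) additions
    instead of the O(n^2) of an explicit matrix product.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly step
        h *= 2
    return a


signal = [1, 0, 1, 0, 0, 1, 1, 0]
spectrum = fwht(signal)
# Applying the transform twice returns the input scaled by n, since H @ H = n * I.
recovered = [v // len(signal) for v in fwht(spectrum)]
```

If a mask sequence corresponds to a Hadamard matrix with a different row or column ordering, the measured data would have to be permuted before and after a call like `fwht`, which is the situation the abstract describes.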

A Study on the UAV-based Vegetable Index Comparison for Detection of Pine Wilt Disease Trees (소나무재선충병 피해목 탐지를 위한 UAV기반의 식생지수 비교 연구)

  • Jung, Yoon-Young;Kim, Sang-Wook
    • Journal of Cadastre & Land InformatiX / v.50 no.1 / pp.201-214 / 2020
  • This study aimed at the early detection of trees damaged by pine wilt disease using vegetation indices derived from UAV images. Location data for 193 trees with pine wilt disease were constructed through field surveys and, at the same time, vegetation index analyses of NDVI, GNDVI, NDRE, and SAVI were performed using multispectral UAV images. The K-Means algorithm was adopted to classify damaged trees, and a confusion matrix was used to compare and analyze the classification accuracy. The results of the study are summarized as follows. First, the overall classification accuracy ranked in the order NDVI (88.04%, Kappa coefficient 0.76) > GNDVI (86.01%, Kappa coefficient 0.72) > NDRE (77.35%, Kappa coefficient 0.55) > SAVI (76.84%, Kappa coefficient 0.54), with NDVI showing the highest accuracy. Second, K-Means unsupervised classification using NDVI or GNDVI can identify damaged trees to some extent. In particular, this technique can help the early detection of damaged trees because of its intensive operation, low user intervention, and relatively simple analysis process. In the future, using time-series images or applying deep learning techniques is expected to increase classification accuracy.
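
The two figures reported per index in this abstract, overall accuracy and the Kappa coefficient, both follow directly from a confusion matrix. The sketch below uses the standard Cohen's kappa definition; the function names and the 2x2 counts are hypothetical and are not taken from the study.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohen_kappa(matrix):
    """Cohen's kappa: agreement beyond what the row and column totals imply by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: rows are reference classes (damaged, healthy),
# columns are the classes assigned by the classifier.
matrix = [
    [45, 5],
    [15, 35],
]
accuracy = overall_accuracy(matrix)  # 80 of 100 samples on the diagonal
kappa = cohen_kappa(matrix)          # chance agreement here is 0.5
```

Kappa is lower than overall accuracy because it discounts the agreement expected from the class totals alone, which is why the abstract reports both values for each vegetation index.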

Approximated Posterior Probability for Scoring Speech Recognition Confidence

  • Kim Kyuhong;Kim Hoirin
    • MALSORI / no.52 / pp.101-110 / 2004
  • This paper proposes a new confidence measure for utterance verification based on posterior probability approximation. The proposed method approximates probabilistic likelihoods by using Viterbi search characteristics and a clustered phoneme confusion matrix. The measure is a weighted linear combination of acoustic and phonetic confidence scores. The proposed algorithm performs better than methods using conventional confidence measures, even with reduced computational complexity.

Hangul Document Retrieval Using Character Recognition (문자 인식을 이용한 한글 문서 검색)

  • 안재철;오일석
    • Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.544-546 / 2001
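  • This paper analyzes misrecognition tendencies in Hangul documents recognized by OCR (Optical Character Reader) and proposes a Hangul word retrieval method that exploits them. From a large volume of OCR-recognized Hangul documents, we computed the recognition frequency of each jamo and, on that basis, constructed separate recognition confusion matrices for initial, medial, and final jamo. Bayes' theorem is used to make appropriate use of the recognition information. We present a method for retrieving misrecognized forms of a query word and, based on the confusion matrices and this retrieval method, built an OCR-based word retrieval system.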

Machine-printed Digit Recognition using Weighted Template Matching (가중 템플릿 정합을 이용한 인쇄체 아라비아 숫자 인식)

  • Jung Minchul
    • Proceedings of the KAIS Fall Conference / 2005.05a / pp.180-183 / 2005
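  • This paper proposes a weighted template matching method for recognizing machine-printed Arabic numerals. Weighted template matching assigns Hamming distance weights to the regions where the distinguishing features of a pattern appear, emphasizing those features to raise the recognition rate of digit patterns. In addition, a trimming technique is applied to remove the one- or two-pixel random noise that makes the surface of a pattern rough. In the experiments, the confusion matrix obtained with simple template matching without trimming is compared with the confusion matrix obtained with weighted template matching after trimming, showing that the recognition rate improves considerably.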
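
The idea of weighting the Hamming distance by region can be sketched as below for binary images. This is an illustrative reading of the abstract, not the paper's method: the function names, the 3x3 toy templates, and the weight values are all assumptions, and the trimming step is omitted.

```python
def weighted_hamming(image, template, weights):
    """Sum the weights of all pixels where the binary image and template differ."""
    return sum(
        w
        for img_row, tpl_row, w_row in zip(image, template, weights)
        for p, t, w in zip(img_row, tpl_row, w_row)
        if p != t
    )


def classify(image, templates, weights):
    """Return the label whose template has the smallest weighted distance."""
    return min(
        templates,
        key=lambda label: weighted_hamming(image, templates[label], weights[label]),
    )


# Hypothetical 3x3 templates; real digit templates would be much larger.
templates = {
    "0": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}
# Assumed weights: the center pixel is the most distinctive region for "0".
weights = {
    "0": [[1, 1, 1], [1, 3, 1], [1, 1, 1]],
    "1": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
}
noisy_zero = [[1, 1, 1], [1, 0, 1], [1, 1, 0]]  # one corner pixel flipped
label = classify(noisy_zero, templates, weights)
```

With all weights set to 1 this reduces to simple template matching with the plain Hamming distance, the baseline the abstract compares against.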

Road Extraction by the Orientation Perception of the Isolated Connected-Components (고립 연결-성분의 방향성 인지에 의한 도로 영역 추출)

  • Lee, Woo-Beom
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.1 / pp.75-81 / 2012
  • Road identification is an important task in extracting road regions from high-resolution satellite images once road candidates have been extracted by pre-processing steps such as binarization, noise removal, and color processing. We therefore propose a novel approach for identifying roads using orientation-selective spatial filters, motivated by a computational model of neurons found in the primary visual cortex. In our approach, after the neuron-type spatial filters are applied to the isolated, connected-component-labeled road candidate regions, the proposed method identifies the regions that exhibit a strong orientation feature as real road regions. To evaluate the effectiveness of the proposed method, the accuracy and error ratios in the confusion matrix were measured for road candidates that include both road and non-road classes. As a result, the proposed method achieves more than 92% accuracy.

Standardized polytomous discrimination index using concordance (부합성을 이용한 표준화된 다항판별지수)

  • Choi, Jin Soo;Hong, Chong Sun
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.33-44 / 2016
  • In many situations, the outcome of a clinical decision or credit assessment must be predicted among more than two categories. Five kinds of statistics based on concordance have been proposed and used for these polytomous problems. However, these statistics are defined without an exact distinction of categories, which makes it difficult to use both the pair and set approaches and hard to understand what the statistics mean. Hence, it is not possible to compare and analyze them. In this paper, the polytomous confusion matrix is standardized, and the concordance statistic is represented in terms of the confusion matrix. The five kinds of concordance-based statistics are then defined. With the methods proposed in this paper, we can not only explain their meanings but also compare and analyze these statistics. Based on various data sets, the properties of these five statistics are explored and explained.