• Title/Summary/Keyword: classification error (분류의 오류)

Developing a New Algorithm for Conversational Agent to Detect Recognition Error and Neologism Meaning: Utilizing Korean Syllable-based Word Similarity (대화형 에이전트 인식오류 및 신조어 탐지를 위한 알고리즘 개발: 한글 음절 분리 기반의 단어 유사도 활용)

  • Jung-Won Lee;Il Im
    • Journal of Intelligence and Information Systems, v.29 no.3, pp.267-286, 2023
  • Conversational agents such as AI speakers use voice conversation for human-computer interaction, and voice recognition errors often occur in conversational situations. Recognition errors in user utterance records can be categorized into two types. The first type is misrecognition errors, where the agent fails to recognize the user's speech entirely. The second type is misinterpretation errors, where the user's speech is recognized and services are provided, but the interpretation differs from the user's intention. Of the two, misinterpretation errors require separate error detection because they are recorded as successful service interactions. In this study, various text separation methods were applied to detect misinterpretation. For each of these text separation methods, the similarity of consecutive speech pairs was calculated using word embedding and document embedding techniques, which convert words and documents into vectors. This approach goes beyond simple word-based similarity calculation to explore a new method for detecting misinterpretation errors. The research method involved using real user utterance records to train and develop a detection model by applying patterns of misinterpretation error causes. The most significant result was obtained with initial consonant extraction, which detected misinterpretation errors caused by the use of unregistered neologisms. Comparison with the other separation methods revealed different error types. This study has two main implications. First, for misinterpretation errors that are difficult to detect due to lack of recognition, the study proposed diverse text separation methods and found a novel method that improved performance remarkably. Second, if this method is applied to conversational agents or voice recognition services requiring neologism detection, patterns of errors arising from the voice recognition stage can be specified. The study proposed and verified that, even for utterances not categorized as errors, services can be provided in line with the results users want.
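
The initial consonant extraction mentioned in this abstract can be illustrated with a minimal sketch. It relies only on the standard Unicode layout of precomposed Hangul syllables. The function names are hypothetical, and `difflib.SequenceMatcher` is a stand-in similarity measure chosen for illustration; the study itself compares word and document embeddings, which are not reproduced here.

```python
from difflib import SequenceMatcher

# The 19 initial consonants (choseong) in Unicode order.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
HANGUL_BASE = 0xAC00               # code point of '가'
HANGUL_LAST = 0xD7A3               # code point of '힣'
SYLLABLES_PER_INITIAL = 21 * 28    # 21 medial vowels x 28 finals (including "none")


def initial_consonants(text: str) -> str:
    """Replace every Hangul syllable with its initial consonant.

    Characters outside the Hangul syllable block are kept unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = (code - HANGUL_BASE) // SYLLABLES_PER_INITIAL
            out.append(CHOSEONG[index])
        else:
            out.append(ch)
    return "".join(out)


def initial_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] of two utterances after initial consonant extraction.

    Assumption: a character-sequence ratio replaces the embedding-based
    similarity used in the study.
    """
    return SequenceMatcher(None, initial_consonants(a), initial_consonants(b)).ratio()
```

Under this sketch, two consecutive utterances that differ in vowels or final consonants but share the same initial consonants receive a high similarity score, which is the kind of signal a repeated or rephrased request might produce.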

The Methods for the Improvement of the KDC 5th Edition of Architecture Engineering Classification System (KDC 제5판 건축공학분야 분류체계 개선 방안)

  • Kim, Yeon-Rye
    • Journal of Korean Library and Information Science Society, v.40 no.4, pp.401-425, 2009
  • This study presents methods for improving the KDC classification system for architectural engineering, based on a comparative analysis of the academic system of architectural engineering, the classification systems of KDC, DDC, and LCC, and the research field classification system of the National Research Foundation of Korea. The analysis revealed that the architectural engineering section of the KDC 5th edition needs improvement and correction in several respects: adding classification items that reflect trends in academic development, properly developing the rank of classification terms for architectural structure engineering, adding detailed subjects, selecting proper classification terms, correcting errors in classification symbols and English expressions, and restoring correlative indexes omitted from the classification items. The study proposes improvements to solve these problems.

Sequence-to-sequence Autoencoder based Korean Text Error Correction using Syllable-level Multi-hot Vector Representation (음절 단위 Multi-hot 벡터 표현을 활용한 Sequence-to-sequence Autoencoder 기반 한글 오류 보정기)

  • Song, Chisung;Han, Myungsoo;Cho, Hoonyoung;Lee, Kyong-Nim
    • Annual Conference on Human and Language Technology, 2018.10a, pp.661-664, 2018
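  • Posts on online bulletin boards and conversations exchanged in chat windows form a text corpus that reflects the colloquial style actually in use, which makes them good training data for language models in speech recognition. However, because online text contains a lot of noise, it is difficult to use directly for training. This paper proposes a sequence-to-sequence Denoising Autoencoder model for correcting Korean text errors in sentences that contain many user input errors.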
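
The title of this entry refers to a syllable-level multi-hot vector representation. One plausible reading is sketched below, under loudly stated assumptions: each Hangul syllable becomes a 68-dimensional vector (19 initial consonants + 21 medial vowels + 28 finals, where final index 0 means "no final consonant"), with exactly one bit set in each of the three groups. The dimension layout and the function name are hypothetical and are not taken from the paper.

```python
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28
DIM = N_INITIAL + N_MEDIAL + N_FINAL   # 68 dimensions in this sketch
HANGUL_BASE = 0xAC00                   # code point of '가'
N_SYLLABLES = N_INITIAL * N_MEDIAL * N_FINAL  # 11,172 precomposed syllables


def syllable_multi_hot(ch: str) -> list:
    """Encode one character as a multi-hot vector over its jamo positions.

    Hangul syllables set three bits (initial, medial, final).
    Any other character maps to the all-zero vector in this sketch.
    """
    vec = [0] * DIM
    code = ord(ch) - HANGUL_BASE
    if 0 <= code < N_SYLLABLES:
        initial, rest = divmod(code, N_MEDIAL * N_FINAL)
        medial, final = divmod(rest, N_FINAL)
        vec[initial] = 1
        vec[N_INITIAL + medial] = 1
        vec[N_INITIAL + N_MEDIAL + final] = 1  # index 0 = no final consonant
    return vec
```

Compared with a one-hot vector over all 11,172 syllables, a representation of this kind keeps the input small and lets syllables that share jamo share active bits, which seems well suited to typos that change a single jamo.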

Coin Calculation System Using Binarization and Hue Histogram (이진화와 색상 히스토그램을 이용한 동전 계산 시스템)

  • Bae, Jong-Wook;Jung, Sung-Hwan
    • KIISE Transactions on Computing Practices, v.21 no.6, pp.424-429, 2015
  • This research proposes a new system for calculating the total monetary value of the coins in an image. The proposed system identifies and classifies the coins in an image in real time, with the image obtained from a USB camera. Most previous coin calculation systems used only size information, so an incorrectly detected object size caused a misclassification. In particular, the former 10 won coin had a high error rate because it is similar in size to the 50 won and 100 won coins. The proposed system combines hue histogram information with size information to reduce errors in the classification process. In a classification experiment on 2,290 coins, the average recognition rate was about 88.2% with size information alone and rose to about 99.3% when hue information was combined with size information.
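
The idea of combining size with a hue histogram can be sketched as follows. This is an illustrative sketch only: the function names, the 12-bin histogram, the size tolerance, and the use of histogram intersection as the comparison measure are all assumptions, and the reference values must be supplied by the caller rather than taken from the paper.

```python
import colorsys


def hue_histogram(pixels, bins=12):
    """Normalized hue histogram of (r, g, b) pixels with 0-255 channels."""
    hist = [0] * bins
    for r, g, b in pixels:
        hue, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(hue * bins), bins - 1)] += 1
    total = sum(hist) or 1
    return [count / total for count in hist]


def histogram_intersection(p, q):
    """Overlap of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(p, q))


def classify_coin(diameter, pixels, references, size_tolerance=1.0):
    """Pick a coin label using size first, then hue.

    references: dict mapping a label to (reference_diameter, reference_histogram).
    Size narrows the candidates; the hue histogram decides among coins
    whose sizes are too close to separate. Returns None if no size matches.
    """
    hist = hue_histogram(pixels)
    candidates = {
        name: ref for name, ref in references.items()
        if abs(ref[0] - diameter) <= size_tolerance
    }
    if not candidates:
        return None
    return max(candidates, key=lambda n: histogram_intersection(hist, candidates[n][1]))
```

The design choice mirrored here is that hue acts as a tie-breaker: when two denominations fall within the same size tolerance, the candidate whose reference hue distribution overlaps most with the observed one is selected.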

Construction Scheme of Training Data using Automated Exploring of Boundary Categories (경계범주 자동탐색에 의한 확장된 학습체계 구성방법)

  • Choi, Yun-Jeong;Jee, Jeong-Gyu;Park, Seung-Soo
    • The KIPS Transactions:PartB, v.16B no.6, pp.479-488, 2009
  • This paper presents a reinforced scheme for constructing training data that improves text classification through automatic search for boundary categories. Documents lying in a boundary area are usually misclassified because they contain multiple topics and features, which is the main factor we focus on. We propose an automated methodology for exploring the optimal boundary category, building on previous research. We treat the boundary area among target categories as a new category that requires training, and the resulting categories are then added to the target categories semantically. In experiments, we applied our method to complex documents while intentionally introducing errors into the training process. The experimental results show that our system has high accuracy and reliability in a noisy environment.

An Analysis of the Class 'Philosophy' in the 4th Revised and Enlarged Edition of KDC (한국십진분류법 제4판 철학류의 분석)

  • Park Ok-Wha
    • Journal of the Korean Society for Library and Information Science, v.31 no.3, pp.7-22, 1997
  • The Korean Library Association brought out the fourth revised and enlarged edition of KDC last year. Compared with the former edition, it is a marked improvement. Nevertheless, it leaves much room for improvement. In order to examine and evaluate the edition more effectively, I confined my study to the class 'Philosophy'. In my judgment the problem resolves itself into the following three points: 1) The regions and branches of philosophy are not properly balanced. As is generally known, KDC was originally derived from DDC. As a result, KDC and DDC are similar in their stress on the philosophical tradition of the West, and KDC consequently lacks universality. 2) The classifiers on several occasions neglected the logical rules of classification. The vertical and horizontal relations between the subjects are not strictly respected. 3) The persons concerned were not well informed about philosophical concepts and genealogies. There are some misused concepts and disorganized genealogies of philosophy. To my knowledge, these problems originate in the lack of the professional understanding of philosophy necessary to make the work satisfactory. As a result of the examination, I came to the conclusion that the classifiers must ask specialists in philosophy for cooperation. Without their professional advice, the classifiers will find it difficult to solve these problems and improve the classification.

The Methods for the Improvement of the KDC 5th Edition of Education Classification System (KDC 제5판 교육학분야 분류체계 개선 방안)

  • Kim, Yeon-Rye
    • Journal of Korean Library and Information Science Society, v.41 no.4, pp.5-33, 2010
  • This study presents methods for improving the KDC classification system for education, based on a comparative analysis of the academic system of education, the classification systems of KDC, NDC, DDC, and LCC, and the research field classification system of the National Research Foundation of Korea. The analysis revealed that the education section of the KDC 5th edition needs improvement and correction in several respects: adding classification items that reflect trends in academic development, properly developing the rank of classification terms for the detailed fields of education, adding detailed subjects, correcting errors in classification symbols, and restoring correlative indexes omitted from the classification items. The study proposes improvements to solve these problems.

Hybrid Approach to SVM Error Reduction in Document Classification (문서 분류에서의 SVM 오류 감소를 위한 하이브리드 방법)

  • Lee Jun-Seok;Kim Sang-Soo;Park Seong-Bae;Lee Sang-jo
    • Proceedings of the Korean Information Science Society Conference, 2005.11b, pp.544-546, 2005
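  • To improve document classification performance, this paper proposes the following method. Documents are first classified with an SVM (Support Vector Machine), which performs well on pattern classification problems, and the data that satisfy the margin are then classified again with k-NN. Using k-NN together with the SVM gave higher performance than using the SVM alone.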
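
A two-stage SVM and k-NN classifier of this kind can be sketched as follows. The sketch rests on assumptions not confirmed by the abstract: the SVM is linear and already trained (given as a hypothetical weight vector `w` and bias `b`), labels are +1 and -1, and the samples handed to k-NN are those whose decision value falls inside the margin, that is, |w·x + b| < 1.

```python
import math
from collections import Counter


def svm_score(x, w, b):
    """Decision value w.x + b of an already trained linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def knn_predict(x, train, k=3):
    """Majority label among the k training samples nearest to x.

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def hybrid_predict(x, w, b, train, k=3, margin=1.0):
    """Trust the SVM outside the margin; fall back to k-NN inside it."""
    score = svm_score(x, w, b)
    if abs(score) >= margin:
        return 1 if score > 0 else -1
    return knn_predict(x, train, k)
```

The intuition behind this arrangement is that samples far from the separating hyperplane are classified confidently by the SVM, while samples close to it are the most error-prone and may benefit from a second opinion based on local neighbors.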

User Preference Prediction Method Using Associative User Clustering and Bayesian Classification (연관 사용자 군집과 베이지안 분류를 이용한 사용자 선호도 예측 방법)

  • 정경용;김진현;이정현
    • Proceedings of the Korean Information Science Society Conference, 2001.10b, pp.109-111, 2001
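  • Existing user preference prediction methods based on collaborative filtering use a nearest-neighbor method based on users' preferences for items and compute user similarity with the Pearson correlation coefficient, so they neither reflect item content nor solve the sparsity problem. To address these shortcomings, this paper proposes a user preference prediction method using associative user clustering and Bayesian classification. To solve the sparsity problem in collaborative filtering systems, the proposed method clusters users by genre with the ARHP algorithm, and a new user is classified into one of these genres by a Naive Bayes classifier. In addition, to compute the similarity between the new user and the users in the classified genre, different estimates are assigned, through Naive Bayes learning, to the items the user has rated. Applying the preferences with these estimates to the conventional Pearson correlation reduces prediction errors caused by missing values and so improves prediction accuracy. To evaluate its performance, the proposed method was compared with existing collaborative filtering techniques.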
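
The effect of missing values on Pearson-based user similarity can be illustrated with a small sketch. The function names are hypothetical, missing ratings are represented as `None`, and the estimates used to fill the gaps are simply passed in by the caller; in the paper they come from Naive Bayes learning, which is not implemented here.

```python
import math


def pearson(u, v):
    """Pearson correlation over items rated by both users (None = missing).

    Returns 0.0 when fewer than two co-rated items exist or when either
    user's co-rated ratings have zero variance.
    """
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mean_u = sum(a for a, _ in pairs) / len(pairs)
    mean_v = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in pairs)
    dev_u = math.sqrt(sum((a - mean_u) ** 2 for a, _ in pairs))
    dev_v = math.sqrt(sum((b - mean_v) ** 2 for _, b in pairs))
    if dev_u == 0 or dev_v == 0:
        return 0.0
    return cov / (dev_u * dev_v)


def fill_missing(ratings, estimates):
    """Replace missing ratings with externally supplied estimates."""
    return [e if r is None else r for r, e in zip(ratings, estimates)]
```

With sparse ratings, two users may share too few co-rated items for the correlation to be meaningful; once the gaps are filled with estimates, every item contributes to the similarity, which is the mechanism the abstract credits for reducing prediction errors.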

Context-sensitive Spelling Error Correction using Eojeol N-gram (어절 N-gram을 이용한 문맥의존 철자오류 교정)

  • Kim, Minho;Kwon, Hyuk-Chul;Choi, Sungki
    • Journal of KIISE, v.41 no.12, pp.1081-1089, 2014
  • Context-sensitive spelling-error correction methods are largely classified into rule-based methods and statistical data-based methods, the latter of which is often preferred in research. Statistical error correction methods consider context-sensitive spelling error problems as word-sense disambiguation problems. The method divides a vocabulary pair, for correction, which consists of a correction target vocabulary and a replacement candidate vocabulary, according to the context. The present paper proposes a method that integrates a word-phrase n-gram model into a conventional model in order to improve the performance of the probability model by using a correction vocabulary pair, which was a result of a previous study performed by this research team. The integrated model suggested in this paper includes a method used to interpolate the probability of a sentence calculated through each model and a method used to apply the models, when both methods are sequentially applied. Both aforementioned types of integrated models exhibit relatively high accuracy and reproducibility when compared to conventional models or to a model that uses only an n-gram.
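
Interpolating the sentence probabilities of two models, as this abstract describes, is commonly done with a linear mixture. The sketch below assumes that form; the weight `lam`, the function names, and the candidate probabilities are hypothetical, and the abstract does not state which interpolation the authors actually used.

```python
def interpolate(p_base, p_ngram, lam=0.5):
    """Linear interpolation: lam * p_base + (1 - lam) * p_ngram."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_base + (1.0 - lam) * p_ngram


def choose_correction(candidates, lam=0.5):
    """Return the candidate word with the highest interpolated probability.

    candidates: dict mapping a word to (p_base, p_ngram), the sentence
    probabilities assigned by the conventional model and the n-gram model.
    """
    return max(candidates, key=lambda word: interpolate(*candidates[word], lam))
```

Setting `lam` to 1.0 reduces the sketch to the conventional model alone and 0.0 to the n-gram model alone, so the weight controls how much each model contributes to the final choice between the correction target and its replacement candidate.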