• Title/Summary/Keyword: Recognition Errors

Improved speech emotion recognition using histogram equalization and data augmentation techniques (히스토그램 등화와 데이터 증강 기법을 이용한 개선된 음성 감정 인식)

  • Heo, Woon-Haeng;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.9 no.2 / pp.77-83 / 2017
  • We propose a new method to reduce emotion recognition errors caused by variation in speaker characteristics and speech rate. First, to reduce variation in speaker characteristics, we adjust the features of a test speaker to fit the distribution of all training data using the histogram equalization (HE) algorithm. Second, to deal with variation in speech rate, we augment the training data with speech generated at various speech rates. In computer experiments using EMO-DB, KRN-DB, and eNTERFACE-DB, the proposed method is shown to yield relative improvements in weighted accuracy of 34.7%, 23.7%, and 28.1%, respectively.
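
The histogram equalization step in this abstract can be illustrated with a minimal quantile-mapping sketch. It handles a single feature dimension and replaces each test value with the training value at the same empirical-CDF position. The function name `histogram_equalize` and the rank-based CDF estimate are assumptions made for illustration; the abstract does not specify how the paper implements HE.

```python
from bisect import bisect_right


def histogram_equalize(test_values, train_values):
    """Map one feature dimension of a test speaker onto the training distribution.

    Each test value is replaced by the training value that sits at the same
    empirical-CDF position (quantile mapping). Illustrative sketch only.
    """
    sorted_test = sorted(test_values)
    sorted_train = sorted(train_values)
    n_test, n_train = len(sorted_test), len(sorted_train)
    equalized = []
    for x in test_values:
        # Empirical CDF of x within the test speaker's own data, in (0, 1].
        cdf = bisect_right(sorted_test, x) / n_test
        # Inverse training CDF: the training sample at the same quantile.
        idx = min(n_train - 1, max(0, round(cdf * n_train) - 1))
        equalized.append(sorted_train[idx])
    return equalized
```

Because the mapping is monotonic, the ordering of the test speaker's feature values is preserved while their range and shape follow the training data. A multi-dimensional feature vector would be handled by applying the function to each dimension separately.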

Speech Recognition Interface in the Communication Environment (통신환경에서 음성인식 인터페이스)

  • Han, Tai-Kun;Kim, Jong-Keun;Lee, Dong-Wook
    • Proceedings of the KIEE Conference / 2001.07d / pp.2610-2612 / 2001
  • This study examines the recognition of a user's spoken commands based on speech recognition and natural language processing, and develops a natural language interface agent that can analyze the recognized command. The natural language interface agent consists of a speech recognizer and a semantic interpreter. The speech recognizer recognizes the spoken command and transforms it into character strings. The semantic interpreter analyzes the character strings and creates the commands and questions to be passed to the application program. We also consider problems related to the speech recognizer and the semantic interpreter, such as the ambiguity of natural language and the ambiguity and errors introduced by the speech recognizer. This kind of natural language interface agent can be applied to a telephony environment involving all kinds of communication media, such as telephone, fax, and e-mail.

Alzheimer's disease recognition from spontaneous speech using large language models

  • Jeong-Uk Bang;Seung-Hoon Han;Byung-Ok Kang
    • ETRI Journal / v.46 no.1 / pp.96-105 / 2024
  • We propose a method to automatically predict Alzheimer's disease from speech data using the ChatGPT large language model. Alzheimer's disease patients often exhibit distinctive characteristics when describing images, such as difficulties in recalling words, grammar errors, repetitive language, and incoherent narratives. For prediction, we first employ a speech recognition system to transcribe participants' speech into text. We then gather opinions by inputting the transcribed text into ChatGPT together with a prompt designed to solicit fluency evaluations. Subsequently, we extract embeddings from the speech, text, and opinions using pretrained models. Finally, we use a classifier consisting of transformer blocks and linear layers to identify participants with this type of dementia. Experiments are conducted using the widely used ADReSSo dataset. The results yield a maximum accuracy of 87.3% when speech, text, and opinions are used in conjunction. This finding suggests the potential of leveraging evaluation feedback from language models to address challenges in Alzheimer's disease recognition.

Fast Handwriting Recognition Using Model Graph (모델 그래프를 이용한 빠른 필기 인식 방법)

  • Oh, Se-Chang
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.5 / pp.892-898 / 2012
  • Rough classification methods are used to improve recognition speed in many character recognition problems. However, an error in rough classification can produce an irreversible result. To reduce this risk, some methods duplicate each model in several classes, but these methods cannot completely rule out errors caused by rough classification. This paper proposes a recognition method that increases speed by matching models selectively, without any increase in error. The method constructs a model graph using the similarity between models, and a search then begins from a particular point in the graph. During the search, matching of models that are not similar to the input pattern is reduced. The proposed method is applied to the recognition of handwritten digits and uppercase and lowercase English letters. In the experiments, the proposed method was compared with a basic method that matches all models against the input pattern. By controlling the out-degree of the model graph and the number of candidates maintained during the search, the proposed method obtained the same recognition rate as the basic method while increasing the recognition speed to 2.45 times.
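
The model-graph idea in this abstract can be sketched roughly as follows. This is an illustrative reading, not the paper's algorithm: the graph is assumed to be a nearest-neighbor graph with a fixed out-degree, the match score is assumed to be squared Euclidean distance, and the names `build_model_graph` and `graph_search` are hypothetical.

```python
import heapq


def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_model_graph(models, out_degree):
    """Link each model to its `out_degree` most similar models."""
    graph = {}
    for i, m in enumerate(models):
        others = sorted((j for j in range(len(models)) if j != i),
                        key=lambda j: distance(m, models[j]))
        graph[i] = others[:out_degree]
    return graph


def graph_search(models, graph, pattern, start, num_candidates):
    """Search from `start`, matching only neighbors of the current best candidates.

    Returns (best_model_index, number_of_models_matched).
    """
    matched = {start: distance(pattern, models[start])}  # model index -> score
    expanded = set()
    candidates = [start]
    while True:
        pending = [c for c in candidates if c not in expanded]
        if not pending:
            break  # every surviving candidate has already been expanded
        for node in pending:
            expanded.add(node)
            for nb in graph[node]:
                if nb not in matched:
                    matched[nb] = distance(pattern, models[nb])
        # Keep only the best-scoring models seen so far as candidates.
        candidates = heapq.nsmallest(num_candidates, matched, key=matched.get)
    return candidates[0], len(matched)
```

The second return value counts how many models were actually matched against the input, which is the quantity the method tries to keep small. Larger values of `out_degree` and `num_candidates` trade speed for a lower chance of missing the best model; this sketch makes no guarantee of finding it.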

A study on the Character Correction of the Wrongly Recognized Sentence Marks, Japanese, English, and Chinese Character in the Off-line printed Character Recognition (오프라인 인쇄체 문장부호, 일본 문자, 영문자, 한자 인식에서의 오인식 문자 교정에 관한 연구)

  • Lee, Byeong-Hui;Kim, Tae-Gyun
    • The Transactions of the Korea Information Processing Society / v.4 no.1 / pp.184-194 / 1997
  • In recent years, a number of commercial off-line character recognition systems have appeared in the Korean market. This paper describes a "self-organizing" data structure for representing a large dictionary that can be searched in real time and uses a practical amount of memory, and presents a study on character correction for off-line printed sentence marks and Japanese, English, and Chinese character recognition. A self-organizing algorithm can be recommended as particularly appropriate when there is reason to suspect that the access probabilities of individual words will change with time and theme. The wrongly recognized characters generated by OCR systems are collected and analyzed. Error types of English characters are reclassified, and 0.5% errors are corrected using an English character confusion table with a self-organizing dictionary containing 25,145 English words. Error types of Chinese characters are also classified, and 6.1% errors are corrected using a Chinese character confusion table with a self-organizing dictionary containing 34,593 Chinese words.
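
A confusion table combined with a self-organizing dictionary could look roughly like the sketch below. It is a toy under stated assumptions: the move-to-front heuristic stands in for the paper's unspecified self-organizing structure, the confusion table maps a misrecognized character to its likely intended characters, and the names `SelfOrganizingDictionary` and `correct_word` are hypothetical.

```python
from itertools import product


class SelfOrganizingDictionary:
    """Word list that moves each looked-up word to the front (move-to-front heuristic)."""

    def __init__(self, words):
        self.words = list(words)

    def lookup(self, word):
        """Return True if `word` is present, moving it to the front on a hit."""
        if word in self.words:
            self.words.remove(word)
            self.words.insert(0, word)
            return True
        return False


def correct_word(word, confusion_table, dictionary):
    """Return a dictionary word reachable by undoing known OCR confusions, else the input."""
    if dictionary.lookup(word):
        return word
    # Each character may stay as it is or be replaced by a character it is confused with.
    options = [[ch] + confusion_table.get(ch, []) for ch in word]
    for chars in product(*options):
        candidate = "".join(chars)
        if dictionary.lookup(candidate):
            return candidate
    return word
```

Frequently accessed words drift toward the front of the list, so lookups adapt as the access probabilities change. The candidate enumeration grows exponentially with the number of confusable characters, which is acceptable for a sketch but would need pruning for a dictionary of tens of thousands of words.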

Development of Motion Recognition and Real-time Positioning Technology for Radiotherapy Patients Using Depth Camera and YOLOAddSeg Algorithm (뎁스카메라와 YOLOAddSeg 알고리즘을 이용한 방사선치료환자 미세동작인식 및 실시간 위치보정기술 개발)

  • Ki Yong Park;Gyu Ha Ryu
    • Journal of Biomedical Engineering Research / v.44 no.2 / pp.125-138 / 2023
  • The development of AI systems for radiation therapy is important for improving the accuracy, effectiveness, and safety of cancer treatment. The current system monitors patients using CCTV, which can cause errors and mistakes in the treatment process and lead to misalignment of radiation. We developed the PMRP system, an AI automation system that uses depth cameras to measure a patient's fine movements, segment the patient's body into parts, align Z values of depth cameras with Z values, and transmit the measured feedback to positioning devices in real time, monitoring errors and treatments. The need for such a system arose because the CCTV visual monitoring system could not detect fine movements, Z-direction movements, and body part movements, which hinders improvement of radiation therapy performance and increases the risk of side effects in normal tissues. This study could contribute to the development of radiotherapy, a field that lags in many parts of the world, and to the economically and socially important goal of developing an independent platform for radiotherapy devices. The effectiveness and efficiency of the system were verified with data from phantom experiments, and future studies aim to improve treatment performance by improving the posture correction mechanism and correcting left-right and up-down movements in real time.

Performance Comparison and Error Analysis of Korean Bio-medical Named Entity Recognition (한국어 생의학 개체명 인식 성능 비교와 오류 분석)

  • Jae-Hong Lee
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.4 / pp.701-708 / 2024
  • The advent of transformer architectures in deep learning has been a major breakthrough in natural language processing research. Named entity recognition is a branch of natural language processing and an important research area for tasks such as information retrieval. It is also important in the biomedical field, but the lack of Korean biomedical corpora for training has limited the development of Korean clinical research using AI. In this study, we built a new biomedical corpus for Korean biomedical named entity recognition and selected language models pre-trained on a large Korean corpus for transfer learning. We compared the recognition performance of the selected language models by F1-score and by per-tag recognition rate, and analyzed the errors. In terms of recognition performance, KlueRoBERTa performed relatively well. The error analysis of the tagging process shows that recognition performance is excellent for Disease but relatively low for Body and Treatment. This is due to over-segmentation and under-segmentation, which fail to properly categorize entity names based on context. A more precise morphological analyzer and a richer lexicon will be needed to compensate for the incorrect tagging.

A Study on the Utilization of Speech Recognition Technology in Foreign Language Learning Applications - Focusing on English and French Speech - (외국어 학습용 어플리케이션의 음성 인식 기술 활용 현황 - 영어와 프랑스어 말하기 학습을 중심으로 -)

  • Kim, Sunhee;Jung, Hyunhoon
    • Journal of Digital Contents Society / v.19 no.4 / pp.621-630 / 2018
  • This paper presents a case study of foreign language learning applications based on speech recognition technology, aiming to assess the current status and limitations of the technology as applied to foreign language speaking education, especially for English and French. An examination of the characteristics of the selected English and French applications from the perspective of speech learning shows that speech recognition technology has the advantage of creating a speaking practice environment and giving feedback. However, the feedback lacks appropriate corrective feedback that can help learners correct errors by themselves.

Performance Improvement in the Multi-Model Based Speech Recognizer for Continuous Noisy Speech Recognition (연속 잡음 음성 인식을 위한 다 모델 기반 인식기의 성능 향상에 대한 연구)

  • Chung, Yong-Joo
    • Speech Sciences / v.15 no.2 / pp.55-65 / 2008
  • Recently, the multi-model based speech recognizer has been used quite successfully for noisy speech recognition. To select the reference HMM (hidden Markov model) that best matches the noise type and SNR (signal-to-noise ratio) of the input test speech, the multi-model framework has estimated the SNR value using a VAD (voice activity detection) algorithm and classified the noise type using a GMM (Gaussian mixture model) as two separate steps. As the SNR estimation process is vulnerable to errors, we propose an efficient method that classifies the SNR values and noise types simultaneously. The KL (Kullback-Leibler) distance between the single Gaussian distributions of the noise signal during training and testing is used for the classification. Recognition experiments on the Aurora 2 database show the usefulness of the model compensation method in the multi-model based speech recognizer. We also found that further performance improvement was achievable by combining the probability density function of the MCT (multi-condition training) with that of the reference HMM compensated by the D-JA (data-driven Jacobian adaptation) in the multi-model based speech recognizer.
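
The classification step described in this abstract can be sketched with the closed-form KL divergence between Gaussians. The sketch assumes diagonal covariances and the asymmetric form KL(p || q); the abstract does not say whether the paper uses a symmetrized distance, and the names `kl_gaussian` and `classify_noise` are hypothetical.

```python
import math


def kl_gaussian(mean_p, var_p, mean_q, var_q):
    """KL divergence KL(p || q) between two diagonal-covariance Gaussians.

    Per dimension: 0.5 * (ln(var_q / var_p) + (var_p + (mean_p - mean_q)^2) / var_q - 1)
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def classify_noise(test_mean, test_var, trained_conditions):
    """Pick the (noise type, SNR) label whose training Gaussian is closest to the test noise.

    `trained_conditions` maps a label to a (mean, variance) pair estimated in training.
    """
    return min(
        trained_conditions,
        key=lambda label: kl_gaussian(test_mean, test_var, *trained_conditions[label]),
    )
```

Since each label pairs a noise type with an SNR value, one nearest-distribution lookup selects both at once, which is the sense in which the two are classified simultaneously in this sketch.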

A Study on Combining Bimodal Sensors for Robust Speech Recognition (강인한 음성인식을 위한 이중모드 센서의 결합방식에 관한 연구)

  • 이철우;계영철;고인선
    • The Journal of the Acoustical Society of Korea / v.20 no.6 / pp.51-56 / 2001
  • Recent research has focused on jointly using lip motion and speech for reliable speech recognition in noisy environments. To this end, this paper proposes a method of combining a visual speech recognizer and a conventional speech recognizer, with each output properly weighted. In particular, we propose a method of autonomously determining the weights depending on the amount of noise in the speech. The correlations between adjacent speech samples and the residual errors of the LPC analysis are used for this determination. Simulation results show that the speech recognizer combined in this way achieves a recognition performance of 83% even in severely noisy environments.
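
The weighted combination described in this abstract can be illustrated with a small sketch. It uses only the correlation between adjacent samples as a noise indicator and leaves out the LPC residual. The mapping from correlation to weight is a hypothetical choice, as are the function names; the abstract does not give the paper's actual weighting rule.

```python
def lag1_correlation(samples):
    """Normalized correlation between adjacent samples of a signal."""
    num = sum(a * b for a, b in zip(samples, samples[1:]))
    den = sum(a * a for a in samples)
    return num / den if den else 0.0


def audio_weight(samples):
    """Map the lag-1 correlation to a weight in [0, 1] (hypothetical mapping)."""
    return min(1.0, max(0.0, lag1_correlation(samples)))


def combine_scores(audio_scores, visual_scores, weight):
    """Weighted sum of per-word scores from the two recognizers; returns the best word."""
    combined = {
        word: weight * audio_scores[word] + (1.0 - weight) * visual_scores[word]
        for word in audio_scores
    }
    return max(combined, key=combined.get)
```

Smoothly varying signals have strongly correlated adjacent samples and so give the audio recognizer a weight near 1, while broadband noise lowers the correlation and shifts the decision toward the visual recognizer.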
