• Title/Summary/Keyword: automatic recognition

Search Result 1,069, Processing Time 0.024 seconds

Statistical Analysis of Korean Phonological Rules Using a Automatic Phonetic Transcription (발음열 자동 변환을 이용한 한국어 음운 변화 규칙의 통계적 분석)

  • Lee Kyong-Nim;Chung Minhwa
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.81-85
    • /
    • 2002
  • We present a statistical analysis of Korean phonological variations using automatic generation of phonetic transcription. We have constructed the automatic generation system of Korean pronunciation variants by applying rules modeling obligatory and optional phonemic changes and allophonic changes. These rules are derived from knowledge-based morphophonological analysis and government standard pronunciation rules. This system is optimized for continuous speech recognition by generating phonetic transcriptions for training and constructing a pronunciation dictionary for recognition. In this paper, we describe Korean phonological variations by analyzing the statistics of phonemic change rule applications for the 60,000 sentences in the Samsung PBS(Phonetic Balanced Sentence) Speech DB. Our results show that the most frequently happening obligatory phonemic variations are in the order of liaison, tensification, aspirationalization, and nasalization of obstruent, and that the most frequently happening optional phonemic variations are in the order of initial consonant h-deletion, insertion of final consonant with the same place of articulation as the next consonants, and deletion of final consonant with the same place of articulation as the next consonants. These statistics can be used for improving the performance of speech recognition systems.

  • PDF

A Study on the Automatic Monitoring System for the Contact Center Using Emotion Recognition and Keyword Spotting Method (감성인식과 핵심어인식 기술을 이용한 고객센터 자동 모니터링 시스템에 대한 연구)

  • Yoon, Won-Jung;Kim, Tae-Hong;Park, Kyu-Sik
    • Journal of Internet Computing and Services
    • /
    • v.13 no.3
    • /
    • pp.107-114
    • /
    • 2012
  • In this paper, we proposed an automatic monitoring system for contact center in order to manage customer's complaint and agent's quality. The proposed system allows more accurate monitoring using emotion recognition and keyword spotting method for neutral/anger voice emotion. The system can provide professional consultation and management for the customer with language violence, such as abuse and sexual harassment. We developed a method of building robust algorithm on heterogeneous speech DB of many unspecified customers. Experimental results confirm the stable and improved performance using real contact center speech data.

Eojeol-Block Bidirectional Algorithm for Automatic Word Spacing of Hangul Sentences (한글 문장의 자동 띄어쓰기를 위한 어절 블록 양방향 알고리즘)

  • Kang, Seung-Shik
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.4
    • /
    • pp.441-447
    • /
    • 2000
  • Automatic word spacing is needed to solve the automatic indexing problem of the non-spaced documents and the space-insertion problem of the character recognition system at the end of a line. We propose a word spacing algorithm that automatically finds out word spacing positions. It is based on the recognition of Eojeol components by using the sentence partition and bidirectional longest-match algorithm. The sentence partition utilizes an extraction of Eojeol-block where the Eojeol boundary is relatively clear, and a Korean morphological analyzer is applied bidirectionally to the recognition of Eojeol components. We tested the algorithm on two sentence groups of about 4,500 Eojeols. The space-level recall ratio was 97.3% and the Eojeol-level recall ratio was 93.2%.

  • PDF

Adaptive Korean Continuous Speech Recognizer to Speech Rate (발화속도 적응적인 한국어 연속음 인식기)

  • Kim, Jae-Beom;Park, Chan-Kyu;Han, Mi-Sung;Lee, Jung-Hyun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.6
    • /
    • pp.1531-1540
    • /
    • 1997
  • In this paper, we presents automatic Korean continuous speech recognizer which is improved by the speech rate estimation and the compensation methods. Automatic continuous speech recognition is significantly more difficult than isolated word recognition because of coarticulatory effects and variations in speech rate. In order to recognize continuous speech, modeling methods of coarticulatory effects and variations in speech rate are needed. In this paper, the speech rate is measured by change of format, and the compensation is peformed by extracting relatively many feature vectors in fast speech. Coarticulatory effects are modeled by defining 514 Korean diphone set, and ETRI's 445 word DB is used for training speech material. With combining above methods, we implement automatic Korean continuous speech recognizer, which shows improved recognition rate, based on DHMM(Discrete Hidden Markov Model).

  • PDF

Automatic Term Recognition using Domain Similarity and Statistical Methods (분야간 유사도와 통계기법을 이용한 전문용어의 자동 추출)

  • Oh, Jong-Hoon;Lee, Kyung-Soon;Choi, Key-Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.4
    • /
    • pp.258-269
    • /
    • 2002
  • There have been many studies of automatic term recognition (ATR) and they have achieved good results. However, there are scopes to improve the performance of extracting terms still further by using the additional technical dictionaries. This paper focuses on the method for extracting terms using the hierarchy among technical dictionaries. Moreover, a statistical method based on frequencies, foreign words, and nested relations assists extracting terms which do not appear in dictionaries. Our method produces relatively good results for this task.

Automatic proficiency assessment of Korean speech read aloud by non-natives using bidirectional LSTM-based speech recognition

  • Oh, Yoo Rhee;Park, Kiyoung;Jeon, Hyung-Bae;Park, Jeon Gue
    • ETRI Journal
    • /
    • v.42 no.5
    • /
    • pp.761-772
    • /
    • 2020
  • This paper presents an automatic proficiency assessment method for a non-native Korean read utterance using bidirectional long short-term memory (BLSTM)-based acoustic models (AMs) and speech data augmentation techniques. Specifically, the proposed method considers two scenarios, with and without prompted text. The proposed method with the prompted text performs (a) a speech feature extraction step, (b) a forced-alignment step using a native AM and non-native AM, and (c) a linear regression-based proficiency scoring step for the five proficiency scores. Meanwhile, the proposed method without the prompted text additionally performs Korean speech recognition and a subword un-segmentation for the missing text. The experimental results indicate that the proposed method with prompted text improves the performance for all scores when compared to a method employing conventional AMs. In addition, the proposed method without the prompted text has a fluency score performance comparable to that of the method with prompted text.

Hyperparameter experiments on end-to-end automatic speech recognition

  • Yang, Hyungwon;Nam, Hosung
    • Phonetics and Speech Sciences
    • /
    • v.13 no.1
    • /
    • pp.45-51
    • /
    • 2021
  • End-to-end (E2E) automatic speech recognition (ASR) has achieved promising performance gains with the introduced self-attention network, Transformer. However, due to training time and the number of hyperparameters, finding the optimal hyperparameter set is computationally expensive. This paper investigates the impact of hyperparameters in the Transformer network to answer two questions: which hyperparameter plays a critical role in the task performance and training speed. The Transformer network for training has two encoder and decoder networks combined with Connectionist Temporal Classification (CTC). We have trained the model with Wall Street Journal (WSJ) SI-284 and tested on devl93 and eval92. Seventeen hyperparameters were selected from the ESPnet training configuration, and varying ranges of values were used for experiments. The result shows that "num blocks" and "linear units" hyperparameters in the encoder and decoder networks reduce Word Error Rate (WER) significantly. However, performance gain is more prominent when they are altered in the encoder network. Training duration also linearly increased as "num blocks" and "linear units" hyperparameters' values grow. Based on the experimental results, we collected the optimal values from each hyperparameter and reduced the WER up to 2.9/1.9 from dev93 and eval93 respectively.

Machine Fault Diagnosis Method based on DWT Power Spectral Density using Multi Patten Recognition (다중 패턴 인식 기법을 이용한 DWT 전력 스펙트럼 밀도 기반 기계 고장 진단 기법)

  • Kang, Kyung-Won;Lee, Kyeong-Min;Vununu, Caleb;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.11
    • /
    • pp.1233-1241
    • /
    • 2019
  • The goal of the sound-based mechanical fault diagnosis technique is to automatically find abnormal signals in the machine using acoustic emission. Conventional methods of using mathematical models have been found to be inaccurate due to the complexity of industrial mechanical systems and the existence of nonlinear factors such as noise. Therefore, any fault diagnosis issue can be treated as a pattern recognition problem. We propose an automatic fault diagnosis method using discrete wavelet transform and power spectrum density using multi pattern recognition. First, we perform DWT-based filtering analysis for noise cancelling and effective feature extraction. Next, the power spectral density(PSD) is performed on each subband of the DWT in order to effectively extract feature vectors of sound. Finally, each PSD data is extracted with the features of the classifier using multi pattern recognition. The results show that the proposed method can not only be used effectively to detect faults as well as apply to various automatic diagnosis system based on sound.

Automatic Recognition of Bank Security Card Using Smart Phone (스마트폰을 이용한 은행 보안카드 자동 인식)

  • Kim, Jin-Ho
    • The Journal of the Korea Contents Association
    • /
    • v.16 no.12
    • /
    • pp.19-26
    • /
    • 2016
  • Among the various services for mobile banking, user authentication method using bank security card is still very useful. We can use mobile banking easily and safely in case of saving encoded security codes in smart phone and entering codes automatically whenever user authentication is required without bank security card. In this paper automatic recognition algorithm of security codes of bank security card is proposed in oder to enroll the encoded security codes into smart phone using smart phone camera. Advanced adaptive binarization is used for extracting digit segments from various background image pattern and adaptive 2-dimensional layout analysis method is developed for segmentation and recognition of damaged or touched digits. Experimental results of proposed algorithm using Android and iPhone, show excellent security code recognition results.

Speech Estimators Based on Generalized Gamma Distribution and Spectral Gain Floor Applied to an Automatic Speech Recognition (잡음에 강인한 음성인식을 위한 Generalized Gamma 분포기반과 Spectral Gain Floor를 결합한 음성향상기법)

  • Kim, Hyoung-Gook;Shin, Dong;Lee, Jin-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.8 no.3
    • /
    • pp.64-70
    • /
    • 2009
  • This paper presents a speech enhancement technique based on generalized Gamma distribution in order to obtain robust speech recognition performance. For robust speech enhancement, the noise estimation based on a spectral noise floor controled recursive averaging spectral values is applied to speech estimation under the generalized Gamma distribution and spectral gain floor. The proposed speech enhancement technique is based on spectral component, spectral amplitude, and log spectral amplitude. The performance of three different methods is measured by recognition accuracy of automatic speech recognition (ASR).

  • PDF