• Title/Summary/Keyword: Speech Recognition Technology

Search Result 527, Processing Time 0.022 seconds

An Acoustical Study of Korean Diphthongs (한국어 이중모음의 음향학적 연구)

  • Yang Byeong-Gon
    • MALSORI
    • /
    • no.25_26
    • /
    • pp.3-26
    • /
    • 1993
  • The goals of the present study were (3) to collect and analyze sets of fundamental frequency (F0) and formant frequency (F1, F2, F3) data of Korean diphthongs from ten linguistically homogeneous speakers of Korean males, and (2) to make a comparative study of Korean monophthongs and diphthongs. Various definitions, kinds, and previous studies of diphthongs were examined in the introduction. Procedures for screening subjects to form a linguistically homogeneous group, time point selection and formant determination were explained in the following section. The principal findings were as follows: 1. Much variation was observed in the ongliding part of diphthongs. 2. F2 values of (j) group descended while those of [w] group ascended, 3. The average duration of diphthongs were about 110 msec, and there was not much variation between speakers and diphthongs. 4. In a comparative study of monophthongs and diphthongs, Fl and F2 values of the same offgliding part at the third time point almost converged. 5. The gliding of diphthongs was very short beginning from the h-noise. Perceptual studies using speech synthesis are desirable to find major parameters for diphthongs. The results of the present study wi11 be useful in the area of automated speech recognition and computer synthesis of speech.

  • PDF

A Proposition of the Fuzzy Correlation Dimension for Speaker Recognition (화자인식을 위한 퍼지상관차원 제안)

  • Yoo, Byong-Wook;Kim, Chang-Seok;Park, Hyun-Sook
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.36S no.1
    • /
    • pp.115-122
    • /
    • 1999
  • In this paper, we confirmed that a speech signal is a chaos signal, and in order to use it as a speaker recognition parameter, analyzed chaos dimension. In order to raise speaker identification and pattern recognition, by making up the strange attractor involving an individual's vocal tract characteristics very well and applying fuzzy membership function to correlation dimension, we proposed fuzzy correlation dimension. By estimating the correlation of the points making up an attractor are limited according space dimension value, fuzzy correlation dimension absorbed the variation of the reference pattern attractor and test pattern attractor. Concerning fuzzy correlation dimension, by estimating the distance according to the average value of discrimination error per each speaker and reference pattern, investigated the validity of speaker recognition parameter.

  • PDF

A Study on the Syllable Recognition Using Neural Network Predictive HMM

  • Kim, Soo-Hoon;Kim, Sang-Berm;Koh, Si-Young;Hur, Kang-In
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.2E
    • /
    • pp.26-30
    • /
    • 1998
  • In this paper, we compose neural network predictive HMM(NNPHMM) to provide the dynamic feature of the speech pattern for the HMM. The NNPHMM is the hybrid network of neura network and the HMM. The NNPHMM trained to predict the future vector, varies each time. It is used instead of the mean vector in the HMM. In the experiment, we compared the recognition abilities of the one hundred Korean syllables according to the variation of hidden layer, state number and prediction orders of the NNPHMM. The hidden layer of NNPHMM increased from 10 dimensions to 30 dimensions, the state number increased from 4 to 6 and the prediction orders increased from 10 dimensions to 30 dimension, the state number increased from 4 to 6 and the prediction orders increased from the second oder to the fourth order. The NNPHMM in the experiment is composed of multi-layer perceptron with one hidden layer and CMHMM. As a result of the experiment, the case of prediction order is the second, the average recognition rate increased 3.5% when the state number is changed from 4 to 5. The case of prediction order is the third, the recognition rate increased 4.0%, and the case of prediction order is fourth, the recognition rate increased 3.2%. But the recognition rate decreased when the state number is changed from 5 to 6.

  • PDF

Performance Improvement of Korean Connected Digit Recognition Using Various Discriminant Analyses (다양한 변별분석을 통한 한국어 연결숫자 인식 성능향상에 관한 연구)

  • Song Hwa Jeon;Kim Hyung Soon
    • MALSORI
    • /
    • no.44
    • /
    • pp.105-113
    • /
    • 2002
  • In Korean, each digit is monosyllable and some pairs are known to have high confusability, causing performance degradation of connected digit recognition systems. To improve the performance, in this paper, we employ various discriminant analyses (DA) including Linear DA (LDA), Weighted Pairwise Scatter LDA WPS-LDA), Heteroscedastic Discriminant Analysis (HDA), and Maximum Likelihood Linear Transformation (MLLT). We also examine several combinations of various DA for additional performance improvement. Experimental results show that applying any DA mentioned above improves the string accuracy, but the amount of improvement of each DA method varies according to the model complexity or number of mixtures per state. Especially, more than 20% of string error reduction is achieved by applying MLLT after WPS-LDA, compared with the baseline system, when class level of DA is defined as a tied state and 1 mixture per state is used.

  • PDF

A study on the perception of POA and voicing in relation to the release and nonrelease in the English word-final stops (영어 어말 폐쇄음 파열 유무에 따른 위치성 및 유.무성성 인식에 관한 연구)

  • Rhee Seok-Chae;Kang Sooha;Park Jihyun;Hwang Sunmin
    • Proceedings of the KSPS conference
    • /
    • 2003.10a
    • /
    • pp.43-49
    • /
    • 2003
  • This study reveals the perceptual role of stop release burst to Koreans' recognition of POA(place of articulation) and voicing in the English word-final stops. 10 Korean subjects participated in a perception experiment wherein the stimuli are prepared on the basis of the amount of acoustic information, which includes the release burst. The result shows that i) release burst plays an important role in the recognition of POA in the order of velar, alveolar, and bilabial stops, and ii) the release burst more enhances the correct recognition of voiceless stops than that of voiced stops. This result leads us to conclude that the role of stop release burst differs with respect to the POA and voicing of the stops, and it is possibly related to the different intensity of release in voicing and in each POA.

  • PDF

Selective Adaptation of Speaker Characteristics within a Subcluster Neural Network

  • Haskey, S.J.;Datta, S.
    • Proceedings of the KSPS conference
    • /
    • 1996.10a
    • /
    • pp.464-467
    • /
    • 1996
  • This paper aims to exploit inter/intra-speaker phoneme sub-class variations as criteria for adaptation in a phoneme recognition system based on a novel neural network architecture. Using a subcluster neural network design based on the One-Class-in-One-Network (OCON) feed forward subnets, similar to those proposed by Kung (2) and Jou (1), joined by a common front-end layer. the idea is to adapt only the neurons within the common front-end layer of the network. Consequently resulting in an adaptation which can be concentrated primarily on the speakers vocal characteristics. Since the adaptation occurs in an area common to all classes, convergence on a single class will improve the recognition of the remaining classes in the network. Results show that adaptation towards a phoneme, in the vowel sub-class, for speakers MDABO and MWBTO Improve the recognition of remaining vowel sub-class phonemes from the same speaker

  • PDF

Utterance Verification and Substitution Error Correction In Korean Connected Digit Recognition (한국어 연결숫자 인식에서의 발화 검증과 대체오류 수정)

  • Jung Du Kyung;Song Hwa Jeon;Jung Ho-Young;Kim Hyung Soon
    • MALSORI
    • /
    • no.45
    • /
    • pp.79-91
    • /
    • 2003
  • Utterance verification aims at rejecting both out-of-vocabulary (OOV) utterances and low-confidence-scored in-vocabulary (IV) utterances. For utterance verification on Korean connected digit recognition task, we investigate several methods to construct filler and anti-digit models. In particular, we propose a substitution error correction method based on 2-best decoding results. In this method, when 1st candidate is rejected, 2nd candidate is selected if it is accepted by a specific hypothesis test, instead of simply rejecting the 1st one. Experimental results show that the proposed method outperforms the conventional log likelihood ratio (LLR) test method.

  • PDF

Development of AI-based Real Time Agent Advisor System on Call Center - Focused on N Bank Call Center (AI기반 콜센터 실시간 상담 도우미 시스템 개발 - N은행 콜센터 사례를 중심으로)

  • Ryu, Ki-Dong;Park, Jong-Pil;Kim, Young-min;Lee, Dong-Hoon;Kim, Woo-Je
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.20 no.2
    • /
    • pp.750-762
    • /
    • 2019
  • The importance of the call center as a contact point for the enterprise is growing. However, call centers have difficulty with their operating agents due to the agents' lack of knowledge and owing to frequent agent turnover due to downturns in the business, which causes deterioration in the quality of customer service. Therefore, through an N-bank call center case study, we developed a system to reduce the burden of keeping up business knowledge and to improve customer service quality. It is a "real-time agent advisor" system that provides agents with answers to customer questions in real time by combining AI technology for speech recognition, natural language processing, and questions & answers for existing call center information systems, such as a private branch exchange (PBX) and computer telephony integration (CTI). As a result of the case study, we confirmed that the speech recognition system for real-time call analysis and the corpus construction method improves the natural speech processing performance of the query response system. Especially with name entity recognition (NER), the accuracy of the corpus learning improved by 31%. Also, after applying the agent advisor system, the positive feedback rate of agents about the answers from the agent advisor was 93.1%, which proved the system is helpful to the agents.

Automatic Electronic Medical Record Generation System using Speech Recognition and Natural Language Processing Deep Learning (음성인식과 자연어 처리 딥러닝을 통한 전자의무기록자동 생성 시스템)

  • Hyeon-kon Son;Gi-hwan Ryu
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.3
    • /
    • pp.731-736
    • /
    • 2023
  • Recently, the medical field has been applying mandatory Electronic Medical Records (EMRs) and Electronic Health Records (EHRs) systems that computerize and manage medical records, and distributing them throughout the entire medical industry to utilize patients' past medical records for additional medical procedures. However, the conversations between medical professionals and patients that occur during general medical consultations and counseling sessions are not separately recorded or stored, so additional important patient information cannot be efficiently utilized. Therefore, we propose an electronic medical record system that uses speech recognition and natural language processing deep learning to store conversations between medical professionals and patients in text form, automatically extracts and summarizes important medical consultation information, and generates electronic medical records. The system acquires text information through the recognition process of medical professionals and patients' medical consultation content. The acquired text is then divided into multiple sentences, and the importance of multiple keywords included in the generated sentences is calculated. Based on the calculated importance, the system ranks multiple sentences and summarizes them to create the final electronic medical record data. The proposed system's performance is verified to be excellent through quantitative analysis.

Development of Automatic Creating Web-Site Tool for the Blind (시각장애인용 웹사이트 자동생성 툴 개발)

  • Baek, Hyeun-Ki;Ha, Tai-Hyun
    • Journal of Digital Contents Society
    • /
    • v.8 no.4
    • /
    • pp.467-474
    • /
    • 2007
  • This paper documents the design and implementation of an automatic creating web-site tool for the blind to build their own homepage by using both voice recognition and voice mixed technology with equal ease as the non-disabled. The blind can make voice mails, schedules, address lists and bookmarks by making use of the tool. It also facilitates communication between the non-disabled with the help of their information management system. This tool converts basic commands into voice recognition, also making an offer of text-to-speech which supports voice output. In the end, the tool will remove the blind's social isolation, allowing them to enjoy the information age like the non-disabled.

  • PDF