• Title/Summary/Keyword: Voice Training


Automated Call Routing Call Center System Based on Speech Recognition (음성인식을 이용한 고객센터 자동 호 분류 시스템)

  • Shim, Yu-Jin;Kim, Jae-In;Koo, Myung-Wan
    • Speech Sciences / v.12 no.2 / pp.183-191 / 2005
  • This paper describes automated call routing for a call center system based on speech recognition. We focus on automatically routing telephone calls based on a user's fluently spoken response, instead of touch-tone menus, in an interactive voice response system. A vector-based call routing algorithm is investigated and a normalization method is suggested. A call center database collected by KT is used for the call routing experiments. Experimental results evaluating call classification from transcribed speech are reported for that database. With small amounts of training data, an average call routing error reduction rate of 9% is observed when the normalization method is used.
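As a rough illustration of vector-based call routing, the sketch below builds one weighted term vector per route from transcribed training utterances and sends a new utterance to the route with the highest cosine similarity. The abstract does not specify the paper's weighting or normalization method, so the IDF weighting, the unit-length normalization, the function names `build_route_vectors` and `route_call`, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def build_route_vectors(training, normalize=True):
    """Build one term vector per route from (transcript, route) pairs.

    Assumption: each route is treated as a single "document" for IDF,
    and normalization means scaling each route vector to unit length.
    """
    counts = defaultdict(Counter)
    for text, route in training:
        counts[route].update(text.lower().split())

    n_routes = len(counts)
    # Number of routes in which each term occurs.
    df = Counter(term for c in counts.values() for term in c)

    vectors = {}
    for route, c in counts.items():
        vec = {t: tf * math.log(1 + n_routes / df[t]) for t, tf in c.items()}
        if normalize:
            norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
            vec = {t: w / norm for t, w in vec.items()}
        vectors[route] = vec
    return vectors


def route_call(vectors, utterance):
    """Return the route whose vector is most similar to the utterance."""
    query = Counter(utterance.lower().split())
    q_norm = math.sqrt(sum(f * f for f in query.values())) or 1.0

    def score(vec):
        # Dot product of the route vector and the length-normalized query.
        return sum(vec.get(t, 0.0) * f for t, f in query.items()) / q_norm

    return max(vectors, key=lambda r: score(vectors[r]))


# Hypothetical toy data, not taken from the KT database.
training = [
    ("i want to check my bill", "billing"),
    ("question about my bill payment", "billing"),
    ("my internet is not working", "tech_support"),
    ("the internet connection is slow", "tech_support"),
]
vectors = build_route_vectors(training)
print(route_call(vectors, "my bill is wrong"))
```

Scaling every route vector to unit length keeps routes with many training utterances from dominating the score merely because their raw counts are larger, which is one plausible reason normalization could matter most when training data is small.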


Considering Dynamic Non-Segmental Phonetics

  • Fujino, Yoshinari
    • Proceedings of the KSPS conference / 2000.07a / pp.312-320 / 2000
  • This presentation aims to explore some possibilities of non-segmental phonetics, which is usually ignored in phonetics education. In pedagogical phonetics, especially ESL/EFL-oriented phonetics, speech sounds tend to be classified under two criteria: 1) 'pronunciation', which deals with segments, and 2) 'prosody' or 'suprasegmentals', which deals with non-segmental elements such as stress and intonation. However, speech involves more dynamic processing. It is non-linear and multi-dimensional in spite of the linear sequence of symbols in phonetic/phonological transcriptions. No word is without pitch or voice quality, apart from its segmental characteristics, whether it is spoken in isolation or cut out from continuous speech. This simply shows that the dichotomy of pronunciation and prosody is merely a useful convention. There is room to consider dynamic non-segmental phonetics. Examples of non-segmental phonetic investigation are examined, including some analyses conducted within the framework of Firthian Prosodic Analysis, especially of the relation between vowel variants and foot types, and we see what kind of auditory phonetic training is required to understand the impressionistic transcriptions that lie behind non-segmental phonetics.


Recognition of the Korean Alphabet using Phase Synchronization of Neural Oscillator

  • Lee, Joon-Tark;Bum, Kwon-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.1 / pp.93-99 / 2004
  • Neural oscillators can be applied to oscillatory systems such as image information analysis and voice recognition. The conventional EBPA (Error Back Propagation Algorithm) is not suitable for oscillatory systems with complicated input patterns because of its tedious training procedure and slow convergence. However, these problems can be easily solved by using the synchrony characteristic of a neural oscillator with a PLL (Phase Locked Loop) function together with a simple Hebbian learning rule. This paper therefore introduces a technique for recognition of the Korean alphabet using a phase-synchronized neural oscillator.
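The abstract does not give the oscillator equations, so the sketch below uses a generic pair of Kuramoto-style coupled phase oscillators as a stand-in to show what phase synchronization means. It is not the paper's PLL-based neural oscillator or its Hebbian rule; the function name `simulate_pair` and all parameter values are assumptions chosen for illustration.

```python
import math


def simulate_pair(omega1, omega2, coupling, dt=0.01, steps=5000):
    """Integrate two coupled phase oscillators with the Euler method.

    Returns the final phase difference theta2 - theta1. With enough
    coupling the difference settles to a constant (phase locking);
    without coupling it keeps drifting.
    """
    theta1, theta2 = 0.0, 1.0  # arbitrary initial phases
    for _ in range(steps):
        d1 = omega1 + coupling * math.sin(theta2 - theta1)
        d2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1 += dt * d1
        theta2 += dt * d2
    return theta2 - theta1


locked = simulate_pair(1.0, 1.5, coupling=1.0)
drifting = simulate_pair(1.0, 1.5, coupling=0.0)
print(locked, drifting)
```

In this toy model the phase difference obeys dφ/dt = (ω2 − ω1) − 2K sin φ, so the two oscillators lock whenever |ω2 − ω1| ≤ 2K, with the locked difference satisfying sin φ = (ω2 − ω1) / (2K).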

Machine Learning-Based Programming Analysis Model Proposal : Based on User Behavioral Analysis

  • Jang, Seonghoon;Shin, Seung-Jung
    • International journal of advanced smart convergence / v.9 no.4 / pp.179-183 / 2020
  • The online education platform market has been developing rapidly since the COVID-19 pandemic. As school classes at various levels are converted to non-face-to-face classes, interest in non-face-to-face online education is greater than ever. However, most online platforms currently in use are limited to fragmentary functions that simply deliver images, voice, and messages, and they have limitations for online hands-on training. Indeed, digital transformation is a traditional business method for increasing coding education and a corporate approach to service operation innovation strategy computing thinking power and platform model. There are many ways to evaluate a computer programmer's ability. Generally, piecemeal evaluation methods are used that assess timed results through coding tests. This study proposes a comprehensive evaluation that considers not only the written result but also its execution process, and that assesses the programmer's propensities and habits based on coding experience, in order to evaluate the programmer's ability and productivity.

The Use of Blackboard by Students During the COVID-19 Pandemic

  • Alghamdi, Deena
    • International Journal of Computer Science & Network Security / v.22 no.3 / pp.319-325 / 2022
  • By using the Blackboard (BB) system in the education sector, the educational process for both academics and students is facilitated. Two data resources were used to evaluate the use of the BB system by students of Umm Al-Qura University: statistical reports issued by the university and an online questionnaire. A total of 989 students from all colleges and different programmes provided by the university responded to the questionnaire survey. According to our findings, most students did not use the BB before the pandemic. Therefore, the sudden conversion to the BB system required intensive training courses. After the data analysis, the relationship between the use of the BB system before the pandemic and the problems students faced during the lockdown was revealed. The most critical issues raised by the respondents were: (1) "The voice of the lecturer went on and off during BB collaborate class", (2) "internet connection of the lecturer went on and off during BB collaborate class" and (3) "High possibility of IT problems during exams".

Deep Learning-based Speech Voice Separation Training To Enhance STT Performance (STT 성능 향상을 위한 딥러닝 기반 발화 음성 분리학습)

  • Kim, Bokyoung;Yang, Youngjun;Hwang, Yonghae;Kim, Kyuheon
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.851-853 / 2022
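  • With the spread and commercialization of various deep learning technologies based on artificial intelligence, a range of research is under way in audio speech recognition to improve recognition accuracy. Recent speech recognition engines for STT, built on deep learning, show higher accuracy than in the past. However, for audio in which non-speech and speech signals are recorded together, such as entertainment programs, dramas, and sports broadcasts, recognition accuracy drops sharply. This study therefore presents a method for separating audio of various genres into speech and non-speech signals using a deep learning model that separates speech from music, and, by analyzing the STT results, suggests research directions for improving speech recognition accuracy.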


Speaker Identification Using Dynamic Time Warping Algorithm (동적 시간 신축 알고리즘을 이용한 화자 식별)

  • Jeong, Seung-Do
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.5 / pp.2402-2409 / 2011
  • The voice carries distinguishable acoustic properties of the speaker as well as transmitting information. Speaker recognition is the task of determining who is speaking from the acoustic differences between speakers. It is roughly divided into two categories: speaker verification and speaker identification. Speaker verification confirms a claimed identity based only on the speaker's voice. By contrast, speaker identification finds the speaker by searching for the most similar model in a database previously built from multiple subordinate sentences. This paper composes feature vectors from extracted MFCC coefficients and uses the dynamic time warping algorithm to compare the similarity between features. To describe common characteristics based on the phonological features of spoken words, two subordinate sentences per speaker are used as the training data. Thus, a speaker can be identified even when the spoken words differ from those previously stored in the database.
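The comparison step can be sketched with the textbook dynamic time warping recurrence. This is a minimal illustration, not the paper's implementation: it assumes the feature sequences (for example MFCC frames) have already been extracted, uses Euclidean distance between frames, and the names `dtw_distance` and `identify_speaker` as well as the toy one-dimensional "features" are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences.

    a, b: lists of equal-dimension feature vectors (e.g. MFCC frames).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def identify_speaker(templates, query):
    """Return the speaker whose closest stored template has the lowest DTW distance.

    templates: {speaker_name: [feature_sequence, ...]}
    """
    return min(
        templates,
        key=lambda s: min(dtw_distance(t, query) for t in templates[s]),
    )


# Toy one-dimensional "features" standing in for MFCC frames.
templates = {
    "alice": [[[0.0], [1.0], [2.0]]],
    "bob": [[[5.0], [6.0], [7.0]]],
}
query = [[0.1], [1.0], [1.0], [2.1]]  # a time-stretched, slightly noisy "alice"
print(identify_speaker(templates, query))
```

Because the alignment may repeat frames on either side, a slower or faster rendition of the same pattern still scores as close, which is the property that makes DTW useful for comparing utterances of different lengths.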

Smart Home Personalization Service based on Context Information using Speech (음성인식을 이용한 상황정보 기반의 스마트 흠 개인화 서비스)

  • Kim, Jong-Hun;Song, Chang-Woo;Kim, Ju-Hyun;Chung, Kyung-Yong;Rim, Kee-Wook;Lee, Jung-Hyun
    • The Journal of the Korea Contents Association / v.9 no.11 / pp.80-89 / 2009
  • Personalized services have attracted growing attention in smart home environments with the development of ubiquitous computing. In this paper, we propose a smart home personalized service system based on context information using speech recognition. The proposed service consists of an OSGi framework-based service mobile manager, service manager, voice recognition manager, and location manager. This study also defines the smart home space and configures the unit commands, sensor information, and user information that are widely used in the defined space as context information. In particular, the service identifies users located in the same space, who are difficult to distinguish using RFID, through a training model and pattern matching in voice recognition, and supports personalized service for smart home applications. The experimental results verified that an OSGi-based automated and personalized service can be achieved by verifying users in the same space.

A Study on the Voice Dialing using HMM and Post Processing of the Connected Digits (HMM과 연결 숫자음의 후처리를 이용한 음성 다이얼링에 관한 연구)

  • Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.74-82 / 1995
  • This paper is a study of voice dialing using HMM and post-processing of connected digits. The HMM (Hidden Markov Model) algorithm is widely used in speech recognition with good results. However, maximum likelihood estimation in HMM training does not lead to values that maximize the recognition rate. To solve this problem, we applied post-processing to the segmental K-means procedure in the recognition experiment. Korean connected digits are influenced by prolongation more than English connected digits. To decrease the segmentation error in the level building algorithm, some word models that can be produced by prolongation are added. Rules for the added models are applied to the recognition result, which is then updated. The recognition system was implemented with a DSP board having a TMS320C30 processor and an IBM PC. The reference patterns were made by 3 male speakers in a noisy laboratory. The recognition experiment was performed on 21 kinds of telephone numbers, 252 data items in total. The recognition rate was $6\%$ in the speaker-dependent test and $80.5\%$ in the speaker-independent test.


The Study on the Characteristics of Korean Stop Consonants (한국어 파열자음의 특성에 관한 연구)

  • 서동일;표화영;강성석;최홍식
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.8 no.2 / pp.217-224 / 1997
  • The present study investigated the voice onset time (VOT) of Korean stop consonants, extending the research of Pyo and Choi (1996), together with the intensity and air flow rate of Korean stops, as a preliminary study for classical singing training. Nine Korean stops (/p, p', $p^{h}$/, /t, t', $t^{h}$/, /k, k', $k^{h}$/) and the vowel /a/ were used as speech materials. CV and VCV syllable patterns were used for VOT measurement, and the CV pattern was used for intensity and air flow rate measurement. Five males and five females pronounced the speech tasks at a comfortable pitch and intensity, and VOT, intensity, and air flow rate were measured. Among the prevocalic stops, bilabials showed the shortest VOT and velars the longest, except for the unaspirated stops, where the velar /k'/ was the shortest and the alveolar /t'/ the longest. With respect to tensity, heavily aspirated stops showed the longest VOT and unaspirated stops the shortest. The intervocalic stops showed results similar to the prevocalic stops, except that among the slightly aspirated stops the alveolar was the longest, and among the bilabials the slightly aspirated /p/ was the shortest, unlike the prevocalic stops, where the unaspirated /p'/ was the shortest. For all prevocalic stops, the air flow rate was highest in the heavily aspirated stops, followed by the slightly aspirated stops, and lowest in the unaspirated stops. Overall, bilabials showed the highest air flow rate and velars the lowest, except among the heavily aspirated stops, where the alveolar was the lowest. In terms of intensity, the unaspirated stops and the bilabials were the highest and the heavily aspirated stops and the velars the lowest, except among the slightly aspirated stops, where the bilabials were the lowest and the alveolars the highest.
