• Title/Summary/Keyword: speech speed


Comparison of the Dynamic Time Warping Algorithm for Spoken Korean Isolated Digits Recognition (한국어 단독 숫자음 인식을 위한 DTW 알고리즘의 비교)

  • 홍진우;김순협
    • The Journal of the Acoustical Society of Korea / v.3 no.1 / pp.25-35 / 1984
  • This paper analyzes Dynamic Time Warping (DTW) algorithms for time normalization of speech patterns and discusses the Dynamic Programming (DP) algorithm for recognizing spoken Korean isolated digits. In DP matching, the feature vectors of the reference and test patterns consist of the first three formant frequencies, extracted with a power spectral density estimation algorithm based on an ARMA model. The major differences among the DTW algorithms lie in the global path constraints, the local continuity constraints on the path, and the distance weighting/normalization used to obtain the overall minimum distance. The criteria used to evaluate these DP algorithms are memory requirement, speed of implementation, and recognition accuracy.

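
The three design choices named in the abstract (global path constraint, local continuity constraint, distance weighting/normalization) can be made explicit in a short DTW routine. The following is a minimal Python sketch, not the paper's implementation; the specific choices (Euclidean distance between formant vectors such as (F1, F2, F3), a Sakoe-Chiba band of half-width `window` as the global constraint, symmetric step weights 1-1-2, and normalization by the sum of the two sequence lengths) are assumptions picked for illustration.

```python
import math


def dtw_distance(ref, test, window=None):
    """Normalized DTW distance between two sequences of feature vectors.

    Assumed choices (illustrative only):
      - local distance: Euclidean distance between frames
      - local continuity: steps from (i-1, j), (i, j-1), (i-1, j-1)
      - weighting: symmetric, 1 for horizontal/vertical, 2 for diagonal
      - global constraint: Sakoe-Chiba band |i - j| <= window
      - normalization: divide by len(ref) + len(test)
    """
    n, m = len(ref), len(test)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide or no path can reach (n, m).
    window = max(window, abs(n - m))
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = math.dist(ref[i - 1], test[j - 1])
            g[i][j] = min(
                g[i - 1][j] + d,          # vertical step, weight 1
                g[i][j - 1] + d,          # horizontal step, weight 1
                g[i - 1][j - 1] + 2 * d,  # diagonal step, weight 2
            )
    return g[n][m] / (n + m)
```

With symmetric weights, every complete path accumulates a total weight of exactly n + m, which is why dividing by n + m gives a length-independent score; a test pattern that is simply a slowed-down copy of the reference scores 0.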

Acoustic Characteristics of Stop Consonant Production in the Motor Speech Disorders (운동성 조음장애에서 폐쇄자음 발성의 음향학적 특성)

  • Hong, Hee-Kyung;Kim, Moon-Jun;Yoon, Jin;Park, Hee-Taek;Hong, Ki-Hwan
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.23 no.1 / pp.33-42 / 2012
  • Background and Objectives: Dysarthria is a speech disorder that causes difficulties in speech communication due to paralysis, muscle weakness, and incoordination of the speech musculature caused by damage to the central or peripheral nervous system. Because of difficulties in muscle control, dysarthria affects pitch, loudness, and rate during phonation. Alternate motion rate and diadochokinesis have commonly been used as evaluation items, and articulation is also an important one. The purpose of this study is to identify the acoustic characteristics of sound production in patients with dysarthria. Materials and Methods: The subjects were 20 patients with dysarthria and 20 control subjects. The diadochokinetic samples consisted of bilabial, alveolar, and velar sounds, while the consonant articulation test consisted of bilabial, alveolar, and velar plosives. The analysis items were 1) speaking rate, energy, and articulation time in the diadochokinetic task, and 2) voice onset time (VOT), total duration (TD), vowel duration (VD), and hold of the plosives. Results and Conclusions: The diadochokinetic rate of the dysarthria group was lower than that of the control group. In both groups the rate followed the order /t/>/p/>/k/. The minimum energy range per cycle during the diadochokinetic task was smaller in the dysarthria group than in the control group, with statistically significant differences for /p/, /k/, and /ptk/. The maximum energy range was larger than in the control group, with statistically significant differences for /t/ and /ptk/. Articulation time, gap, and total articulation time in the diadochokinetic task were significantly longer in the dysarthria group than in the control group. In both groups, articulation time followed the order /k/>/t/>/p/, while gap followed the order /p/>/t/>/k/ in the control group and /p/>/k/>/t/ in the dysarthria group. For plosives, VOT, TD, and VD were longer in the dysarthria group than in the control group. Hold showed larger deviation than in the control group, which appeared to result from reduced motility of the larynx and articulators.

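
Several of the diadochokinetic measures in the abstract are simple arithmetic on labelled syllable intervals. The sketch below assumes each syllable has been hand-labelled with (onset, offset) times in seconds; the exact definitions of "articulation time", "gap", and "total articulation time" used here are hypothetical readings of the abstract's terms, not the authors' published formulas.

```python
def ddk_measures(syllables):
    """Diadochokinetic measures from labelled syllable intervals.

    `syllables`: list of (onset, offset) times in seconds, in order.
    Hypothetical definitions, for illustration only:
      rate               = syllables per second from first onset to last offset
      articulation       = mean syllable duration (offset - onset)
      gap                = mean silence between an offset and the next onset
      total_articulation = time from first onset to last offset
    """
    if len(syllables) < 2:
        raise ValueError("need at least two syllables")
    durations = [off - on for on, off in syllables]
    gaps = [syllables[k + 1][0] - syllables[k][1] for k in range(len(syllables) - 1)]
    total = syllables[-1][1] - syllables[0][0]
    return {
        "rate": len(syllables) / total,
        "articulation": sum(durations) / len(durations),
        "gap": sum(gaps) / len(gaps),
        "total_articulation": total,
    }
```

Under these definitions, slower alternating movements show up either as longer syllables (articulation) or longer pauses between them (gap), and both lower the rate.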

Cyber Threats Analysis of AI Voice Recognition-based Services with Automatic Speaker Verification (화자식별 기반의 AI 음성인식 서비스에 대한 사이버 위협 분석)

  • Hong, Chunho;Cho, Youngho
    • Journal of Internet Computing and Services / v.22 no.6 / pp.33-40 / 2021
  • Automatic Speech Recognition (ASR) is a technology that analyzes human speech as a signal and automatically converts it into character strings that humans can understand. Speech recognition technology has evolved from the basic level of recognizing single words to the advanced level of recognizing sentences of multiple words. In real-time voice conversation, a high recognition rate makes information delivery more natural and convenient and expands the scope of voice-based applications. On the other hand, as speech recognition technology is widely applied, concerns about related cyber attacks and threats are also increasing. Existing studies focus actively on developing the technology itself, such as designing Automatic Speaker Verification (ASV) techniques and improving their accuracy, but few studies analyze the attacks and threats in depth and breadth. In this study, we propose a cyber attack model that bypasses voice authentication in AI voice recognition services equipped with automatic speaker identification simply by manipulating voice frequency and voice speed, and we analyze the resulting cyber threats through extensive experiments on the automatic speaker identification systems of commercial smartphones. Through this, we aim to highlight the seriousness of these threats and raise interest in research on effective countermeasures.
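
Manipulating "voice frequency" and "voice speed" are linked in the simplest case: resampling a signal and playing it back at the original sample rate by a factor r shortens it by r and multiplies every frequency by r. The plain-Python sketch below (linear interpolation, a zero-crossing frequency estimate for a clean tone) only demonstrates that coupling on a pure tone; it is an assumption for illustration, and the abstract does not describe the tools the authors actually used.

```python
import math


def change_speed(samples, factor):
    """Resample by linear interpolation so playback is `factor` times faster.

    At a fixed sample rate this also scales every frequency by `factor`.
    """
    n_out = int(len(samples) / factor)
    out = []
    for k in range(n_out):
        pos = k * factor
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append((1 - frac) * samples[i] + frac * nxt)
    return out


def zero_crossing_freq(samples, rate):
    """Rough frequency of a clean tone from the count of upward zero crossings."""
    ups = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return ups * rate / len(samples)
```

Changing speed without changing pitch (or the reverse) needs a time-scale or pitch-shift method such as overlap-add, which is beyond this sketch.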

Robust Speech Recognition Algorithm of Voice Activated Powered Wheelchair for Severely Disabled Person (중증 장애우용 음성구동 휠체어를 위한 강인한 음성인식 알고리즘)

  • Suk, Soo-Young;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.250-258 / 2007
  • Current speech recognition technology has achieved high performance with the development of hardware devices; however, it remains insufficient for applications that require high reliability, such as voice control of powered wheelchairs for disabled persons. For a system that aims to operate a powered wheelchair safely by voice in a real environment, non-voice inputs such as the user's coughing and breathing and spark-like mechanical noise must be rejected, and the system must recognize speech commands affected by the disability, which have atypical pronunciation speed and frequency. In this paper, we propose a non-voice rejection method that performs voice/non-voice classification in preprocessing using both YIN-based fundamental frequency (F0) extraction and its reliability. We adopted a multi-template dictionary and acoustic-model-based speaker adaptation to cope with the pronunciation variation of inarticulately uttered speech. In recognition tests on data collected in a real environment, the proposed YIN-based F0 extraction achieved a recall-precision rate of 95.1%, compared with 62% for the cepstrum-based method. The new system with the multi-template dictionary and MAP adaptation also achieved a recognition accuracy of 99.5%, much higher than the 78.6% of the baseline system.
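
The core of YIN is a difference function over candidate lags, normalized by its cumulative mean, with the first dip below an absolute threshold taken as the period. The sketch below implements those steps in plain Python and, as an assumption, treats the normalized difference at the chosen lag as the "reliability" (aperiodicity) value used for voice/non-voice decisions. The frequency range and the threshold of 0.1 are illustrative defaults, and the parabolic interpolation and best-local-estimate steps of the full YIN algorithm are omitted.

```python
import math


def yin_f0(frame, rate, fmin=60.0, fmax=500.0, threshold=0.1):
    """Estimate F0 of one frame with the core YIN steps.

    Returns (f0_hz, aperiodicity). Small aperiodicity means a strongly
    periodic frame (likely voiced); values near 1 suggest noise.
    """
    tau_min = int(rate / fmax)
    tau_max = int(rate / fmin)
    w = len(frame) - tau_max  # integration window length
    if w <= 0:
        raise ValueError("frame too short for fmin")
    # Step 1: difference function d(tau).
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))
    # Step 2: cumulative mean normalized difference d'(tau), with d'(0) = 1.
    dn = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        dn[tau] = d[tau] * tau / running if running > 0 else 1.0
    # Step 3: first lag below the threshold, then walk down to its local minimum.
    for tau in range(tau_min, tau_max + 1):
        if dn[tau] < threshold:
            while tau + 1 <= tau_max and dn[tau + 1] < dn[tau]:
                tau += 1
            return rate / tau, dn[tau]
    # No dip below the threshold: report the global minimum as an unreliable estimate.
    best = min(range(tau_min, tau_max + 1), key=lambda t: dn[t])
    return rate / best, dn[best]
```

A pure tone yields an aperiodicity near 0, while a noise burst such as a cough stays near 1, which is the property a voice/non-voice rejection stage can threshold on.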

A High Speed Pitch Extraction Method Based on Peak Detection and AMDF (Peak 검출과 AMDF에 의한 고속도 음성주기 추출방법)

  • 성원용;은종관
    • Journal of the Korean Institute of Telematics and Electronics / v.17 no.4 / pp.38-44 / 1980
  • We present a high-speed pitch estimation algorithm based on peak detection and the average magnitude difference function (AMDF). A few pitch candidates are first estimated by a peak detection algorithm from speech low-pass filtered at 800 Hz. The AMDF values of the pitch candidates are then calculated, and the candidate that yields the minimum AMDF value is chosen as the pitch period. The new method requires far less computation time than other pitch estimation algorithms while yielding fairly accurate results.

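
The computational saving comes from evaluating the AMDF, AMDF(tau) = (1/N) * sum |x(n) - x(n + tau)|, only at a few candidate lags instead of at every lag. The sketch below assumes the input is already low-pass filtered and, as a hypothetical candidate rule, takes the distances from the largest local maximum to the other local maxima within a lag range; the paper's actual peak detection rules are not given in the abstract.

```python
def amdf(x, tau):
    """Average magnitude difference function at lag tau."""
    n = len(x) - tau
    return sum(abs(x[i] - x[i + tau]) for i in range(n)) / n


def peak_candidates(x, min_lag, max_lag):
    """Hypothetical candidate rule: distances from the largest local maximum
    to every other local maximum, kept if inside [min_lag, max_lag]."""
    peaks = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    if not peaks:
        return []
    top = max(peaks, key=lambda i: x[i])
    lags = {abs(p - top) for p in peaks}
    return sorted(lag for lag in lags if min_lag <= lag <= max_lag)


def pitch_period(x, min_lag=20, max_lag=160):
    """Return the candidate lag (in samples) with the smallest AMDF, or None."""
    cands = peak_candidates(x, min_lag, max_lag)
    if not cands:
        return None
    return min(cands, key=lambda t: amdf(x, t))
```

For a decaying 100 Hz tone sampled at 8 kHz, the candidates are 80 and 160 samples, and the AMDF favours 80, i.e. a 100 Hz pitch rather than the doubled period.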

Achieving Faster User Enrollment for Neural Speaker Verification Systems

  • Lee, Tae-Seung;Park, Sung-Won;Lim, Sang-Seok;Hwang, Byong-Won
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.205-208 / 2003
  • While multilayer perceptrons (MLPs) show great promise for speaker verification, they suffer from slow learning. To appeal to users, MLP-based speaker verification systems must achieve a reasonable enrollment speed, which depends entirely on fast MLP learning. Two previous studies addressed this problem to attain real-time enrollment, and each achieved that objective. In this paper, the two methods are combined and applied to such systems, on the assumption that each operates on a different optimization principle. Experiments with an MLP-based speaker verification system using the combined method on a real speech database verify the feasibility of the combination.


A Study on Rhythmic Units in Korean -with Respect to Syntactic Structure- (한국어의 리듬 단위에 관한 연구 - 문법 구조와 관련하여)

  • Kim, Sun-Mi
    • Proceedings of the KSPS conference / 1996.10a / pp.224-228 / 1996
  • This paper studies how an utterance in Standard Korean is divided into rhythmic units with respect to its syntactic structure. As data, I used 150 sentences containing a similar number of words and various syntactic structures. The sentences were read in a conversational style by 7 speakers of the Seoul dialect; each sentence was read twice at normal speed and twice at fast speed, giving 4200 recorded sentences in total. Listening to the recordings, the author marked the sentences with two kinds of boundaries, strong and weak. To explore the relationship between rhythmic units and syntactic structure, I devised a framework of grammatical symbols, each designed to carry both syntactic and morphological information, and assigned these symbols to the sentences. With the sentences marked with grammatical symbols on the one hand and with rhythmic boundaries on the other, I could show the relationship between rhythmic units and syntactic structure: which syntactic structures are likely to be pronounced as one rhythmic unit, and which fall on rhythmic boundaries.

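
The total of 4200 recordings follows directly from the reading design (150 sentences, 7 speakers, each sentence read twice at normal speed and twice at fast speed):

```latex
150 \text{ sentences} \times 7 \text{ speakers} \times (2 + 2) \text{ readings} = 4200 \text{ recorded sentences}
```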

On the Frequency Dependency of Sound Quality Factors (음질 요소의 주파수 의존성에 대하여)

  • 류윤선;최재원;조희복
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 1997.10a / pp.286-292 / 1997
  • Sound quality is becoming a major concern in passenger vehicles. It has been studied recently, but not sufficiently. To improve sound quality in passenger vehicles, many noise sources must be considered, and human perception of the noise must also be taken into account. In this paper, sound quality was analyzed through vehicle road tests carried out at varying traveling speeds. Only objective factors, such as loudness, sharpness, speech intelligibility, and sound pressure level, are considered as basic sound quality factors. The relations between sound pressure level and the other factors are discussed in terms of their dependency on traveling speed. The frequency dependency of the sound quality factors is also examined through frequency analysis.

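
Of the factors listed, sound pressure level is the simplest to compute: L_p = 20 log10(p_rms / p_0) with the standard reference p_0 = 20 micropascals in air, and per-band levels from a frequency analysis combine energetically as 10 log10(sum 10^(L_i/10)). The sketch below shows only these two formulas; it assumes calibrated pressure samples in pascals and does not cover loudness, sharpness, or speech intelligibility, which need more elaborate models.

```python
import math

P_REF = 20e-6  # reference sound pressure in air, 20 micropascals


def spl_db(pressure_pa):
    """Sound pressure level in dB re 20 uPa of a block of pressure samples (Pa)."""
    rms = math.sqrt(sum(p * p for p in pressure_pa) / len(pressure_pa))
    return 20 * math.log10(rms / P_REF)


def combine_levels(levels_db):
    """Energetic sum of levels, e.g. per-band SPLs into one overall level."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))
```

Doubling the pressure raises the level by about 6 dB, while two equal bands add only about 3 dB to either one, which is why the overall level is dominated by the strongest frequency bands.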

Improved Error Backpropagation by Elastic Learning Rate and Online Update (가변학습율과 온라인모드를 이용한 개선된 EBP 알고리즘)

  • Lee, Tae-Seung;Park, Ho-Jin
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.568-570 / 2004
  • The error-backpropagation (EBP) algorithm for training multilayer perceptrons (MLPs) is known for its robustness and economical efficiency. However, it is difficult to select an optimal constant learning rate for it, which results in suboptimal learning speed and inflexible operation on working data. To complement this non-optimality, this paper introduces into the original EBP algorithm an elastic learning rate that guarantees learning convergence, together with its local realization through online updates of the MLP parameters. Experimental results on a speaker verification system with a Korean speech database are presented and discussed to demonstrate that the proposed method improves on the original EBP algorithm in learning speed and flexibility on working data.

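
The two ingredients named in the abstract, a learning rate that changes during training and online (per-pattern) parameter updates, can be illustrated on a tiny one-hidden-layer MLP. The abstract does not give the formula of the elastic learning rate, so this sketch swaps in the well-known "bold driver" heuristic (grow the rate while the epoch error falls, halve it when the error rises); the layer sizes, growth factor 1.05, cap 2.0, and seed are arbitrary assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w1, w2, x):
    """Forward pass; w1 rows and w2 carry a trailing bias weight."""
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(w2, hb)))


def train_online(data, hidden=2, epochs=2000, lr=0.5, seed=0):
    """Online EBP with an epoch-level 'bold driver' learning rate (not the paper's rule)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    prev_err = math.inf
    for _ in range(epochs):
        err = 0.0
        for x, t in data:
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            y = sigmoid(sum(w * v for w, v in zip(w2, hb)))
            err += 0.5 * (t - y) ** 2
            # Squared-error gradients backpropagated through the sigmoids.
            delta_out = (t - y) * y * (1 - y)
            delta_hid = [delta_out * w2[k] * h[k] * (1 - h[k]) for k in range(hidden)]
            # Online update: correct the weights right after each pattern.
            for k in range(hidden + 1):
                w2[k] += lr * delta_out * hb[k]
            for k in range(hidden):
                for i in range(n_in + 1):
                    w1[k][i] += lr * delta_hid[k] * xb[i]
        # Bold driver: grow the rate while the error falls (capped), halve it otherwise.
        lr = min(lr * 1.05, 2.0) if err < prev_err else lr * 0.5
        prev_err = err
    return w1, w2
```

Adapting the rate at the epoch level keeps the per-pattern update cheap, at the cost of reacting to divergence only once per pass over the data.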

Speaker Recognition Using Dynamic Time Variation of Orthogonal Parameters (직교인자의 동적 특성을 이용한 화자인식)

  • 배철수
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.9 / pp.993-1000 / 1992
  • Recently, many researchers have found that speaker recognition rates are high when recognition is performed by statistically processing orthogonal parameters, which are derived from speech signal analysis and carry much of the speaker's identity. However, this method has problems caused by speaking rate and its variation over time. To solve these problems, this paper proposes two speaker recognition methods that combine the DTW algorithm with orthogonal parameters extracted by the Karhunen-Loève transform: one applies the orthogonal parameters as feature vectors to the DTW algorithm, and the other applies the orthogonal parameters to the optimal path. We also compare the speaker recognition rates of the two proposed methods with that of the conventional statistical processing of orthogonal parameters. The orthogonal parameters used in this paper are derived from both the linear prediction coefficients and the partial correlation coefficients of the speech signal.

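
The Karhunen-Loève transform amounts to projecting mean-removed feature vectors onto the eigenvectors of their covariance matrix, which yields mutually orthogonal (decorrelated) parameters. The plain-Python sketch below finds the leading eigenvectors by power iteration with deflation and returns per-frame projections, so the time trajectory is preserved and could be aligned with DTW rather than summarized statistically. It assumes the frames are already LPC or PARCOR vectors; the iteration count and seed are arbitrary choices.

```python
import math
import random


def klt_basis(frames, k, iters=200, seed=0):
    """Top-k covariance eigenvectors (KLT basis) by power iteration with deflation.

    Returns (basis, mean), where basis is a list of k unit vectors.
    """
    n, d = len(frames), len(frames[0])
    mu = [sum(f[i] for f in frames) / n for i in range(d)]
    c = [[sum((f[i] - mu[i]) * (f[j] - mu[j]) for f in frames) / (n - 1)
          for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for this unit vector.
        cv = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(v[i] * cv[i] for i in range(d))
        basis.append(v)
        # Deflate so the next pass converges to the next eigenvector.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return basis, mu


def klt_project(frames, basis, mu):
    """Per-frame orthogonal parameters: projections of (frame - mean) on the basis."""
    return [[sum((f[i] - mu[i]) * b[i] for i in range(len(mu))) for b in basis]
            for f in frames]
```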