• Title/Summary/Keyword: Speech Spectrogram

Search Result 90, Processing Time 0.02 seconds

An Acoustic Study of Pitch Rules of Chinese Poetry (한시의 평측법에 대한 음향음성학적 연구)

  • Cho, Sung-Moon
    • Speech Sciences
    • /
    • v.14 no.3
    • /
    • pp.59-76
    • /
    • 2007
  • The purpose of this study is to investigate the pitch rules of Chinese poetry. Pitch rules are concerned with the high tone and the low tone. Because Chinese poetry is a fixed form of verse, it must keep pitch rules to compose Chinese poetry. But until now there has been no acoustic study of pitch rules of Chinese poetry. So, for the first time the present study investigates pitch rules of Chinese poetry acoustically. Pitch contours were analyzed from the sound spectrogram made by Praat. Results showed that actual pitch patterns did not coincide with theoretical pitch rules in reciting Chinese poetry. Therefore, in studying Chinese classics, the Chinese poetry, which has traditionally been considered to be recited according to original Chinese pitch rules, must now be considered in terms of how pitch rules may have changed over time in Korea since it was first introduce to Korean scholars.

  • PDF

On-Line Audio Genre Classification using Spectrogram and Deep Neural Network (스펙트로그램과 심층 신경망을 이용한 온라인 오디오 장르 분류)

  • Yun, Ho-Won;Shin, Seong-Hyeon;Jang, Woo-Jin;Park, Hochong
    • Journal of Broadcast Engineering
    • /
    • v.21 no.6
    • /
    • pp.977-985
    • /
    • 2016
  • In this paper, we propose a new method for on-line genre classification using spectrogram and deep neural network. For on-line processing, the proposed method inputs an audio signal for a time period of 1sec and classifies its genre among 3 genres of speech, music, and effect. In order to provide the generality of processing, it uses the spectrogram as a feature vector, instead of MFCC which has been widely used for audio analysis. We measure the performance of genre classification using real TV audio signals, and confirm that the proposed method has better performance than the conventional method for all genres. In particular, it decreases the rate of classification error between music and effect, which often occurs in the conventional method.

Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information (음소 음향학적 변화 정보를 이용한 한국어 음성신호의 자동 음소 분할)

  • 박창목;왕지남
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.8
    • /
    • pp.24-30
    • /
    • 2001
  • This article is concerned with automatic segmentation for Korean speech signals. All kinds of transition cases of phonetic units are classified into 3 types and different strategies for each type are applied. The type 1 is the discrimination of silence, voiced-speech and unvoiced-speech. The histogram analysis of each indicators which consists of wavelet coefficients and SVF (Spectral Variation Function) in wavelet coefficients are used for type 1 segmentation. The type 2 is the discrimination of adjacent vowels. The vowel transition cases can be characterized by spectrogram. Given phonetic transcription and transition pattern spectrogram, the speech signal, having consecutive vowels, are automatically segmented by the template matching. The type 3 is the discrimination of vowel and voiced-consonants. The smoothed short-time RMS energy of Wavelet low pass component and SVF in cepstral coefficients are adopted for type 3 segmentation. The experiment is performed for 342 words utterance set. The speech data are gathered from 6 speakers. The result shows the validity of the method.

  • PDF

Acoustic Features of Phonatory Offset-Onset in the Connected Speech between a Female Stutterer and Non-Stutterers (연속구어 내 발성 종결-개시의 음향학적 특징 - 말더듬 화자와 비말더듬 화자 비교 -)

  • Han, Ji-Yeon;Lee, Ok-Bun
    • Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.19-33
    • /
    • 2006
  • The purpose of this paper was to examine acoustical characteristics of phonatory offset-onset mechanism in the connected speech of female adults with stuttering and normal nonfluency. The phonatory offset-onset mechanism refers to the laryngeal articulatory gestures. Those gestures are required to mark word boundaries in phonetic contexts of the connected speech. This mechanism included 7 patterns based on the speech spectrogram. This study showed the acoustic features in the connected speech in the production of female adults with stuttering (n=1) and normal nonfluency (n=3). Speech tokens in V_V, V_H, and V_S contexts were selected for the analysis. Speech samples were recorded by Sound Forge, and the spectrographic analysis was conducted using Praat. Results revealed a stuttering (with a type of block) female exhibited more laryngealization gestures in the V_V context. Laryngealization gesture was more characterized by a complete glottal stop or glottal fry both in V_H and in V_S contexts. The results were discussed from theoretical and clinical perspectives.

  • PDF

A Novel Approach to COVID-19 Diagnosis Based on Mel Spectrogram Features and Artificial Intelligence Techniques

  • Alfaidi, Aseel;Alshahrani, Abdullah;Aljohani, Maha
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.9
    • /
    • pp.195-207
    • /
    • 2022
  • COVID-19 has remained one of the most serious health crises in recent history, resulting in the tragic loss of lives and significant economic impacts on the entire world. The difficulty of controlling COVID-19 poses a threat to the global health sector. Considering that Artificial Intelligence (AI) has contributed to improving research methods and solving problems facing diverse fields of study, AI algorithms have also proven effective in disease detection and early diagnosis. Specifically, acoustic features offer a promising prospect for the early detection of respiratory diseases. Motivated by these observations, this study conceptualized a speech-based diagnostic model to aid in COVID-19 diagnosis. The proposed methodology uses speech signals from confirmed positive and negative cases of COVID-19 to extract features through the pre-trained Visual Geometry Group (VGG-16) model based on Mel spectrogram images. This is used in addition to the K-means algorithm that determines effective features, followed by a Genetic Algorithm-Support Vector Machine (GA-SVM) classifier to classify cases. The experimental findings indicate the proposed methodology's capability to classify COVID-19 and NOT COVID-19 of varying ages and speaking different languages, as demonstrated in the simulations. The proposed methodology depends on deep features, followed by the dimension reduction technique for features to detect COVID-19. As a result, it produces better and more consistent performance than handcrafted features used in previous studies.

A Comparative Study of Western Singer's Voice and a Pansori Singer's Voice Based on Glottal Image and Acoustic Characteristics (성대형태 및 음향발현에서 성악 발성 및 판소리 발성의 비교 연구)

  • Kim, Sun-Sook
    • Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.165-177
    • /
    • 2004
  • Western singers voice have been studied in music science since the early 20th century. However, Korean traditional singers voice have not yet been studied scientifically. This study is to find the physiological and acoustic characteristics of Pansori singers voices. Western singers participated for comparative purposes. Ten western singers and ten Pansori singers participated in this study. The subjects spoke and sung seven simple vowels /a, e, i, o, u, c, w/. An analysis of Glottal image was done by Scope View and acoustic characteristics of speech and singing voice were analyzed by CSL. The results are as follows: (1) Glottal gestures of Pansori singers showed asymmetric vocal folds. (2) Singing vowel formants of Pansori singers showed breathiness based on Spectrogram. (3) Music formant of western singers appeared in around 3kHz area, however, Pansori singers formant appeared in low frequency area. Modulation of vibrato showed 6 frequency per sec in case of western singers. Pansori singers showed no deep modulation of vibrato on spectrogram.

  • PDF

Complexity Reduction Algorithm of Speech Coder(EVRC) for CDMA Digital Cellular System

  • Min, So-Yeon
    • Journal of Korea Multimedia Society
    • /
    • v.10 no.12
    • /
    • pp.1551-1558
    • /
    • 2007
  • The standard of evaluating function of speech coder for mobile telecommunication can be shown in channel capacity, noise immunity, encryption, complexity and encoding delay largely. This study is an algorithm to reduce complexity applying to CDMA(Code Division Multiple Access) mobile telecommunication system, which has a benefit of keeping the existing advantage of telecommunication quality and low transmission rate. This paper has an objective to reduce the computing complexity by controlling the frequency band nonuniform during the changing process of LSP(Line Spectrum Pairs) parameters from LPC(Line Predictive Coding) coefficients used for EVRC(Enhanced Variable-Rate Coder, IS-127) speech coders. Its experimental result showed that when comparing the speech coder applied by the proposed algorithm with the existing EVRC speech coder, it's decreased by 45% at average. Also, the values of LSP parameters, Synthetic speech signal and Spectrogram test result were obtained same as the existing method.

  • PDF

The Correlation between Speech Intelligibility and Acoustic Measurements in Children with Speech Sound Disorders (말소리장애 아동의 말명료도와 음향학적 측정치 간 상관관계)

  • Kang, Eunyeong
    • Journal of The Korean Society of Integrative Medicine
    • /
    • v.6 no.4
    • /
    • pp.191-206
    • /
    • 2018
  • Purpose : This study investigated the correlation between speech intelligibility and acoustic measurements of speech sounds produced by the children with speech sound disorders and children without any diagnosed speech sound disorder. Methods : A total of 60 children with and without speech sound disorders were the subjects of this study. Speech samples were obtained by having the subjects? speak meaningful words. Acoustic measurements were analyzed on a spectrogram using the Multi-speech 3700 program. Speech intelligibility was determined according to a listener's perceptual judgment. Results : Children with speech sound disorders had significantly lower speech intelligibility than those without speech sound disorders. The intensity of the vowel /u/, the duration of the vowel /${\omega}$/, and the second formant of the vowel /${\omega}$/ were significantly different between both groups. There was no difference in voice onset time between the groups. There was a correlation between acoustic measurements and speech intelligibility. Conclusion : The results of this study showed that the speech intelligibility of children with speech sound disorders was affected by intensity, word duration, and formant frequency. It is necessary to complement clinical setting results using acoustic measurements in addition to evaluation of speech intelligibility.

Speech Denoising via Low-Rank and Sparse Matrix Decomposition

  • Huang, Jianjun;Zhang, Xiongwei;Zhang, Yafei;Zou, Xia;Zeng, Li
    • ETRI Journal
    • /
    • v.36 no.1
    • /
    • pp.167-170
    • /
    • 2014
  • In this letter, we propose an unsupervised framework for speech noise reduction based on the recent development of low-rank and sparse matrix decomposition. The proposed framework directly separates the speech signal from noisy speech by decomposing the noisy speech spectrogram into three submatrices: the noise structure matrix, the clean speech structure matrix, and the residual noise matrix. Evaluations on the Noisex-92 dataset show that the proposed method achieves a signal-to-distortion ratio approximately 2.48 dB and 3.23 dB higher than that of the robust principal component analysis method and the non-negative matrix factorization method, respectively, when the input SNR is -5 dB.

Speech Analysis Tools for Text-to-Speech Synthesizer (무제한 음성합성기를 위한음성 분석 장치)

  • 김재인
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1995.06a
    • /
    • pp.115-118
    • /
    • 1995
  • 무제한 음성합성기를 구현하기 위하여 꼭 필요한 음성분석장치의 개발에 대하여 논하엿다. 이 분석장치는 신호처리 보드를 사용하여 PC에서 사용할 수 있도록 되어 있으며, 음성의 A/D, D/A 및 spectrogram display는 물론 pitch pulse 위치를 Glottal instint closure에 맞추어 삽입할 수 있어 linear prediction base의 무제한 합성기에서 필요한 음성 data base를 구축하기 용이하도록 개발하였다. 또한 음성인식을 위한 음성 DB나 현재 사용중인 ARS를 구축하고자 할 때에도 적은 노력과 시간이 소요되도록 하였다.

  • PDF