• Title/Summary/Keyword: spectrogram


Experimental Phonetic Study of Kyungsang and Cholla Dialect Using Power Spectrum and Laryngeal Fiberscope (파워스펙트럼 및 후두내시경을 이용한 방언 음성(方言 音聲)의 실험적 연구(實驗的 硏究): 경상방언 및 전라방언을 중심으로)

  • Kim, Hyun-Gi; Lee, Eung-Young; Hong, Ki-Hwan
    • Speech Sciences, v.9 no.2, pp.25-47, 2002
  • Human language activity in the information society has been developing the communication system between humans and machines. The aim of this study was to analyze dialectal speech in Korea. One hundred Kyungsang and one hundred Cholla informants participated in this study. A CSL and a flexible laryngeal fiberscope were used to analyze the acoustic and glottal gestures of all the vowels and consonants. Test words were presented on picture cards and letter cards containing each vowel and each consonant, respectively. The dialogue between the examiner and the informants was recorded in a question-and-answer format. The acoustic results for the two dialects were as follows: Kyungsang and Cholla informants showed neutralization between /e/ and /$\varepsilon$/. However, the apertures of the Kyungsang vowels /i, w, u, o/ were higher than those of the Cholla vowels. The Kyungsang diphthongs /wi/ and /$\varepsilon$/ were realized as the simple vowels /i/ and /$\varepsilon$/ in the Cholla dialect. The VOT of the Cholla dialect was longer than that of the Kyungsang dialect. The fricative frequency of the Kyungsang dialect was about 1000 Hz higher than that of the Cholla dialect. The glottal widths on the fiberscopic images showed that the consonant durations of the Kyungsang and Cholla dialects were correlated with the acoustic durations measured on the spectrogram.
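
Measurements such as the fricative peak frequency are read off the spectrogram; a minimal sketch of the general idea with scipy, assuming a mono WAV of one test word (the file name, window length, and peak heuristic are illustrative, not the CSL procedure used in the paper):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

# Load a recorded test word (hypothetical file name).
sr, x = wavfile.read("token_ka.wav")
x = x.astype(np.float64)

# Wide-band spectrogram (5 ms analysis window), comparable to a CSL display.
f, t, Sxx = spectrogram(x, fs=sr, nperseg=int(0.005 * sr))

# Crude fricative peak: the frequency bin with the most energy averaged
# over the whole token (a real analysis would first segment the
# frication interval by hand on the spectrogram).
peak_hz = f[np.argmax(Sxx.mean(axis=1))]
print(f"dominant spectral peak: {peak_hz:.0f} Hz")
```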

Perturbation and Perceptual Analysis of Pathological Sustained Vowels according to Signal Typing

  • Lee, Ji-Yeoun; Choi, Seong-Hee; Jiang, Jack J.; Hahn, Min-Soo; Choi, Hong-Shik
    • Phonetics and Speech Sciences, v.2 no.2, pp.109-115, 2010
  • In this paper, we investigate signal typing based on the visual impression of distinctive spectrograms. Pathological voices are classified into signal types 1, 2, 3, and 4 to estimate perturbation parameters and to assign perceptual ratings based on the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). The results suggest that perturbation analysis can be applied only to type 1 and 2 signals and that the perceptual rating of overall grade increases with signal type. Good inter-rater reliability was shown among the three raters. We recommend that pathological voices be annotated with both the signal type and the CAPE-V rating in order to describe their characteristics definitively.
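
For context, the perturbation parameters that signal typing restricts to type 1 and 2 signals are measures like local jitter and shimmer; a minimal numpy sketch of those standard definitions, assuming cycle periods and peak amplitudes have already been extracted (this is not the authors' implementation):

```python
import numpy as np

def jitter_local(periods):
    """Mean absolute difference between consecutive pitch periods,
    divided by the mean period, expressed as a percentage."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_local(amplitudes):
    """The same measure applied to cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Hypothetical pitch periods (seconds) from a sustained vowel.
print(jitter_local([0.0100, 0.0102, 0.0099, 0.0101]))
```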

Determination of Arsenic in Korean Human Liver, and of Manganese and Copper in Vitamin Preparations, by Neutron Activation Analysis (중성자(中性子) 방사화(放射化) 분석법(分析法)에 의(依)한 한국인(韓國人) 간장중(肝臟中)의 비소(砒素) 및 Vitamin제제중(製劑中)의 금속(金屬)(CU, Mn)의 정량(定量))

  • Oh, Soo-Chang
    • Journal of Pharmaceutical Investigation, v.4 no.4, pp.17-25, 1974
  • 1. Neutron activation analysis of the arsenic contained in Korean human liver was studied from the viewpoint of forensic chemistry, using 12 corpses. A 1 g sample was irradiated for 30 min in a neutron flux of $1.2{\times}10^{12} n/cm^2/sec$, followed by nitric-sulfuric acid digestion and then by Gutzeit separation. Radioactivity was detected with a scintillation counter. The arsenic content of the liver was found to range from $0.01{\mu}g/g$ to $0.15{\mu}g/g$. 2. A rapid and convenient method for the radiochemical determination of minerals by neutron activation analysis was established. After neutron irradiation of standard solutions of Cu and Mn in a pneumatic tube (neutron flux: $1.2{\times}10^{12} n/cm^2/sec$), Cu and Mn were determined by estimating the ratio of the areas under the energy peaks in the ${\gamma}$-ray spectrogram. When the standard solutions of Mn and Cu were irradiated for 15 min to 18 h, recovery tests showed relative errors of 5.1% and 4.5% for copper and manganese, respectively.
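
The quantification step relies on the standard comparator principle of activation analysis: under identical irradiation and counting conditions, concentration scales with photopeak area. A minimal arithmetic sketch of that ratio (all values hypothetical):

```python
# Comparator method: c_sample = c_std * (A_sample / A_std),
# valid when sample and standard are irradiated and counted under
# the same conditions (same flux, timing, and counting geometry).
c_std = 10.0         # standard concentration, ug/g (hypothetical)
area_std = 45200.0   # photopeak area of the standard, counts (hypothetical)
area_sample = 812.0  # photopeak area of the sample, counts (hypothetical)

c_sample = c_std * area_sample / area_std
print(f"estimated concentration: {c_sample:.3f} ug/g")
```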

Comparative Analysis for General and Estrus-related Vocalizations in Sows (모돈의 일반 발성음과 발정기 특이음의 비교분석)

  • Jeon, J.H.; Yeon, S.C.; Chang, H.H.
    • Journal of Animal Science and Technology, v.47 no.1, pp.133-140, 2005
  • The aim of this study was to divide the vocalizations of sows into general vocalizations (GVs) and estrus-related vocalizations (EVs) and to identify their phonetic characteristics. Ten sows (Landrace) were recorded using digital video recorders twice daily (06:00-08:00 h and 17:00-19:00 h) during the anestrus and estrus periods. GVs and EVs were distinguished based on the shapes of the spectrum and spectrogram, and were identified as 5 and 3 types, respectively. Pitch, formant 1, formant 2, and formant 3 did not differ significantly between GVs and EVs (P > 0.05), whereas intensity (P < 0.001), duration (P < 0.05), and formant 4 (P < 0.01) did. Three parameter groups (Group I: formant vector alone; Group II: formant vector plus parameters from the time signal; Group III: formant vector plus parameters from the time signal, minus parameters eliminated by backward stepwise discriminant analysis) were compared by discriminant function analysis. The classification system of Group II yielded the highest discrimination rate (Group I: 76.1%, Group II: 88.1%, Group III: 87.3%). These results suggest that EVs are present and that intensity, formant 2, and formant 4 are useful parameters for discriminating EVs in sows.
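
The discriminant function analysis step corresponds to a standard linear discriminant classifier; a minimal scikit-learn sketch, assuming each vocalization is summarized by a feature vector of formants plus time-signal parameters (the feature values below are random placeholders, not the paper's data):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Rows: one vocalization each; columns: e.g. F1..F4, intensity, duration.
X = np.random.default_rng(0).normal(size=(60, 6))  # hypothetical features
y = np.array([0] * 30 + [1] * 30)                  # 0 = GV, 1 = EV

# Cross-validated discrimination rate, analogous to the group comparison.
lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=5)
print(f"mean discrimination rate: {scores.mean():.1%}")
```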

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

  • You, Shingchern D.; Liu, Chien-Hung; Lin, Jia-Wei
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.2, pp.729-748, 2021
  • Vocal detection is one of the fundamental steps in music information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to further improve detection accuracy by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane and an 18-layer convolutional neural network (CNN) as the classifier. With this arrangement, the proposed model improves the accuracy reported in the existing literature from 91.8% to 94.1% on the Jamendo dataset. Since the baseline accuracy already exceeds 90%, this 2.3% improvement is both difficult to achieve and valuable. If even higher accuracy is required, ensemble learning may be used; the recommended setting is a majority vote among seven of the proposed models, which increases accuracy by a further 1.1% or so on the Jamendo dataset.
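
The recommended ensemble is a plain majority vote over the decisions of seven trained models; a minimal numpy sketch of that voting rule (the model outputs below are random placeholders):

```python
import numpy as np

def majority_vote(predictions):
    """predictions: (n_models, n_frames) array of 0/1 vocal decisions.
    Returns the per-frame label chosen by more than half of the models."""
    predictions = np.asarray(predictions)
    return (predictions.sum(axis=0) * 2 > predictions.shape[0]).astype(int)

# Seven models, ten frames of hypothetical vocal/non-vocal decisions.
votes = np.random.default_rng(1).integers(0, 2, size=(7, 10))
print(majority_vote(votes))
```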

The Correlation between Speech Intelligibility and Acoustic Measurements in Children with Speech Sound Disorders (말소리장애 아동의 말명료도와 음향학적 측정치 간 상관관계)

  • Kang, Eunyeong
    • Journal of The Korean Society of Integrative Medicine, v.6 no.4, pp.191-206, 2018
  • Purpose: This study investigated the correlation between speech intelligibility and acoustic measurements of speech sounds produced by children with speech sound disorders and children without any diagnosed speech sound disorder. Methods: A total of 60 children with and without speech sound disorders were the subjects of this study. Speech samples were obtained by having the subjects speak meaningful words. Acoustic measurements were analyzed on a spectrogram using the Multi-Speech 3700 program. Speech intelligibility was determined according to a listener's perceptual judgment. Results: Children with speech sound disorders had significantly lower speech intelligibility than those without. The intensity of the vowel /u/, the duration of the vowel /${\omega}$/, and the second formant of the vowel /${\omega}$/ differed significantly between the groups. There was no difference in voice onset time between the groups. The acoustic measurements correlated with speech intelligibility. Conclusion: The results of this study show that the speech intelligibility of children with speech sound disorders is affected by intensity, word duration, and formant frequency. In clinical settings it is therefore useful to complement the evaluation of speech intelligibility with acoustic measurements.
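
The correlation analysis reduces to a standard Pearson test between each acoustic measurement and the intelligibility score; a minimal scipy sketch with hypothetical per-child values (not the study's data):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-child measurements: vowel intensity (dB) and
# the proportion of words a listener judged intelligible.
intensity_db = np.array([62.1, 58.4, 65.0, 60.2, 57.8, 63.3])
intelligibility = np.array([0.82, 0.64, 0.91, 0.70, 0.58, 0.88])

r, p = pearsonr(intensity_db, intelligibility)
print(f"r = {r:.2f}, p = {p:.3f}")
```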

Study on data augmentation methods for deep neural network-based audio tagging (Deep neural network 기반 오디오 표식을 위한 데이터 증강 방법 연구)

  • Kim, Bum-Jun; Moon, Hyeongi; Park, Sung-Wook; Park, Young cheol
    • The Journal of the Acoustical Society of Korea, v.37 no.6, pp.475-482, 2018
  • In this paper, we present a study on data augmentation methods for DNN (Deep Neural Network)-based audio tagging. In this system, an audio signal is converted into a mel-spectrogram and used as the input to the DNN for audio tagging. To cope with the small amount of training data, we augment the training samples using time stretching, pitch shifting, dynamic range compression, and block mixing, and we derive optimal parameters and combinations for these augmentation methods through audio tagging simulations.
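
Two of the four augmentations map directly onto librosa calls, and the other two can be sketched by hand; a minimal illustration, assuming a mono input clip (the file name and parameter values are placeholders, not the optimal settings derived in the paper):

```python
import numpy as np
import librosa

y, sr = librosa.load("clip.wav", sr=None)  # hypothetical input clip

# Time stretching and pitch shifting via librosa.
y_stretch = librosa.effects.time_stretch(y, rate=1.1)
y_shift = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)

def compress(x, threshold=0.3, ratio=4.0):
    """Simple dynamic range compression: attenuate the portion of each
    sample's magnitude that exceeds the threshold."""
    out = x.copy()
    over = np.abs(out) > threshold
    out[over] = np.sign(out[over]) * (
        threshold + (np.abs(out[over]) - threshold) / ratio)
    return out

def block_mix(a, b, alpha=0.5):
    """Block mixing: overlay one clip onto another with weight alpha."""
    n = min(len(a), len(b))
    return alpha * a[:n] + (1 - alpha) * b[:n]

y_drc = compress(y)
y_mix = block_mix(y, y_stretch)
```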

Human Laughter Generation using Hybrid Generative Models

  • Mansouri, Nadia; Lachiri, Zied
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.5, pp.1590-1609, 2021
  • Laughter is one of the most important nonverbal sounds that humans generate; it is a means of expressing emotion. The acoustic and contextual features of this specific sound differ from those of speech, and many difficulties arise in modeling it. In this work, we propose an audio laughter generation system based on unsupervised generative models: the autoencoder (AE) and its variants. The procedure combines three main sub-processes: (1) analysis, which consists of extracting the log-magnitude spectrogram from the laughter database; (2) training of the generative models; and (3) synthesis, which incorporates an intermediate mechanism, the vocoder. To improve the synthesis quality, we suggest three hybrid models (LSTM-VAE, GRU-VAE, and CNN-VAE) that combine the representation-learning capacity of the variational autoencoder (VAE) with the temporal modeling ability of long short-term memory (LSTM) and gated recurrent unit (GRU) RNNs and the CNN's ability to learn invariant features. To assess the performance of the proposed laughter generation process, an objective evaluation (RMSE) and a perceptual audio quality test (listening test) were conducted. According to these evaluation metrics, the GRU-VAE outperforms the other VAE models.
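
At the core of these hybrids is a plain VAE trained on log-magnitude spectrogram frames; a minimal PyTorch sketch of that baseline, with the LSTM/GRU/CNN encoder variants and the vocoder stage omitted (all dimensions hypothetical):

```python
import torch
import torch.nn as nn

class SpectrogramVAE(nn.Module):
    def __init__(self, n_bins=513, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bins, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent)
        self.to_logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_bins))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    """Reconstruction error plus KL divergence to the unit Gaussian prior."""
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

model = SpectrogramVAE()
frames = torch.randn(8, 513)  # hypothetical log-magnitude spectrogram frames
recon, mu, logvar = model(frames)
print(vae_loss(recon, frames, mu, logvar).item())
```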

Automatic Generation Subtitle Service with Kinetic Typography according to Music Sentimental Analysis (음악 감정 분석을 통한 키네틱 타이포그래피 자막 자동 생성 서비스)

  • Ji, Youngseo; Lee, Haram; Lim, SoonBum
    • Journal of Korea Multimedia Society, v.24 no.8, pp.1184-1191, 2021
  • In a pop song, the creator's intention is communicated to the user through both music and lyrics. The meaning of the lyrics is as important as the music, but in most cases lyrics are delivered to users in a static form without non-verbal cues, which is inefficient at conveying the emotions of the music. Recently, lyric videos with kinetic typography have increasingly been provided, but producing them requires expertise and a lot of time. In this system, therefore, the emotions of the lyrics are found through analysis of the lyric text, and a deep learning model is trained on melodies converted into mel-spectrogram form to find the emotions appropriate to the music. The system sets properties such as motion, font, and color using the emotions found in the music and automatically creates a kinetic typography video. In this study, we sought to enhance the conveyance of a song's meaning through this system.
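
The melody-side preprocessing described here is a standard mel-spectrogram conversion; a minimal librosa sketch (the file name and parameters are placeholders, not the system's actual settings):

```python
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=22050)        # hypothetical audio file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)    # input plane for the model
print(log_mel.shape)  # (128, n_frames)
```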

Acoustic Emission and Burr Comparison of Circular Sawing and Milling in Fiber Reinforced Plastic Cutting (원형 톱과 엔드밀의 복합재료 절단 음향과 버 비교연구)

  • Joo, Chang-Min; Baek, Jong-Hyun; Kim, Su-Jin; Lee, Gun-Myung
    • Journal of the Korean Society of Manufacturing Process Engineers, v.21 no.7, pp.98-104, 2022
  • Circular sawing and milling are common machining processes for routing fiber-reinforced plastics (FRP). In this study, the productivity and cutting quality of a circular saw and a flat end mill were compared. The productivity of the circular saw was approximately ten times higher than that of the end mill for the same tool life, and the burr size of the circular saw was 14 times smaller than that of the flat end mill. Spectrogram analysis of the cutting sound also showed that the acoustic emission of the circular saw was more uniform than that of the flat end mill. Circular sawing is thus a more suitable process than flat end milling for the straight cutting of pultruded FRP.
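
A spectrogram comparison of this kind can be sketched with scipy; the frame-energy variation used below as a stand-in for "uniformity" is an assumption, not the authors' metric (the file name is hypothetical, and a mono recording is assumed):

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

sr, x = wavfile.read("cutting_sound.wav")     # hypothetical recording
f, t, Sxx = spectrogram(x.astype(float), fs=sr)

# One crude notion of "uniform" acoustic emission: how much the total
# band energy fluctuates from frame to frame (lower = steadier sound).
frame_energy = Sxx.sum(axis=0)
print("energy CV:", frame_energy.std() / frame_energy.mean())
```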