• Title/Summary/Keyword: Spectrogram


Visual.Auditory.Acoustic Study on Singing Vowels of Korean Lyric Songs (시각과 청각 및 음향적 관점에서의 노랫말 모음 연구)

  • Lee Jai Kang
    • Proceedings of the KSPS conference / 1996.10a / pp.362-366 / 1996
  • This paper is divided into two parts. The first is a study of the vowels in Korean singers' lyric songs in terms of Daniel Jones' Cardinal Vowels. The second is an acoustic study of the vowels in my own singing of Korean lyric songs. The analysis data are a KBS concert videotape and CSL's .NSP file of my singing, and the informants are famous singers (3 sopranos, 1 mezzo, 2 tenors, 1 baritone) and me. The aim of the analysis is to determine the quality of the 8 Korean vowels ([equation omitted]) in singing. The vowels are described in terms of close, half-close, half-open, and open vowels, rounded and unrounded vowels, and formants. The first study was carried out by watching the monitor screen and stopping at the scene to be analyzed. The second was carried out by analyzing the spectrogram converted from CSL's .SP file. The results are as follows. Visually and auditorily, Korean vowel quality in singing shows three tendencies: more rounding than in ordinary Korean vowels, centralization toward the center point of the Cardinal Vowel diagram, and diversity in vowel quality. The acoustic analysis was based on four formants. F1 and F2 show steps similar to those in speech. In F1 the formant values are the same. This seems to be because the vocal organs perceive the singing situation. The width of F3 is the widest of all, so F3 may be characteristic of singing. In conclusion, compared with ordinary Korean vowels, the vowels in Korean lyric songs seem to show tendencies toward rounding, centralization toward the center point of the Cardinal Vowel diagram, diversity in vowel quality, and the widest width in F3.


The Utility of Perturbation, Non-linear dynamic, and Cepstrum measures of dysphonia according to Signal Typing (음성 신호 분류에 따른 장애 음성의 변동률 분석, 비선형 동적 분석, 캡스트럼 분석의 유용성)

  • Choi, Seong Hee;Choi, Chul-Hee
    • Phonetics and Speech Sciences / v.6 no.3 / pp.63-72 / 2014
  • The current study assessed the utility of the acoustic analyses most commonly used in routine clinical voice assessment, including perturbation, nonlinear dynamic, and spectral/cepstral analyses, based on signal typing of dysphonic voices, and investigated the applicability of these clinical acoustic analysis methods. A total of 70 dysphonic voice samples were classified by signal type using narrowband spectrograms. The traditional parameters %jitter, %shimmer, and signal-to-noise ratio (SNR) were calculated for the signals using TF32, and the correlation dimension (D2) as a nonlinear dynamic parameter and spectral/cepstral measures including mean CPP, CPP_sd, CPPf0, CPPf0_sd, L/H ratio, and L/H ratio_sd were also calculated with ADSV (Analysis of Dysphonia in Speech and Voice™). Auditory-perceptual analysis was performed by two blinded speech-language pathologists using the GRBAS scale. The results showed that the nearly periodic Type 1 signals were all from functional dysphonia, and that Type 4 signals comprised neurogenic and organic voice disorders. Only Type 1 voice signals were reliable for perturbation analysis in this study. Significant differences related to signal type were found in all acoustic and auditory-perceptual measures. SNR, CPP, and L/H ratio values for Type 4 were significantly lower than those of the other voice signals, and significantly higher %jitter and %shimmer were observed in Type 4 voice signals (p<.001). Additionally, as signal type increased, D2 values increased significantly and more complex, nonlinear patterns appeared. Nevertheless, D2 could not be obtained for voice signals with a high noise component associated with breathiness. In particular, CPP was more sensitive to voice quality 'G', 'R', and 'B' than any other acoustic measure. Thus, spectral and cepstral analyses may be applied to more severely dysphonic voices such as Type 4 signals, and CPP can be a more accurate and predictive acoustic marker for measuring voice quality and severity in dysphonia.
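
The perturbation measures named in this abstract can be illustrated with a minimal sketch. It assumes the common "local" definition (mean absolute difference between consecutive cycles, divided by the mean, in percent); the exact formulas used by TF32 may differ, and the function name and the cycle values below are hypothetical, not data from the study.

```python
from statistics import mean


def percent_perturbation(values):
    """Mean absolute cycle-to-cycle difference relative to the mean, in percent.

    Assumed "local" definition; not necessarily the formula used by TF32.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical cycle-by-cycle measurements (illustrative only).
periods_ms = [5.00, 5.10, 4.90, 5.05, 4.95]   # glottal cycle durations
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]   # peak amplitude of each cycle

jitter_pct = percent_perturbation(periods_ms)    # period perturbation
shimmer_pct = percent_perturbation(amplitudes)   # amplitude perturbation
print(f"%jitter = {jitter_pct:.2f}, %shimmer = {shimmer_pct:.2f}")
```

Both measures presuppose that individual cycles can be identified, which is consistent with the abstract's finding that only nearly periodic Type 1 signals were reliable for perturbation analysis.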

Performance comparison of wake-up-word detection on mobile devices using various convolutional neural networks (다양한 합성곱 신경망 방식을 이용한 모바일 기기를 위한 시작 단어 검출의 성능 비교)

  • Kim, Sanghong;Lee, Bowon
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.454-460 / 2020
  • Artificial intelligence assistants that provide speech recognition operate through cloud-based voice recognition with high accuracy. In cloud-based speech recognition, Wake-Up-Word (WUW) detection plays an important role in activating devices on standby. In this paper, we compare the performance of Convolutional Neural Network (CNN)-based WUW detection models for mobile devices on Google's speech commands dataset, using spectrogram and mel-frequency cepstral coefficient features as inputs. The models used in this paper are a multi-layer perceptron, a general convolutional neural network, VGG16, VGG19, ResNet50, ResNet101, ResNet152, and MobileNet. We also propose a network that reduces the model size to 1/25 while maintaining the performance of MobileNet.
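
As a rough illustration of the spectrogram input feature mentioned above, the following sketch computes a magnitude spectrogram with a short-time Fourier transform. It uses a direct DFT for readability (a real system would use an FFT library), and the frame length, hop size, sampling rate, and test tone are illustrative assumptions rather than the paper's settings.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of frame_len // 2 + 1 DFT magnitudes."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin spacing = sample_rate / frame_len
print(len(spec), "frames; strongest bin in frame 0 is at", peak_hz, "Hz")
```

Each row of `spec` is one time frame. Stacking the rows gives the time-frequency image that a CNN would take as input, usually after log scaling, which is omitted here.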

General Patterns in Echolocation Call of Greater Horseshoe Bat Rhinolophus ferrumequinum, Japanese Pipistrelle Bat Pipistrellus abramus and Large-Footed Bat Myotis macrodactylus in Korea (한국에 서식하는 곤박쥐 Rhinolophus ferrumequinum, 집박쥐 Pipistrellus abramus, 큰발윗수염박쥐 Myotis macrodactylus의 반향정위 형태)

  • Chung, Chul-Un;Han, Sang-Hoon;Lim, Chun-Woo;Kim, Sung-Chul;Lee, Hwa-Jin;Kwon, Yong-Ho;Kim, Chul-Young;Lee, Chong-Il
    • Journal of Environmental Science International / v.19 no.1 / pp.61-68 / 2010
  • In this study, we analyzed the pulse duration, pulse interval, and peak frequency of the echolocation calls of three species: Rhinolophus ferrumequinum, Pipistrellus abramus, and Myotis macrodactylus. The peak frequencies were 69 kHz, 47 kHz, and 49 kHz, and the pulse durations were $69.39 \pm 8.76$ ms, $4.95 \pm 0.77$ ms, and $3.09 \pm 0.48$ ms for R. ferrumequinum, P. abramus, and M. macrodactylus, respectively. The pulse intervals for R. ferrumequinum, P. abramus, and M. macrodactylus were $103.61 \pm 9.05$ ms, $67.59 \pm 3.47$ ms, and $66.35 \pm 4.96$ ms, respectively. The pulse pattern of R. ferrumequinum began with a short FM call, continued into a long CF call, and ended with another short FM call. The pulse pattern of M. macrodactylus consisted of a series of short FM calls, and no CF call was found in the spectrogram analysis. In P. abramus, a long FM call was joined to a short CF call, and the peak frequency was found at the end of the pulse, in the CF call.

An Alteration Rule of Formant Transition for Improvement of Korean Demisyllable Based Synthesis by Rule (한국어 반음절단위 규칙합성의 개선을 위한 포만트천이의 변경규칙)

  • Lee, Ki-Young;Choi, Chang-Seok
    • The Journal of the Acoustical Society of Korea / v.15 no.4 / pp.98-104 / 1996
  • This paper proposes an alteration rule that compensates the formant transitions of several connected vowels, in order to improve unnatural synthesized continuous speech that is concatenated from demisyllables without coarticulated formant transitions, for use in demisyllable-based synthesis by rule. To fill in each formant transition part, a database of 42 stationary vowels, segmented from the stable part of each vowel, is appended to the database of Korean demisyllables, and the resonance circuit used in formant synthesis is employed to change the formant frequencies of the speech signals. To evaluate the speech synthesized by this rule, we applied the alteration rule to the connected vowels of demisyllable-based synthesized speech and compared spectrograms and MOS test scores with those of the original speech and of demisyllable-based synthesized speech without the rule. The results show that the proposed rule can synthesize more natural speech.


Tempo-oriented music recommendation system based on human activity recognition using accelerometer and gyroscope data (가속도계와 자이로스코프 데이터를 사용한 인간 행동 인식 기반의 템포 지향 음악 추천 시스템)

  • Shin, Seung-Su;Lee, Gi Yong;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.39 no.4 / pp.286-291 / 2020
  • In this paper, we propose a system that recommends music through tempo-oriented music classification and sensor-based human activity recognition. The proposed method indexes music files using tempo-oriented music classification and recommends suitable music according to the recognized user's activity. For accurate music classification, a dynamic classification based on a modulation spectrum and a sequence classification based on a Mel-spectrogram are used in combination. In addition, simple accelerometer and gyroscope sensor data of the smartphone are applied to deep spiking neural networks to improve activity recognition performance. Finally, music recommendation is performed through a mapping table considering the relationship between the recognized activity and the indexed music file. The experimental results show that the proposed system is suitable for use in any practical mobile device with a music player.

Characteristics of Estrus-related Vocalizations of Sows after Artificial Insemination (모돈의 인공수정 후 시기별 발성음의 특성)

  • Rhim, Shin-Jae;Kim, Min-Jin;Lee, Ju-Young;Kim, Na Ra;Kang, Jeong-Hoon
    • Journal of Animal Science and Technology / v.50 no.3 / pp.401-406 / 2008
  • This study was conducted to clarify the characteristics of estrus-related vocalizations of sows after artificial insemination. Vocalizations of sows on the day of artificial insemination, and 3 days and 50 days after artificial insemination, were recorded for 3 hours per day from September 2006 to March 2007 using an MD recorder (Marantz PMD-650) and a microphone (RF Condenser MIC, MKH 416P48). The shapes of the spectrum and spectrogram of the vocalizations differed in each period after artificial insemination. There were significant differences in frequency and intensity, but not in duration of vocalization. The fact that a signal may give a reliable indication of the signaller's needs suggests that in some circumstances vocalizations can provide information on animal welfare.

The f0 distribution of Korean speakers in a spontaneous speech corpus

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.13 no.3 / pp.31-37 / 2021
  • The fundamental frequency, or f0, is an important acoustic measure in the prosody of human speech. The current study examined the f0 distribution of a corpus of spontaneous speech in order to provide normative data for Korean speakers. The corpus consists of 40 speakers talking freely about their daily activities and their personal views. Praat scripts were created to collect f0 values, and a majority of obvious errors were corrected manually by watching and listening to the f0 contour on a narrow-band spectrogram. Statistical analyses of the f0 distribution were conducted using R. The results showed that the f0 values of all the Korean speakers were right-skewed, with a peaked distribution. The speakers produced spontaneous speech within a frequency range of 274 Hz (from 65 Hz to 339 Hz), excluding statistical outliers. The mode of the total f0 data was 102 Hz. The female f0 range, with a bimodal distribution, appeared wider than that of the male group. Regression analyses based on age and f0 values yielded negligible R-squared values. As the mode of an individual speaker could be predicted from the median, either the median or the mode could serve as a good reference for the individual f0 range. Finally, an analysis of the continuous f0 points of intonational phrases revealed that the initial and final segments of the phrases yielded several f0 measurement errors. From these results, we conclude that an examination of a spontaneous speech corpus can provide linguists with useful measures for generalizing the acoustic properties of f0 variability in a language, by individual or by group. Further studies on the use of statistical measures to secure reliable f0 values for individual speakers would be desirable.
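
The descriptive statistics named in this abstract (range, median, mode, skewness) can be sketched as follows. The study used R and Praat; this Python version is only an illustration, the function name is hypothetical, and the f0 values are invented. The reported range itself is simple arithmetic: 339 Hz - 65 Hz = 274 Hz.

```python
from statistics import mean, median, mode, pstdev


def summarize_f0(f0_hz):
    """Range, median, mode, and population skewness of a list of f0 values (Hz)."""
    mu = mean(f0_hz)
    sd = pstdev(f0_hz)
    # Third standardized moment: positive for a right-skewed distribution.
    skewness = mean([((x - mu) / sd) ** 3 for x in f0_hz])
    return {
        "min": min(f0_hz),
        "max": max(f0_hz),
        "range": max(f0_hz) - min(f0_hz),
        "median": median(f0_hz),
        "mode": mode(f0_hz),
        "skewness": skewness,
    }


# Hypothetical f0 samples in Hz (not data from the corpus).
f0_hz = [95, 100, 102, 102, 102, 105, 110, 120, 150, 210]
summary = summarize_f0(f0_hz)
print(summary)

# Range reported in the abstract, recomputed from its endpoints.
reported_range_hz = 339 - 65
```

In this toy sample the mode and median lie close together while a few high values stretch the upper tail, which is the right-skewed shape the abstract describes.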

A study on the target detection method of the continuous-wave active sonar in reverberation based on beamspace-domain multichannel nonnegative matrix factorization (빔공간 다채널 비음수 행렬 분해에 기초한 잔향에서의 지속파 능동 소나 표적 탐지 기법에 대한 연구)

  • Lee, Seokjin
    • The Journal of the Acoustical Society of Korea / v.37 no.6 / pp.489-498 / 2018
  • In this paper, a target detection method based on beamspace-domain multichannel nonnegative matrix factorization is studied for the case in which an echo of a continuous-wave ping is received from a low-Doppler target in a reverberant environment. If the receiver of the continuous-wave active sonar moves, the frequency range of the reverberation is broadened by the Doppler effect, so the low-Doppler target echo is subject to interference from the reverberation. The developed algorithm decomposes the multichannel spectrogram of the received signal into frequency bases, time bases, and beamformer gains using beamspace-domain multichannel nonnegative matrix factorization, and then estimates the frequency, time, and bearing of the target echo by choosing a proper basis. To analyze the performance of the developed algorithm, simulations were performed under various signal-to-reverberation conditions. The results show that the proposed algorithm can estimate the frequency, time, and bearing, but that performance degrades in low signal-to-reverberation conditions. According to the simulation results, modifying the algorithm that selects the target echo basis is expected to enhance performance.
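
The idea of factoring a spectrogram into frequency bases and time bases can be sketched with plain single-channel nonnegative matrix factorization. This is a deliberate simplification: it uses the standard multiplicative updates for the Euclidean cost and omits the beamformer gains and multichannel structure of the method described above. The function names and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (freq x time) into W (freq x rank) and H (rank x time)."""
    rng = random.Random(seed)
    n_freq, n_time = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + eps for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[r][t] * num[r][t] / (den[r][t] + eps)
              for t in range(n_time)] for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[f][r] * num[f][r] / (den[f][r] + eps)
              for r in range(rank)] for f in range(n_freq)]
    return w, h


# Toy "spectrogram": two spectral shapes active at different times.
v = [
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 1.0, 1.0],
]
w0, h0 = nmf(v, rank=2, iterations=0)   # random initialization only
w, h = nmf(v, rank=2)                    # same seed, 200 updates
initial_error = frobenius_error(v, w0, h0)
final_error = frobenius_error(v, w, h)
print(f"error before: {initial_error:.4f}, after: {final_error:.6f}")
```

The columns of `w` play the role of frequency bases and the rows of `h` the role of time bases. In the multichannel setting described in the abstract, a third factor would carry the per-beam gains from which bearing is estimated.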

An Interdisciplinary Study of A Leaders' Voice Characteristics: Acoustical Analysis and Members' Cognition

  • Hahm, SangWoo;Park, Hyungwoo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.12 / pp.4849-4865 / 2020
  • The traditional roles of leaders are to influence members and motivate them to achieve shared goals in organizations. However, leaders such as top managers and chief executive officers, in practice, do not always directly meet or influence other company members. In fact, they tend to have the greatest impact on their members through formal speeches, company procedures, and the like. As such, official speech is directly related to the motivation of company employees. In an official speech, not only the contents of the speech but also the voice characteristics of the speaker have an important influence on listeners, as the different vocal characteristics of a person can have different effects on the listener. Therefore, according to the voice characteristics of a leader, the cognition of the members may change, and the degree to which the members are influenced and motivated will be different. This study identifies how members may perceive a speech differently according to the different voice characteristics of leaders in formal speeches. Further, different perceptions of voices will influence members' cognition of the leader, for example, in how trustworthy the leader appears. The study analyzed recorded speeches of leaders and extracted features of their speaking style through digital speech signal analysis. Then, parameters were extracted and analyzed by time-domain, frequency-domain, and spectrogram-domain methods. We also analyzed the parameters for use in Natural Language Processing. We investigated which voice characteristics of leaders had more influence on members or were more effective for them. A person's voice characteristics can be changed. Therefore, leaders who seek to influence members in formal speeches should have effective voice characteristics to motivate followers.