• Title/Summary/Keyword: spectrogram

Characteristics of Vocalizations of Laying Hen Related with Space in Battery Cage (케이지 내 사육 공간의 차이에 따른 산란계의 음성 특성)

  • Son, Seung-Hun;Shin, Ji-Hye;Kim, Min-Jin;Kang, Jeong-Hoon;Rhim, Shin-Jae;Paik, In-Kee
    • Journal of Animal Science and Technology / v.51 no.5 / pp.421-426 / 2009
  • This study was conducted to clarify the characteristics of vocalizations of laying hens in relation to space in battery cages. The cages were classified into control (0.30 m × 0.14 m × 0.55 m, length × width × height), small (0.21 m × 0.14 m × 0.55 m), and large (0.30 m × 0.30 m × 0.55 m) sizes. Vocalizations of 16 laying hens in each group of Hy-Line Brown (80 weeks old) were recorded for 3 hours per day (10:00-11:00 am, 3:00-4:00 pm, and 7:00-8:00 pm) with a digital recorder and microphone from October 2008 to February 2009. Frequency, intensity, and duration of vocalizations were analyzed by GLM (general linear model) and Duncan's multiple range test. Based on analysis of the spectrogram and spectrum, there were differences in fundamental and maximum frequency and in intensity among the cage sizes. Vocalization of laying hens could thus serve as an indicator of the stress caused by rearing space in battery cages.
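
A minimal sketch, not the authors' code, of how the three spectrogram-derived measures named in this abstract (fundamental and maximum frequency, intensity, duration) can be extracted from a recording. The file name, pitch range, and energy threshold are illustrative assumptions.

```python
import librosa
import numpy as np

y, sr = librosa.load("hen_call.wav", sr=None)  # hypothetical recording

# Magnitude spectrogram for frequency/intensity measurements
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
freqs = librosa.fft_frequencies(sr=sr, n_fft=1024)

# Fundamental frequency via pYIN over an assumed plausible call range
f0, voiced, _ = librosa.pyin(y, fmin=200, fmax=4000, sr=sr)
fundamental = np.nanmean(f0)

# "Maximum frequency": strongest spectral peak per frame, averaged
max_freq = freqs[S.argmax(axis=0)].mean()

# Intensity in dB relative to the spectrogram peak
intensity_db = librosa.amplitude_to_db(S, ref=np.max).mean()

# Duration: total time the frame energy stays above a rough threshold
rms = librosa.feature.rms(y=y, hop_length=256)[0]
duration_s = (rms > 0.02).sum() * 256 / sr  # 0.02 is an assumed threshold

print(fundamental, max_freq, intensity_db, duration_s)
```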

A Study on Fuzziness Parameter Selection in Fuzzy Vector Quantization for High Quality Speech Synthesis (고음질의 음성합성을 위한 퍼지벡터양자화의 퍼지니스 파라메타선정에 관한 연구)

  • 이진이
    • Journal of the Korean Institute of Intelligent Systems / v.8 no.2 / pp.60-69 / 1998
  • This paper proposes a speech synthesis method using fuzzy vector quantization (FVQ) and studies how to choose the fuzziness value that optimizes the performance of FVQ so that the synthesized speech is closer to the original speech. When FVQ is used to synthesize speech, the analysis stage generates membership values that represent the degree to which an input speech pattern matches each pattern in the codebook, and the synthesis stage reproduces the speech using those membership values, the fuzziness value, and the fuzzy c-means operation. Comparing the performance of FVQ and VQ synthesizers by simulation, we show that the performance of FVQ is almost equal to that of VQ even when the FVQ codebook is half the size of the VQ codebook. This implies that, to obtain the same performance as VQ in speech synthesis, FVQ can halve the memory required for codebook storage. We also found that, to maximize the SQNR of the synthesized speech, the fuzziness value should be small when the variance of the analysis frame is relatively large, and large when it is small. Comparing the spectrograms of speech synthesized by VQ and FVQ, we found that the spectral bands (formant and pitch frequencies) of the FVQ-synthesized speech are closer to the original speech than those of VQ.
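
A minimal sketch, my illustration rather than the paper's implementation, of the two FVQ stages the abstract describes: the analysis stage computes fuzzy c-means membership values of an input vector against each codeword, and the synthesis stage reconstructs the vector as a membership-weighted sum. The fuzziness parameter m controls how soft the memberships are; codebook size and dimensions are assumptions.

```python
import numpy as np

def fvq_memberships(x, codebook, m=2.0):
    """Fuzzy c-means memberships of vector x to each codeword (requires m > 1)."""
    d = np.linalg.norm(codebook - x, axis=1) + 1e-12       # distances to codewords
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))  # standard FCM formula
    return 1.0 / ratio.sum(axis=1)

def fvq_synthesize(u, codebook):
    """Reconstruct a vector as the membership-weighted sum of codewords."""
    return u @ codebook

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 12))   # 16 codewords of 12-dim frames (assumed)
x = rng.normal(size=12)

for m in (1.2, 2.0, 4.0):              # small m -> nearly crisp VQ, large m -> smooth
    u = fvq_memberships(x, codebook, m)
    err = np.linalg.norm(x - fvq_synthesize(u, codebook))
    print(f"m={m}: reconstruction error {err:.3f}")
```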

Design and Implementation of CW Radar-based Human Activity Recognition System (CW 레이다 기반 사람 행동 인식 시스템 설계 및 구현)

  • Nam, Jeonghee;Kang, Chaeyoung;Kook, Jeongyeon;Jung, Yunho
    • Journal of Advanced Navigation Technology / v.25 no.5 / pp.426-432 / 2021
  • Continuous-wave (CW) Doppler radar obtains signals in a non-contact manner and, unlike cameras, avoids privacy problems. This paper therefore proposes a human activity recognition (HAR) system using CW Doppler radar and presents the hardware design and implementation results for its acceleration. CW Doppler radar measures signals from continuous human motion, so to obtain a single-motion spectrogram from the continuous signal, an algorithm for counting the number of movements is proposed. In addition, to minimize computational complexity and memory usage, a binarized neural network (BNN) was used to classify human motions, achieving an accuracy of 94%. To accelerate the complex operations of the BNN, an FPGA-based BNN accelerator was designed and implemented. The proposed HAR system was implemented using 7,673 logic elements, 12,105 registers, 10,211 combinational ALUTs, and 18.7 Kb of block memory. Performance evaluation showed that the operation speed was improved by 99.97% compared to the software implementation.
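
A minimal sketch of the two front-end steps described above: computing a micro-Doppler spectrogram from the radar's baseband signal via STFT, and counting movements by thresholding frame energy so each motion can be cut into its own single-motion spectrogram. The sample rate, the gated synthetic I/Q signal, and the threshold are stand-in assumptions, not the paper's parameters.

```python
import numpy as np
from scipy.signal import stft

fs = 1000                                   # assumed baseband sample rate (Hz)
t = np.arange(0, 10, 1 / fs)
iq = np.exp(1j * 2 * np.pi * 60 * t)        # stand-in 60 Hz Doppler tone
iq = iq * (np.sin(2 * np.pi * 0.3 * t) > 0) # gate on/off to mimic repeated motions

f, frames, Z = stft(iq, fs=fs, nperseg=128, noverlap=96, return_onesided=False)
spec_db = 20 * np.log10(np.abs(Z) + 1e-12)  # micro-Doppler spectrogram (dB)

# Count movements: a movement starts when frame energy rises above a threshold
energy = np.abs(Z).sum(axis=0)
active = energy > 0.5 * energy.max()        # assumed threshold
n_movements = np.count_nonzero(active[1:] & ~active[:-1])
print("movements detected:", n_movements)
```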

Acoustic features of diphthongs produced by children with speech sound disorders (말소리장애 아동이 산출한 이중모음의 음향학적 특성)

  • Cho, Yoon Soo;Pyo, Hwa Young;Han, Jin Soon;Lee, Eun Ju
    • Phonetics and Speech Sciences / v.13 no.1 / pp.65-72 / 2021
  • The aim of this study is to provide basic data that can be used for evaluation and intervention by investigating the characteristics of diphthongs produced by children with speech sound disorders. Two groups of 10 children each, with and without speech sound disorders, were asked to imitate meaningless two-syllable 'diphthong + da' sequences. The slopes of F1 and F2, the amount of formant change, and the duration of the glide were analyzed with Praat (version 6.1.16). The two groups differed in the F1 slope of /ju/. Children with speech sound disorders showed smaller formant changes and shorter durations than normal children, with statistically significant differences. Significant differences in the amount of formant change in the glide were found in the F1 of /ju, jɛ/ and the F2 of /jɑ, jɛ/, and in the glide duration of /ju, jɛ/. These results show that the articulatory range of diphthongs in children with speech sound disorders is relatively smaller than that of normal children, so the time taken to articulate them is also reduced. They suggest that articulatory range and acoustic analysis should be further investigated for the evaluation of and intervention in the diphthongs of children with speech sound disorders.
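
A minimal sketch of the measurement the abstract describes, using the parselmouth Python wrapper around Praat rather than the Praat GUI the authors used: formants are sampled at the start and end of the glide, and the slope is the formant change divided by the glide duration. The file name and glide boundaries are hypothetical.

```python
import parselmouth

snd = parselmouth.Sound("ju_da.wav")             # hypothetical /ju/ + "da" token
formants = snd.to_formant_burg(time_step=0.005)  # Burg formant tracking

t_start, t_end = 0.05, 0.12                      # assumed glide interval (s)
f1_0 = formants.get_value_at_time(1, t_start)
f1_1 = formants.get_value_at_time(1, t_end)
f2_0 = formants.get_value_at_time(2, t_start)
f2_1 = formants.get_value_at_time(2, t_end)

dur = t_end - t_start
print("F1 slope (Hz/s):", (f1_1 - f1_0) / dur)
print("F2 slope (Hz/s):", (f2_1 - f2_0) / dur)
print("formant change (Hz): F1", f1_1 - f1_0, " F2", f2_1 - f2_0)
print("glide duration (s):", dur)
```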

A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

  • Jo, Min Su;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.675-681 / 2021
  • With the recent growth of the AI speaker market, demand for speech synthesis technology that enables natural conversation with users is increasing, and a multi-speaker speech synthesis system that can generate voices of various tones is needed. Synthesizing natural speech requires training on a large, high-quality speech DB, but collecting such a database uttered by many speakers is very difficult in terms of recording time and cost. It is therefore necessary to train the speech synthesis system on the speech of a very large number of speakers with only a small amount of data per speaker, which requires a technique for naturally expressing the tone and prosody of multiple speakers. In this paper, we propose a technique that builds a speaker encoder by applying the deep-learning-based x-vector method used in speaker recognition, and synthesizes a new speaker's tone from a small amount of data through this encoder. In the multi-speaker speech synthesis system, the module that synthesizes a mel-spectrogram from input text is Tacotron2, and the vocoder that generates the synthesized speech is WaveNet with a mixture of logistic distributions. The x-vector extracted from the trained speaker embedding network is added to Tacotron2 as an input to express the desired speaker's tone.
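
A minimal sketch, my own illustration rather than the authors' implementation, of how an x-vector can be added to Tacotron2 as an input: the speaker embedding is projected and concatenated to every encoder time step so the attention and decoder see the target speaker's identity. All dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SpeakerConditioner(nn.Module):
    def __init__(self, enc_dim=512, xvec_dim=512, proj_dim=64):
        super().__init__()
        self.proj = nn.Linear(xvec_dim, proj_dim)  # shrink the x-vector

    def forward(self, encoder_out, xvector):
        # encoder_out: (batch, time, enc_dim); xvector: (batch, xvec_dim)
        spk = torch.tanh(self.proj(xvector))                  # (batch, proj_dim)
        spk = spk.unsqueeze(1).expand(-1, encoder_out.size(1), -1)
        return torch.cat([encoder_out, spk], dim=-1)          # broadcast over time

cond = SpeakerConditioner()
enc = torch.randn(2, 120, 512)   # stand-in Tacotron2 encoder outputs
xvec = torch.randn(2, 512)       # stand-in x-vector from a speaker encoder
print(cond(enc, xvec).shape)     # torch.Size([2, 120, 576])
```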

Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.101-114 / 2023
  • Data augmentation is used effectively to mitigate overfitting by letting the model see the training dataset from various perspectives. In addition to image augmentation techniques such as rotation, cropping, and horizontal and vertical flips, occlusion-based methods such as Cutmix and Cutout have been proposed. For models based on speech data, occlusion-based augmentation can be applied after converting the 1D speech signal into a 2D spectrogram; in particular, SpecAugment is an occlusion-based technique for speech spectrograms. In this study, we compare data augmentation techniques applicable to fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions held to detect fake audio, datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment were trained with an LCNN model. All three techniques generally improved model performance: Cutmix performed best on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helps improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and the data.
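
A minimal sketch of SpecAugment-style masking on a spectrogram, as studied above: a few frequency bands and time spans are zeroed out. The mask counts and widths are illustrative hyperparameters (the study found that increasing the number of masks can help), and the input is a stand-in array, not the ASVspoof data.

```python
import numpy as np

def spec_augment(spec, n_freq_masks=2, n_time_masks=2, F=8, T=20, rng=None):
    """Apply frequency and time masking to spec of shape (mels, frames)."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()
    mels, frames = out.shape
    for _ in range(n_freq_masks):
        f = rng.integers(0, F + 1)              # mask width in mel bins
        f0 = rng.integers(0, max(1, mels - f))
        out[f0:f0 + f, :] = 0.0
    for _ in range(n_time_masks):
        t = rng.integers(0, T + 1)              # mask width in frames
        t0 = rng.integers(0, max(1, frames - t))
        out[:, t0:t0 + t] = 0.0
    return out

spec = np.random.rand(64, 400)                  # stand-in log-mel spectrogram
aug = spec_augment(spec, n_freq_masks=3, n_time_masks=3)
print("masked fraction:", (aug == 0).mean())
```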