• Title/Summary/Keyword: Mel-spectrum

Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods (오디오 전처리 방법에 따른 콘벌루션 신경망의 환경음 분류 성능 비교)

  • Oh, Wongeun
    • The Journal of the Acoustical Society of Korea / v.39 no.3 / pp.143-149 / 2020
  • This paper presents the effect of the feature extraction methods used in audio preprocessing on the classification performance of Convolutional Neural Networks (CNNs). We extract the mel spectrogram, log mel spectrogram, Mel Frequency Cepstral Coefficients (MFCC), and delta MFCC from the UrbanSound8K dataset, which is widely used in environmental sound classification studies. We then scale the data to three different distributions. Using these data, we evaluate four CNNs as well as the VGG16 and MobileNetV2 networks to assess performance according to the audio features and scaling. The highest recognition rate is achieved with the unscaled log mel spectrum as the audio feature. Although this result may not apply to all audio recognition problems, it is useful for classifying the environmental sounds included in UrbanSound8K.
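
The abstract gives neither the STFT settings nor the mel formula, and it does not name the three scalings. The NumPy sketch below is therefore only a rough illustration of the kind of feature pipeline being compared. It builds a triangular mel filterbank using the HTK-style mel formula, computes a log mel spectrogram, and shows two common scalings (z-score and min-max). All of these choices are assumptions, not the paper's settings. The helper names `mel_filterbank`, `log_mel_spectrogram`, `standardize`, and `min_max` are hypothetical, and the frame length, hop, and band count are illustrative defaults.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (one of several mel formulas; an assumption here)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # center frequency of each rfft bin
    # n_mels + 2 edge frequencies, equally spaced on the mel axis
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64, eps=1e-10):
    """Hann-windowed STFT power -> mel band energies -> natural log."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (frames, bins)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T      # (frames, n_mels)
    return np.log(mel + eps)

def standardize(x):
    # zero mean, unit variance over the whole array
    return (x - x.mean()) / (x.std() + 1e-12)

def min_max(x):
    # rescale into [0, 1]
    return (x - x.min()) / (x.max() - x.min() + 1e-12)
```

Feeding the output of `log_mel_spectrogram` to a classifier without calling either scaler corresponds to the "unscaled log mel" configuration that the abstract reports as best. The abstract does not state whether the authors used these exact parameters.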

A Study on the Spectrum Variation of Korean Speech (한국어 음성의 스펙트럼 변화에 관한 연구)

  • Lee Sou-Kil;Song Jeong-Young
    • Journal of Internet Computing and Services / v.6 no.6 / pp.179-186 / 2005
  • The spectrum of speech can be extracted and analyzed by exploiting the frequency characteristics of the voice. In the speech spectrum, monophthongs are considered stable, but when consonants meet vowels within a syllable or a word, the spectrum changes considerably. This is the biggest obstacle to phoneme-based speech recognition. In this study, we use the Mel cepstrum and Mel bands, which account for frequency bands and auditory information. With them, we analyze the spectrum of each consonant and vowel and the spectral changes in speech that reflect auditory features, and we systematize the results. Finally, we present a basis for segmenting speech into phoneme units.

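One generic way to turn "spectral change between frames" into candidate phoneme boundaries works in two steps. First, measure the distance between consecutive mel-cepstral vectors. Then keep the local maxima that exceed a threshold. The sketch below shows that generic technique only; it is not the authors' system. The Euclidean distance, the threshold, and the function names `spectral_change` and `boundary_candidates` are assumptions.

```python
import numpy as np

def spectral_change(ceps):
    """Euclidean distance between consecutive frames; ceps has shape (frames, coeffs)."""
    return np.linalg.norm(np.diff(ceps, axis=0), axis=1)

def boundary_candidates(ceps, threshold):
    """Frame indices where a local maximum of spectral change exceeds threshold."""
    d = spectral_change(ceps)
    peaks = [
        i for i in range(len(d))
        if d[i] > threshold
        and (i == 0 or d[i] >= d[i - 1])
        and (i == len(d) - 1 or d[i] >= d[i + 1])
    ]
    # d[i] measures the change from frame i to frame i + 1,
    # so the boundary is placed at the start of frame i + 1
    return [i + 1 for i in peaks]
```

In practice the threshold would need tuning on labeled data. Stable monophthong regions should produce small distances, while consonant-vowel transitions should produce peaks.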

Vocal Tract Normalization Using The Power Spectrum Warping (파워 스펙트럼 warping을 이용한 성도 정규화)

  • Yu, Il-Su;Kim, Dong-Ju;No, Yong-Wan;Hong, Gwang-Seok
    • Proceedings of the KIEE Conference / 2003.11b / pp.215-218 / 2003
  • Vocal tract normalization is known to be a successful method for improving the accuracy of speech recognition. A low-complexity frequency warping procedure based on maximum likelihood has generally been applied for vocal tract normalization. In this paper, we propose a new power spectrum warping procedure that improves vocal tract normalization performance over the frequency warping procedure. The method can be implemented simply by modifying the power spectrum of the filter bank in Mel-frequency cepstral coefficient (MFCC) analysis. An experimental study compared the proposed method with the well-known frequency warping method. The results show that power spectrum warping performs about 50% better in recognition than frequency warping.

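The abstract does not describe the proposed power spectrum warping in enough detail to reproduce it. Only the conventional frequency-warping baseline is sketched here. It is a piecewise-linear warping function of the kind commonly used in VTLN, which would be applied to the filter bank center frequencies. The breakpoint at 0.85 of the Nyquist frequency and the name `piecewise_linear_warp` are assumptions.

```python
import numpy as np

def piecewise_linear_warp(freqs, alpha, nyquist, cutoff_ratio=0.85):
    """Warp frequencies by factor alpha below a breakpoint, then interpolate
    linearly so that 0 Hz and the Nyquist frequency stay fixed."""
    freqs = np.asarray(freqs, dtype=float)
    f0 = cutoff_ratio * nyquist  # breakpoint (assumed value)
    upper_slope = (nyquist - alpha * f0) / (nyquist - f0)
    if upper_slope <= 0.0:
        raise ValueError("alpha too large for this breakpoint; warping would not be monotonic")
    return np.where(
        freqs <= f0,
        alpha * freqs,                        # linear scaling of the lower band
        alpha * f0 + upper_slope * (freqs - f0),  # segment back to (nyquist, nyquist)
    )
```

In a typical VTLN setup, the warp factor `alpha` is chosen per speaker from a small grid (for example 0.88 to 1.12) by maximizing the likelihood under the acoustic model. The warped frequencies then replace the original filter edges when the mel filter bank is built.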

Construction and performance evaluation of a medium energy ion scattering spectroscopy system (중 에너지 이온산란 분광장치의 제작 및 성능 평가)

  • 김현경;문대원;김영필;이재철;강희재
    • Journal of the Korean Vacuum Society / v.6 no.1 / pp.97-102 / 1997
  • A medium energy ion scattering spectroscopy (MEIS) system has been developed and tested. The MEIS system uses a toroidal electrostatic energy analyzer (TEA) and a two-dimensional position-sensitive detector (PSD). The energy resolution of the MEIS system was estimated to be less than $4\times 10^{-3}$, and the overall angular resolution was less than 0.3°. The MEIS spectrum of $Ta_2O_5$ (300 $\AA$) on Si was analyzed using 60 keV $H^+$. From it, the energy loss factor [S] and the depth resolution were estimated to be 42 eV/$\AA$ and 9.7 $\AA$, respectively. The Si(100) surface was also analyzed using the MEIS system. A random MEIS spectrum was obtained from Si(100) covered with a native oxide layer. Under the double-alignment condition, the MEIS spectrum showed a Si surface peak, an oxygen peak, and a carbon peak.

Parts-based Feature Extraction of Speech Spectrum Using Non-Negative Matrix Factorization (Non-Negative Matrix Factorization을 이용한 음성 스펙트럼의 부분 특징 추출)

  • 박정원;김창근;허강인
    • Proceedings of the IEEK Conference / 2003.11a / pp.49-52 / 2003
  • In this paper, we propose a new speech feature parameter using NMF (Non-Negative Matrix Factorization). NMF represents multidimensional data through effective dimensionality reduction by matrix factorization under a non-negativity constraint, and the reduced data represent parts-based features of the input. We verify the usefulness of the NMF algorithm for speech feature extraction by using the result of applying NMF to Mel-scaled filter bank outputs as the feature parameter. Recognition experiments confirm that the proposed feature parameter outperforms the commonly used MFCC (mel-frequency cepstral coefficients).

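Below is a minimal NumPy sketch of NMF using the Lee-Seung multiplicative updates for the Frobenius-norm objective. Here `V` would be a non-negative matrix of Mel filter bank outputs (bands × frames), `W` holds parts-based spectral basis vectors, and the columns of `H` serve as per-frame features. The abstract does not state which NMF variant, cost function, or rank the authors used. The update rule, rank, iteration count, and the hypothetical `nmf`/`encode` helpers are all assumptions for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V (n x m, non-negative) as W @ H using Lee-Seung multiplicative
    updates that minimize ||V - W H||_F^2."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

def encode(V, W, n_iter=200, eps=1e-9):
    """Activations for new frames with the learned basis W held fixed."""
    H = np.ones((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H
```

Under this reading, `W` is learned once on training data, and `encode` maps each new filter bank frame to a `rank`-dimensional non-negative feature vector. That vector would replace MFCC at the input of the recognizer.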

A Study on Robust Feature Vector Extraction for Fault Detection and Classification of Induction Motor in Noise Circumstance (잡음 환경에서의 유도 전동기 고장 검출 및 분류를 위한 강인한 특징 벡터 추출에 관한 연구)

  • Hwang, Chul-Hee;Kang, Myeong-Su;Kim, Jong-Myon
    • Journal of the Korea Society of Computer and Information / v.16 no.12 / pp.187-196 / 2011
  • Induction motors play a vital role in the aeronautical and automotive industries. Many researchers have therefore studied fault detection and classification systems for induction motors to minimize the economic damage caused by faults. In this context, this paper extracts robust feature vectors from the normal/abnormal vibration signals of induction motors in noisy environments: partial autocorrelation (PARCOR) coefficients, log spectrum powers (LSP), cepstrum coefficients mean (CCM), and mel-frequency cepstrum coefficients (MFCC). We then classified different types of induction motor faults using the extracted feature vectors as inputs to a neural network. To find the optimal feature vectors, this paper evaluated classification performance with 2 to 20 different feature vectors. Experimental results showed that five to six features were sufficient to give almost 100% classification accuracy, except for the CCM features. Because vibration signals may include noise components from the surroundings, we also added white Gaussian noise to the original vibration signals and evaluated classification performance again. The results showed that LSP was the most robust in noisy environments, followed by PARCOR and then MFCC.
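
The abstract names PARCOR coefficients but not how they were computed. One standard route is the Levinson-Durbin recursion on a frame's autocorrelation, and the sketch below assumes that route. Sign conventions for reflection coefficients vary between texts. This sketch uses the predictor form x[n] ≈ Σ a_j x[n-j], so a first-order process with lag-1 correlation 0.5 gives k₁ = +0.5. The names `autocorrelation` and `parcor` are hypothetical helpers.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Biased (unnormalized) autocorrelation r[0..max_lag] of a mean-removed frame."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])

def parcor(r, order):
    """Reflection (PARCOR) coefficients k_1..k_order via Levinson-Durbin.

    Convention (assumed): x[n] is predicted as sum_j a_j * x[n - j],
    and k_i equals the last predictor coefficient a_i at order i.
    """
    a = np.zeros(order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy at order 0
    ks = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k = acc / err
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = new_a
        err *= 1.0 - k * k
        ks[i - 1] = k
    return ks
```

With the biased autocorrelation estimate, every |k_i| stays below 1. This is one reason PARCOR coefficients are convenient, bounded features for a neural network input.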

Earthquake detection based on convolutional neural network using multi-band frequency signals (다중 주파수 대역 convolutional neural network 기반 지진 신호 검출 기법)

  • Kim, Seung-Il;Kim, Dong-Hyun;Shin, Hyun-Hak;Ku, Bonhwa;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.38 no.1 / pp.23-29 / 2019
  • In this paper, a deep learning-based detection and classification method using multi-band frequency signals is presented for detecting earthquakes occurring in Korea. An analysis of previous earthquakes in Korea shows that multi-band signals are appropriate for classifying earthquake signals. We therefore propose a deep CNN (Convolutional Neural Network) that uses multi-band signals as training data. The proposed algorithm extracts the multi-band signals (low/medium/high frequency) by applying band-pass filters to the mel-spectrum of earthquake signals. We then construct three CNN pipelines that extract features, and we classify the earthquake signals by late fusion of the three CNNs. We validate the effectiveness of the proposed method through various experiments on classifying the domestic earthquake signals detected in 2018.
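
The abstract does not give the band edges or the fusion rule. The sketch below approximates "band-pass filtering the mel-spectrum" by splitting the mel bins into three contiguous low/mid/high groups. It also shows one possible late-fusion rule: averaging the softmax outputs of the three branch networks. The equal-size split, the averaging rule, and the helper names are assumptions. The CNN branches themselves are omitted.

```python
import numpy as np

def split_mel_bands(mel_spec, n_groups=3):
    """Split a (n_mels, frames) mel spectrogram into contiguous low/mid/high bin groups."""
    return np.array_split(mel_spec, n_groups, axis=0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(logits_per_branch):
    """Average class probabilities from the branch networks (one possible fusion rule)."""
    probs = [softmax(np.asarray(l, dtype=float)) for l in logits_per_branch]
    return np.mean(probs, axis=0)
```

Each group returned by `split_mel_bands` would feed its own CNN. `late_fusion` then combines their per-class logits into a single probability vector.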

Synthesis of Borosilicate Zeotypes by Steam-assisted Conversion Method (수증기 쪼임법에 의한 제올라이트형 보로실리케이트 제조방법)

  • Mansour, R.;Lafjah, M.;Djafri, F.;Bengueddach, A.
    • Journal of the Korean Chemical Society / v.51 no.2 / pp.178-185 / 2007
  • Intermediate pentasil borosilicate zeolite-like materials have been crystallized by a novel method named steam-assisted conversion, which involves vapor-phase transport of water. Gels of various Na2O·SiO2·B2O3·TBA2O compositions, prepared with different boron sources, are dried into amorphous powders. On contact with water vapor under hydrothermal conditions, these powders are transformed into crystalline borosilicate zeolites with a pentasil-family structure. Using a variant of this method, a new material with an intermediate MFI/MEL structure in the ratio 90:10 was crystallized. The results show that steam and a sufficiently high pH in the reacting hydrous solid are necessary for crystallization to proceed. Characterization of the products reveals specific structural features that may give them unique catalytic properties. X-ray diffraction patterns of these microporous crystalline borosilicates show that the products have good crystallinity. The structure is interpreted in terms of regular stacking of pentasil layers related by inversion centers (MFI structure) but interrupted by faults consisting of mirror-related layers (MEL structure). Nitrogen adsorption at 77 K shows a higher micropore volume (0.160 cc/g) than that of the pure MFI phase (0.119 cc/g). The materials have a high surface area (~600 m²/g). The infrared spectrum shows an absorption band at 900.75 cm⁻¹, indicating the incorporation of boron in tetrahedral sites of the silicate matrix of the crystalline phase.

Mel-Frequency Cepstral Coefficients Using Formants-Based Gaussian Distribution Filterbank (포만트 기반의 가우시안 분포를 가지는 필터뱅크를 이용한 멜-주파수 켑스트럴 계수)

  • Son, Young-Woo;Hong, Jae-Keun
    • The Journal of the Acoustical Society of Korea / v.25 no.8 / pp.370-374 / 2006
  • Mel-frequency cepstral coefficients (MFCC) are widely used as features for speech recognition. In the MFCC extraction process, the spectrum obtained by the Fourier transform of the input speech signal is divided into mel-frequency bands, and the energy of each band is extracted. The coefficients are then obtained by applying the discrete cosine transform to the logarithms of the band energies. In this paper, we calculate the output energy of each bandpass filter by applying a weighting function to the mel-frequency-scaled bandpass filters. The weighting function is a Gaussian function centered at the formant frequency. In experiments, the proposed method performs comparably to standard MFCC under clean conditions and better under degraded conditions.
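
The standard MFCC steps described above are: power spectrum, mel band energies, log, then DCT. They can be sketched in NumPy with an explicit orthonormal DCT-II. The abstract does not say exactly how the Gaussian weight enters the filter computation. `gaussian_weight` and the optional per-bin `weight` argument are therefore one hypothetical reading, not the authors' formulation. The filter bank matrix is supplied by the caller, and all helper names are assumptions.

```python
import numpy as np

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs outputs."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis *= np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)  # orthonormal scaling of the DC row
    return x @ basis.T

def gaussian_weight(freqs, formant_hz, sigma_hz):
    """Hypothetical Gaussian weighting centered on a formant frequency."""
    return np.exp(-0.5 * ((np.asarray(freqs, dtype=float) - formant_hz) / sigma_hz) ** 2)

def mfcc_from_power(power, filterbank, n_coeffs=13, weight=None, eps=1e-10):
    """power: (frames, bins); filterbank: (n_bands, bins). Returns (frames, n_coeffs)."""
    if weight is not None:
        power = power * weight            # optional per-bin weighting (assumed form)
    band_energy = power @ filterbank.T    # energy in each mel band
    return dct_ii(np.log(band_energy + eps), n_coeffs)
```

Without `weight`, this is the plain MFCC pipeline. A formant-dependent weight would also need a per-frame formant estimate, which this sketch leaves to the caller.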

Speech/Music Discrimination Using Spectrum Analysis and Neural Network (스펙트럼 분석과 신경망을 이용한 음성/음악 분류)

  • Keum, Ji-Soo;Lim, Sung-Kil;Lee, Hyon-Soo
    • The Journal of the Acoustical Society of Korea / v.26 no.5 / pp.207-213 / 2007
  • In this research, we propose an efficient Speech/Music discrimination method that uses spectrum analysis and a neural network. The proposed method analyzes the spectrum to extract a duration feature parameter (MSDF) from spectral peak tracks. This parameter is combined with the MFSC as the feature set for the Speech/Music discriminator. A neural network is used as the discriminator. We performed various experiments to evaluate the proposed method with respect to training pattern selection, training pattern size, and neural network architecture. The Speech/Music discrimination results show improved performance and stability over the previous method, depending on training pattern selection and model composition. Using MSDF and MFSC as feature parameters with more than 50 seconds of training patterns, we obtained discrimination rates of 94.97% for speech and 92.38% for music. This is an improvement of 1.25% for speech and 1.69% for music compared with using MFSC alone.
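
The abstract defines MSDF only as a duration feature extracted from spectral peak tracks. The sketch below is a hypothetical reading of that idea. It takes the strongest spectral bin in each frame and measures how many consecutive frames it stays within a small tolerance. The intuition is that sustained musical notes tend to produce longer stable peak tracks than speech. The tolerance, the single-peak simplification, and the function names are assumptions.

```python
import numpy as np

def peak_track_durations(spec, tolerance=1):
    """Lengths (in frames) of runs where the strongest bin stays within
    `tolerance` bins of the previous frame. spec: (frames, bins) magnitudes."""
    if len(spec) == 0:
        return []
    peaks = np.argmax(spec, axis=1)  # dominant bin per frame
    durations = []
    run = 1
    for prev, cur in zip(peaks[:-1], peaks[1:]):
        if abs(int(cur) - int(prev)) <= tolerance:
            run += 1                 # the track continues
        else:
            durations.append(run)    # the track breaks; start a new one
            run = 1
    durations.append(run)
    return durations

def mean_peak_duration(spec, tolerance=1):
    """Average track length, usable as a single duration feature per segment."""
    return float(np.mean(peak_track_durations(spec, tolerance)))
```

A value like `mean_peak_duration` could be appended to the MFSC vector of each analysis segment before it goes to the neural network. How the authors actually combined the two features is not stated.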