• Title/Summary/Keyword: Mel scale

Search Result 32, Processing Time 0.024 seconds

Cepstral Feature Normalization Methods Using Pole Filtering and Scale Normalization for Robust Speech Recognition (강인한 음성인식을 위한 극점 필터링 및 스케일 정규화를 이용한 켑스트럼 특징 정규화 방식)

  • Choi, Bo Kyeong;Ban, Sung Min;Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.4
    • /
    • pp.316-320
    • /
    • 2015
  • In this paper, the pole filtering concept is applied to the Mel-frequency cepstral coefficient (MFCC) feature vectors in the conventional cepstral mean normalization (CMN) and cepstral mean and variance normalization (CMVN) frameworks. Additionally, performance of the cepstral mean and scale normalization (CMSN), which uses scale normalization instead of variance normalization, is evaluated in speech recognition experiments in noisy environments. Because CMN and CMVN are usually performed on a per-utterance basis, in case of short utterance, they have a problem that reliable estimation of the mean and variance is not guaranteed. However, by applying the pole filtering and scale normalization techniques to the feature normalization process, this problem can be relieved. Experimental results using Aurora 2 database (DB) show that feature normalization method combining the pole-filtering and scale normalization yields the best improvements.

A Study on Stable Motion Control of Humanoid Robot with 24 Joints Based on Voice Command

  • Lee, Woo-Song;Kim, Min-Seong;Bae, Ho-Young;Jung, Yang-Keun;Jung, Young-Hwa;Shin, Gi-Soo;Park, In-Man;Han, Sung-Hyun
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.21 no.1
    • /
    • pp.17-27
    • /
    • 2018
  • We propose a new approach to control a biped robot motion based on iterative learning of voice command for the implementation of smart factory. The real-time processing of speech signal is very important for high-speed and precise automatic voice recognition technology. Recently, voice recognition is being used for intelligent robot control, artificial life, wireless communication and IoT application. In order to extract valuable information from the speech signal, make decisions on the process, and obtain results, the data needs to be manipulated and analyzed. Basic method used for extracting the features of the voice signal is to find the Mel frequency cepstral coefficients. Mel-frequency cepstral coefficients are the coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The reliability of voice command to control of the biped robot's motion is illustrated by computer simulation and experiment for biped walking robot with 24 joint.

Pseudo-Cepstral Representation of Speech Signal and Its Application to Speech Recognition (음성 신호의 의사 켑스트럼 표현 및 음성 인식에의 응용)

  • Kim, Hong-Kook;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.1E
    • /
    • pp.71-81
    • /
    • 1994
  • In this paper, we propose a pseudo-cepstral representation of line spectrum pair(LSP) frequencies and evaluate speech recognition performance with cepstral lift using the pseudo-cepstrum. The pseudo-cepstrum corresponding to LSP frequencies is derived by approxmating the relationship between LPC-cepstrum and LSP frequencies. Three cepstral liftering procedures are applied to the pseudo-cepstrum to improve the performance of speech recognition. They are the root-power-sums ligter, the general exponential lifter, and the bandpass lifter. Then, the liftered psedudo-cepstra are warped into a mel-frequency scale to obtain feature vectors for speech recognition. Among the three lifters, the general exponential lifter results in the best performance on speech recognition. When we use the proposed pseudo-cepstra feature vectors for recognizing noisy speech, the signal-to-noise ratio (SNR) improvement of about 5~10dB LSP is obtained.

  • PDF

Driver Verification System Using Biometrical GMM Supervector Kernel (생체기반 GMM Supervector Kernel을 이용한 운전자검증 기술)

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.9 no.3
    • /
    • pp.67-72
    • /
    • 2010
  • This paper presents biometrical driver verification system in car experiment through analysis of speech, and face information. We have used Mel-scale Frequency Cesptral Coefficients (MFCCs) for speaker verification using speech information. For face verification, face region is detected by AdaBoost algorithm and dimension-reduced feature vector is extracted by using principal component analysis only from face region. In this paper, we apply the extracted speech- and face feature vectors to an SVM kernel with Gaussian Mixture Models(GMM) supervector. The experimental results of the proposed approach show a clear improvement compared to a simple GMM or SVM approach.

Characteristics and Drug Release Profiles of Multilamellar Vesicle(MLV) and Microemulsified Liposome(MEL) Entrapped 5-Fluorouracil and Its derivatives (5-Fluorouracil과 그 유도체를 봉입한 Multilamellar Vesicle(MLV)과 Microemulsified Liposome(MEL)의 특성 및 약물방출 거동)

  • Jee, Ung-Kil;Park, Mok-Soon;Lee, Gye-Won;Lyu, Yeon-Geun
    • Journal of Pharmaceutical Investigation
    • /
    • v.25 no.3
    • /
    • pp.249-264
    • /
    • 1995
  • Although liposome has many advantages as a pharmaceutical dosage form, its application in the industrial field has been limited because of some problems such as preparation method, reproducibility, scale-up, stability and sterilization etc. Liposomes prepared by microemulsification method had defined size, narrow size distribution, reproducibility and high entrapment efficiency. For enhancing the stability, the dry form of liposome was recommended. These types of liposome are proliposome and freeze-dried liposome. The liposome must have some properties for preparing of freeze-dried liposome; small size $(50{\sim}200\;nm)$, narrow size distribution and cryoprotectant. In this experiment, the liposomes containing 5-Fluorouracil(5-FU) and its prodrug(pentyl-5-FU-1-acetate; PFA, hexyl-5-FU-1-acetate; HFA) were made with soybean phosphatidylcholine, cholesterol, stearylamine(SA) and dicetyl phosphate(DCP) employing hydration method or microemulsification method using $Microfluidizer^{TM}$. Both or liposome types were MLV and MEL. After preparation, freeze drying and rehydration were performed. In the process of freezing, trehalose(Tr) was added as a cryoprotectant. Their evaluation methods were as follows; entrapment efficiency, mean particle size and size distribution, dissolution test, retain of entrapment efficiency and turbidity after freeze-drying. The results are summarized as belows. The entrapment efficiency of 5-FU was dependent on total lipid concentration and cholesterol content but that of PFA and HFA was decreased when cholesterol was added. When DCP and SA were added, entrapment efficiency was decreased. As the partition coefficient of drug was increased, entrapment efficiency was increased. Under the same condition, entrapment efficiency of MEL is similar to that of MLV. The mean particle size and size distribution of MEL were smaller than those of MLV. Dissolution rates of drug from both liposome types were comparatively similar. Dissolution rates of drugs with serum and liver homogenate were faster than without these material. After preparation of liposome, free drug was removed efficiency by Dowex 50W-X4. When liposome was freeze-dried and then rehydrated in the presence of Tr, characteristics of liposome were maintained well in MEL than MLV. Tr Was used successfully as a cryoprotectant in the process of freeze drying and the optimal ratio of Tr:Lipid was 4:1(g/g).

  • PDF

Endpoint Detection of Speech Signal Using Lyapunov Exponent (리아프노프 지수를 이용한 음성신호 종점 탐색 방법)

  • Zang, Xian;Kim, Jeong-Yeon;Chong, Kil-To
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.46 no.1
    • /
    • pp.28-33
    • /
    • 2009
  • In the research of speech recognition, locating the beginning and end of a speech utterance in a background of noise is of great importance. The conventional methods for speech endpoint detection are based on two simple time-domain measurements-short-time energy, and short-time zero-crossing rate, which couldn't guarantee the precise results if in the low signal-to-noise ratio environments. This paper proposes a novel approach that finds the Lyapunov exponent of time-domain waveform. This proposed method has no use for obtaining the frequency-domain parameters for endpoint detection process, e.g. Mel-Scale Features, which have been introduced in other paper. Accordingly, this algorithm is low complexity and suitable for Digital Isolated Word Recognition System.

Performance comparison evaluation of speech enhancement using various loss functions (다양한 손실 함수를 이용한 음성 향상 성능 비교 평가)

  • Hwang, Seo-Rim;Byun, Joon;Park, Young-Cheol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.2
    • /
    • pp.176-182
    • /
    • 2021
  • This paper evaluates and compares the performance of the Deep Nerual Network (DNN)-based speech enhancement models according to various loss functions. We used a complex network that can consider the phase information of speech as a baseline model. As the loss function, we consider two types of basic loss functions; the Mean Squared Error (MSE) and the Scale-Invariant Source-to-Noise Ratio (SI-SNR), and two types of perceptual-based loss functions, including the Perceptual Metric for Speech Quality Evaluation (PMSQE) and the Log Mel Spectra (LMS). The performance comparison was performed through objective evaluation and listening tests with outputs obtained using various combinations of the loss functions. Test results show that when a perceptual-based loss function was combined with MSE or SI-SNR, the overall performance is improved, and the perceptual-based loss functions, even exhibiting lower objective scores showed better performance in the listening test.

Single-Channel Speech Separation Using the Time-Frequency Smoothed Soft Mask Filter (시간-주파수 스무딩이 적용된 소프트 마스크 필터를 이용한 단일 채널 음성 분리)

  • Lee, Yun-Kyung;Kwon, Oh-Wook
    • MALSORI
    • /
    • no.67
    • /
    • pp.195-216
    • /
    • 2008
  • This paper addresses the problem of single-channel speech separation to extract the speech signal uttered by the speaker of interest from a mixture of speech signals. We propose to apply time-frequency smoothing to the existing statistical single-channel speech separation algorithms: The soft mask and the minimum-mean-square-error (MMSE) algorithms. In the proposed method, we use the two smoothing later. One is the uniform mask filter whose filter length is uniform at the time-Sequency domain, and the other is the met-scale filter whose filter length is met-scaled at the time domain. In our speech separation experiments, the uniform mask filter improves speaker-to-interference ratio (SIR) by 2.1dB and 1dB for the soft mask algorithm and the MMSE algorithm, respectively, whereas the mel-scale filter achieves 1.1dB and 0.8dB for the same algorithms.

  • PDF

Performance Improvement of Speaker Recognition Using Enhanced Feature Extraction in Glottal Flow Signals and Multiple Feature Parameter Combination (Glottal flow 신호에서의 향상된 특징추출 및 다중 특징파라미터 결합을 통한 화자인식 성능 향상)

  • Kang, Jihoon;Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.12
    • /
    • pp.2792-2799
    • /
    • 2015
  • In this paper, we utilize source mel-frequency cepstral coefficients (SMFCCs), skewness, and kurtosis extracted in glottal flow signals to improve speaker recognition performance. Generally, because the high band magnitude response of glottal flow signals is somewhat flat, the SMFCCs are extracted using the response below the predefined cutoff frequency. The extracted SMFCC, skewness, and kurtosis are concatenated with conventional feature parameters. Then, dimensional reduction by the principal component analysis (PCA) and the linear discriminat analysis (LDA) is followed to compare performances with conventional systems under equivalent conditions. The proposed recognition system outperformed the conventional system for large scale speaker recognition experiments. Especially, the performance improvement was more noticeable for small Gaussan mixtures.

Hierarchical Flow-Based Anomaly Detection Model for Motor Gearbox Defect Detection

  • Younghwa Lee;Il-Sik Chang;Suseong Oh;Youngjin Nam;Youngteuk Chae;Geonyoung Choi;Gooman Park
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.6
    • /
    • pp.1516-1529
    • /
    • 2023
  • In this paper, a motor gearbox fault-detection system based on a hierarchical flow-based model is proposed. The proposed system is used for the anomaly detection of a motion sound-based actuator module. The proposed flow-based model, which is a generative model, learns by directly modeling a data distribution function. As the objective function is the maximum likelihood value of the input data, the training is stable and simple to use for anomaly detection. The operation sound of a car's side-view mirror motor is converted into a Mel-spectrogram image, consisting of a folding signal and an unfolding signal, and used as training data in this experiment. The proposed system is composed of an encoder and a decoder. The data extracted from the layer of the pretrained feature extractor are used as the decoder input data in the encoder. This information is used in the decoder by performing an interlayer cross-scale convolution operation. The experimental results indicate that the context information of various dimensions extracted from the interlayer hierarchical data improves the defect detection accuracy. This paper is notable because it uses acoustic data and a normalizing flow model to detect outliers based on the features of experimental data.