The Effect of the Telephone Channel to the Performance of the Speaker Verification System (전화선 채널이 화자확인 시스템의 성능에 미치는 영향)

  • 조태현;김유진;이재영;정재호
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.12-20 / 1999
  • In this paper, we compare speaker verification performance on speech data collected in a clean environment and over a telephone channel. To improve verification performance on channel speech, we study which feature parameters remain effective in the channel environment and which preprocessing helps. The experimental speech DB consists of Korean digit pairs, with a text-prompted system in mind. The speech features analyzed include LPCC (Linear Predictive Cepstral Coefficients), MFCC (Mel-Frequency Cepstral Coefficients), PLP (Perceptual Linear Prediction), and LSP (Line Spectrum Pairs). Filtering as a preprocessing step to remove channel noise is also studied. To remove or compensate for the channel effect in the extracted features, cepstral weighting, CMS (Cepstral Mean Subtraction), and RASTA (RelAtive SpecTrAl) processing are applied. We also report speech recognition performance for each feature and processing method and compare it with speaker verification performance. HTK (Hidden Markov Model Toolkit) 2.0 is used to evaluate the features and processing methods. Using different thresholds for male and female speakers, we compare the EER (Equal Error Rate) on the clean speech data and the channel data. Our simulation results show that the best speaker verification performance in terms of EER is achieved by applying a band-pass filter (150-3800 Hz) in preprocessing to remove low-band and high-band channel noise and then extracting MFCC from the filtered speech.
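The abstract lists CMS among the channel-compensation methods. As a rough illustration only (not the authors' HTK setup), the sketch below assumes a time-invariant channel that adds a constant offset to every cepstral coefficient, and removes it by subtracting the per-utterance mean. The function name and the toy data are hypothetical.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean taken over all frames of one utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across time.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy cepstra (3 frames, 2 coefficients) and a constant channel offset (assumed).
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
channel = [0.5, -0.25]
noisy = [[c + h for c, h in zip(frame, channel)] for frame in clean]

cms_clean = cepstral_mean_subtraction(clean)
cms_noisy = cepstral_mean_subtraction(noisy)
```

Under this assumption the clean and channel-distorted utterances yield the same features after CMS, which is why the method helps when the channel changes slowly relative to the utterance.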

Speech/Music Signal Classification Based on Spectrum Flux and MFCC For Audio Coder (오디오 부호화기를 위한 스펙트럼 변화 및 MFCC 기반 음성/음악 신호 분류)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.239-246 / 2023
  • In this paper, we propose an open-loop algorithm that classifies speech and music signals for an audio coder using spectral flux parameters and Mel-Frequency Cepstral Coefficient (MFCC) parameters. To increase responsiveness, the MFCC is used as a short-term feature parameter, and to improve accuracy, the spectral flux is used as a long-term feature parameter. The overall speech/music classification decision is made by combining the short-term and long-term classification methods. A Gaussian Mixture Model (GMM) is used for pattern recognition, and the optimal GMM parameters are estimated with the Expectation Maximization (EM) algorithm. The proposed combined long-term and short-term speech/music classification method showed an average classification error rate of 1.5% on various audio sources, improving the classification error rate by 0.9% compared to the short-term method alone and by 0.6% compared to the long-term method alone. Compared to the Unified Speech and Audio Coding (USAC) audio classification method, the proposed method improved the classification error rate by 9.1% on percussive music signals with attacks and by 5.8% on speech signals.
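The abstract does not define its spectral flux parameter. A minimal sketch of one common definition, the Euclidean distance between the magnitude spectra of consecutive frames, might look like this; the function name and the toy spectra are assumptions for illustration.

```python
import math
from typing import List


def spectral_flux(spectra: List[List[float]]) -> List[float]:
    """Euclidean distance between each magnitude spectrum and the previous one.

    `spectra` holds one magnitude spectrum per frame; the result has one
    value fewer than the number of frames.
    """
    flux = []
    for prev, cur in zip(spectra, spectra[1:]):
        flux.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return flux


# Toy example: the spectrum is steady for two frames, then changes abruptly.
flux = spectral_flux([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

A steady spectrum gives a flux near zero, while an abrupt change such as a percussive attack gives a large value, which is what makes statistics of the flux over a longer window useful as a long-term feature.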

Numerical simulation of fully nonlinear sloshing waves in three-dimensional tank under random excitation

  • Xu, Gang;Hamouda, A.M.S.;Khoo, B.C.
    • Ocean Systems Engineering / v.1 no.4 / pp.355-372 / 2011
  • Based on fully nonlinear velocity potential theory, liquid sloshing in a three-dimensional tank under random excitation is studied. The governing Laplace equation with fully nonlinear boundary conditions on the moving free surface is solved using the indirect desingularized boundary integral equation method (DBIEM). The fourth-order predictor-corrector Adams-Bashforth-Moulton scheme (ABM4) and the mixed Eulerian-Lagrangian (MEL) method are used for the time-stepping integration of the free surface boundary conditions. A smoothing scheme based on B-spline curves is applied in both the longitudinal and transverse directions of the tank to eliminate possible saw-tooth instabilities. When the tank undergoes one-dimensional regular motion of small amplitude, the calculated results are in very good agreement with the linear analytical solution. In the simulation, normal standing waves, travelling waves and bores are observed. Extensive calculations have been made for the tank undergoing specified random oscillation. The nonlinear effect of the random sloshing wave is studied, and the effect of the peak frequency used to generate the random oscillation is investigated. It is found that, even as the peak value of the oscillation spectrum becomes smaller, the maximum wave elevation on the side wall becomes larger when the peak frequency is closer to the natural frequency.
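To make the ABM4 time stepping concrete, here is a minimal sketch on a scalar ODE y' = f(t, y) rather than the free-surface conditions of the paper. It assumes a fixed step size, a Runge-Kutta start-up for the first three steps, and a single corrector pass per step; all names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def rk4_step(f: Callable[[float, float], float], t: float, y: float, h: float) -> float:
    """One classical Runge-Kutta step, used only to start the multistep scheme."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def abm4(f: Callable[[float, float], float], t0: float, y0: float,
         h: float, n_steps: int) -> Tuple[List[float], List[float]]:
    """Fourth-order Adams-Bashforth predictor with Adams-Moulton corrector."""
    ts, ys = [t0], [y0]
    # Start-up: the multistep formulas need four past derivative values.
    for _ in range(min(3, n_steps)):
        ys.append(rk4_step(f, ts[-1], ys[-1], h))
        ts.append(ts[-1] + h)
    fs = [f(t, y) for t, y in zip(ts, ys)]
    for _ in range(3, n_steps):
        t, y = ts[-1], ys[-1]
        # Predictor (Adams-Bashforth, explicit).
        y_pred = y + h / 24 * (55 * fs[-1] - 59 * fs[-2] + 37 * fs[-3] - 9 * fs[-4])
        t_next = t + h
        # Corrector (Adams-Moulton), evaluated once at the predicted value.
        y_next = y + h / 24 * (9 * f(t_next, y_pred) + 19 * fs[-1] - 5 * fs[-2] + fs[-3])
        ts.append(t_next)
        ys.append(y_next)
        fs.append(f(t_next, y_next))
    return ts, ys


# Test problem with a known solution: y' = -y, y(0) = 1, so y(1) = exp(-1).
ts, ys = abm4(lambda t, y: -y, 0.0, 1.0, 0.01, 100)
```

Each step needs only two new evaluations of f, which is the usual reason for preferring a multistep scheme when each evaluation requires solving a boundary integral equation.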

Radical Scavenging Activity and Cytotoxicity of Maysin(C-glycosylflavone) isolated from Silks of Zea mays L.

  • Kim, Sun-Lim;Snook, Maurice-E;Lee, Jong-Ock
    • KOREAN JOURNAL OF CROP SCIENCE / v.48 no.5 / pp.392-396 / 2003
  • Maysin, a C-glycosylflavone, was isolated from the silks of maize, Zea mays L. The ESI mass spectrum indicates a molecular ion of maysin at $m/z$ 577 ($\textrm{M}^+$) and shows that the ether-linked sugar is rhamnose, from the fragment at $m/z$ 431 ($\textrm{M}^+ - 146$). The DPPH (1,1-diphenyl-2-picrylhydrazyl) radical scavenging activity of maysin was higher than that of rutin. However, compared with its aglycone luteolin, maysin showed only moderate DPPH scavenging activity, mainly due to glycosylation with two sugar moieties, keto-fucose and rhamnose. In the in vitro cytotoxicity test against five human tumor cell lines, lung (A549), ovarian (SK-OV-3), melanoma (SK-MEL-2), central nervous system (XF-489), and colon (HCT-15), maysin exhibited weaker activity than cisplatin. The $\textrm{ED}_{50}$ values of maysin were 62.24, 43.18, 16.83, 37.22, and 32.09/$m\ell$, respectively. The results suggest that maysin is a potential cytotoxic compound, particularly for human colon, central nervous system, and melanoma tumors.

Two-Channel Noise Reduction Using Beamforming and DOA-Based Masking (빔포밍 및 DOA 기반의 마스킹을 이용한 2채널 잡음제거)

  • Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.1 / pp.32-40 / 2013
  • In this paper, we propose a multi-channel speech enhancement algorithm using beamforming and direction-of-arrival (DOA)-based masking. The proposed algorithm first enhances noisy speech with the linearly constrained minimum variance (LCMV) algorithm and then applies a mel-scale Wiener filter, designed using DOA-based masking, to remove the remaining noise. To improve performance, we optimize the learning rate of the adaptive filters in LCMV and the DOA threshold used to detect the target speech spectrum. As performance indices, the perceptual evaluation of speech quality (PESQ) score and the output SNR are measured. Experimental results show that the proposed algorithm outperforms the conventional LCMV beamformer by 0.09 in PESQ score and by 5.75 dB in output SNR.
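The abstract does not say how the DOA-based mask enters the Wiener filter design. The sketch below therefore shows only the generic per-band Wiener gain, taking the speech and noise power estimates for each mel band as given inputs; the function name, the `floor` parameter, and the numbers are assumptions.

```python
from typing import List


def wiener_gain(speech_power: List[float], noise_power: List[float],
                floor: float = 0.0) -> List[float]:
    """Per-band Wiener gain S / (S + N), limited from below by `floor`."""
    gains = []
    for s, n in zip(speech_power, noise_power):
        total = s + n
        g = s / total if total > 0 else 0.0
        gains.append(max(g, floor))
    return gains


# Toy estimates for three bands: equal power, noise only, speech dominant.
gains = wiener_gain([1.0, 0.0, 3.0], [1.0, 1.0, 1.0])
```

Bands dominated by noise receive a gain near zero and bands dominated by speech a gain near one; a nonzero `floor` is a common way to limit musical-noise artifacts.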

A cable tension identification technology using percussion sound

  • Wang, Guowei;Lu, Wensheng;Yuan, Cheng;Kong, Qingzhao
    • Smart Structures and Systems / v.29 no.3 / pp.475-484 / 2022
  • The loss of cable tension in civil infrastructure reduces structural bearing capacity and causes harmful deformation of structures. Currently, most structural health monitoring (SHM) approaches for cables rely on contact transducers. This paper proposes a cable tension identification technology using percussion sound, which provides a fast determination of steel cable tension without physical contact between cables and sensors. Inspired by the tensioning of strings in piano tuning, the proposed technology predicts the cable tension value by deep-learning-assisted classification of the "percussion" sound produced by tapping a steel cable. To simulate the non-linear mapping of human ears to sound and to better quantify the minor changes in the high-frequency bands of the sound spectrum generated by percussion, Mel-frequency cepstral coefficients (MFCCs) were extracted as acoustic features to train the deep learning network. A convolutional neural network (CNN) with four convolutional layers and two global pooling layers was employed to identify the cable tension within a designed range. Moreover, theoretical analysis and finite element method (FEM) simulations were conducted to demonstrate the feasibility of the proposed technology. Finally, the identification performance of the proposed technology was investigated experimentally. Overall, the results show that the proposed percussion-based technology has great potential for estimating cable tension in in-situ structural safety assessment.
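The non-linear frequency mapping behind MFCCs can be sketched with one widely used Hz-to-mel formula. The abstract does not state which variant the authors used, so the constants 2595 and 700 below are an assumption, as are the function names.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map frequency in Hz to the mel scale (2595 * log10(1 + f / 700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Equal steps in Hz shrink on the mel axis as frequency rises.
low_step = hz_to_mel(1000.0) - hz_to_mel(0.0)
high_step = hz_to_mel(8000.0) - hz_to_mel(7000.0)
```

With this formula 1000 Hz maps to roughly 1000 mel, and a 1000 Hz step near 8 kHz covers far fewer mels than the same step near 0 Hz, which is the compression a mel filterbank applies before the cepstral transform.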

Characteristics of mushroom Phellinus baumii extracts with enzyme pretreatment (효소 전처리에 의한 상황버섯 β-glucan 추출물의 특성)

  • Son, Eun Ji;Ryu, Eun-Ah;Lee, Sang-Han;Kim, Young-Chan;Hwang, In-Wook;Chung, Shin-Kyo
    • Journal of Applied Biological Chemistry / v.61 no.1 / pp.101-108 / 2018
  • This study was conducted to establish an optimized β-glucan extraction method based on enzymatic hydrolysis of Phellinus baumii and to investigate the β-glucan content and physicochemical properties of the extracts. The optimal condition was an enzyme concentration of 0.66% (v/v) and a reaction time of 6.08 h ($R^2=0.9245$), and the β-glucan content of the Phellinus baumii extract under the optimized condition was 1.9594 g/100 g. The β-glucan yield (0.76-16.40%) of the enzyme beta-glucan extract (EBE) was threefold higher than that of the non-enzyme beta-glucan extract (NEBE). The β-glucan purity (11.15-59.05%) of non-enzyme beta-glucan (NEB) and enzyme beta-glucan (EB) was higher than that of NEBE and EBE. The β-glucan purity (59.05%) and β-glucan content (3.38 g/100 g) of EB were higher than those of the other samples. Total sugar contents (0.61-1.17 mg/mL) of NEB and EB were higher than those of NEBE and EBE, and EB had the highest total sugar content at 1.17 mg/mL. Protein contents (0.44-11.73 mg/mL) of NEBE and EBE were higher than those of NEB and EB. In the FT-IR spectrum, the band at $890\,\mathrm{cm}^{-1}$ of the microcapsule was attributed to β-1,3-glucan. The toxicity of β-glucan from Phellinus baumii in both melanoma cell lines was determined using the 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide assay, and the β-glucan showed no toxicity up to 30 μg/mL. The effects of β-glucan from Phellinus baumii on the inhibition of cancer cell proliferation were assessed with a wound healing assay. The effects of NEB and EB were greater than those of NEBE and EBE; in particular, EB at 30 μg/mL had the greatest effect in both melanoma cell lines.

Frame Reliability Weighting for Robust Speech Recognition (프레임 신뢰도 가중에 의한 강인한 음성인식)

  • 조훈영;김락용;오영환
    • The Journal of the Acoustical Society of Korea / v.21 no.3 / pp.323-329 / 2002
  • This paper proposes a frame reliability weighting method to compensate for time-selective noise, which occurs at random positions in a speech signal and contaminates only certain parts of it. Speech frames have different degrees of reliability, and the reliability is proportional to the SNR (signal-to-noise ratio). While it is feasible to estimate the frame SNR from the noise information in non-speech intervals under stationary noise, it is difficult to obtain the noise spectrum of a time-selective noise. Therefore, we use statistical models of clean speech to estimate the frame reliability. The proposed MFR (model-based frame reliability) approximates frame SNR values using filterbank energy vectors obtained by inverse transformation of the input MFCC (mel-frequency cepstral coefficient) vectors and of the mean vectors of a reference model. Experiments on various burst noises showed that the proposed method represents the frame reliability effectively. Recognition performance improved when the MFR values were used as weighting factors in the likelihood calculation step.
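One way to read "weighting factors in the likelihood calculation step" is that each frame's log-likelihood is scaled by its reliability before the frames are summed. The sketch below assumes exactly that, with reliabilities in [0, 1]; how the MFR values are computed is not reproduced here, and the names and numbers are hypothetical.

```python
from typing import List


def weighted_log_likelihood(frame_loglik: List[float],
                            reliability: List[float]) -> float:
    """Sum of per-frame log-likelihoods, each scaled by its frame reliability."""
    if len(frame_loglik) != len(reliability):
        raise ValueError("need one reliability value per frame")
    return sum(w * ll for w, ll in zip(reliability, frame_loglik))


# Three frames: fully reliable, half reliable, and fully masked by a noise burst.
score = weighted_log_likelihood([-1.0, -2.0, -3.0], [1.0, 0.5, 0.0])
```

A frame with reliability 0 contributes nothing to the score, so a short noise burst cannot dominate the comparison between competing models.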

Phoneme Segmentation in Consideration of Speech feature in Korean Speech Recognition (한국어 음성인식에서 음성의 특성을 고려한 음소 경계 검출)

  • 서영완;송점동;이정현
    • Journal of Internet Computing and Services / v.2 no.1 / pp.31-38 / 2001
  • A speech database built from phonemes is important for studies of speech recognition, speech synthesis, and speech analysis. Phonemes consist of voiced and unvoiced sounds. Although voiced and unvoiced sounds differ in many features, traditional algorithms for detecting phoneme boundaries do not take these differences into account and determine the boundaries by comparing the parameters of the current frame with those of the previous frame in the time domain. In this paper, we propose the assort algorithm, a block-based algorithm for phoneme segmentation that reflects the feature differences between voiced and unvoiced sounds. The assort algorithm uses a distance measure based on MFCC (Mel-Frequency Cepstrum Coefficient) as its spectral comparison measure and uses the energy, zero crossing rate, spectral energy ratio, and formant frequency to separate voiced sounds from unvoiced sounds. In our experiment, the proposed system showed about 79 percent precision on isolated words of 3 or 4 syllables, an improvement of about 8 percent in precision over the existing phoneme segmentation system.
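Two of the cues named in the abstract, energy and zero crossing rate, are simple enough to sketch. The decision rule and both thresholds below are placeholders for illustration, not values from the paper; the spectral energy ratio and formant cues are omitted.

```python
from typing import List


def short_time_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def looks_voiced(frame: List[float], energy_thr: float = 0.1,
                 zcr_thr: float = 0.5) -> bool:
    """Toy rule: voiced frames tend to have high energy and few zero crossings."""
    return short_time_energy(frame) > energy_thr and zero_crossing_rate(frame) < zcr_thr


# A slowly varying frame versus a rapidly alternating, noise-like frame.
slow = [0.5, 0.8, 0.9, 0.7, 0.4, -0.2, -0.6, -0.8]
fast = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
```

Here `looks_voiced(slow)` is true and `looks_voiced(fast)` is false, since the second frame changes sign at every sample.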

Tempo-oriented music recommendation system based on human activity recognition using accelerometer and gyroscope data (가속도계와 자이로스코프 데이터를 사용한 인간 행동 인식 기반의 템포 지향 음악 추천 시스템)

  • Shin, Seung-Su;Lee, Gi Yong;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.39 no.4 / pp.286-291 / 2020
  • In this paper, we propose a system that recommends music through tempo-oriented music classification and sensor-based human activity recognition. The proposed method indexes music files using tempo-oriented music classification and recommends suitable music according to the recognized user's activity. For accurate music classification, a dynamic classification based on a modulation spectrum and a sequence classification based on a Mel-spectrogram are used in combination. In addition, simple accelerometer and gyroscope sensor data of the smartphone are applied to deep spiking neural networks to improve activity recognition performance. Finally, music recommendation is performed through a mapping table considering the relationship between the recognized activity and the indexed music file. The experimental results show that the proposed system is suitable for use in any practical mobile device with a music player.