• Title/Summary/Keyword: ZCR(Zero Crossing Rate)

An Automatic Segmentation System Based on HMM and Correction Algorithm (HMM 및 보정 알고리즘을 이용한 자동 음성 분할 시스템)

  • Kim, Mu-Jung;Kwon, Chul-Hong
    • Speech Sciences
    • /
    • v.9 no.4
    • /
    • pp.265-274
    • /
    • 2002
  • In this paper we propose an automatic segmentation system that outputs time-alignment information for phoneme boundaries using Viterbi search with HMMs (Hidden Markov Models) and corrects these results with a UVS (unvoiced/voiced/silence) classification algorithm. We select a set of 39 monophones and a set of 647 extended phones for the HMM models. For UVS classification we use feature parameters such as ZCR (Zero Crossing Rate), log energy, and spectral distribution. Forced alignment using the extended phone set is 11% better than with the monophone set. The UVS classification algorithm performs well in correcting the segmentation results.
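
The abstract above names ZCR and log energy as UVS features but gives no formulas or thresholds. The following is a minimal sketch of how such frame-level features are commonly computed. The function names, the decision rule, and both threshold values are assumptions for illustration, not the authors' method.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def log_energy(frame, floor=1e-10):
    """Natural log of the mean squared amplitude, floored to avoid log(0)."""
    if not frame:
        return math.log(floor)
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(max(energy, floor))


def classify_uvs(frame, silence_log_energy=-9.0, unvoiced_zcr=0.3):
    """Toy rule: low energy -> silence, high ZCR -> unvoiced, else voiced.

    Both thresholds are hypothetical placeholders, not values from the paper.
    """
    if log_energy(frame) < silence_log_energy:
        return "S"
    if zero_crossing_rate(frame) > unvoiced_zcr:
        return "U"
    return "V"
```

A low-frequency sine frame comes out as "V" under this rule, a rapidly alternating low-amplitude frame as "U", and an all-zero frame as "S". A real system would tune the thresholds on labeled data and would also use the spectral distribution feature the abstract mentions.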

Screening of Laryngeal Cancer Using Various Feature Parameters and Linear Discriminant Analysis (다양한 특징 파라미터와 선형변별분석을 이용한 후두암의 선별검사)

  • 이원범;왕수건;권순복;전경명;전계록;김수미;김형순;양병곤;조철우
    • Proceedings of the KSLP Conference
    • /
    • 2003.11a
    • /
    • pp.149-149
    • /
    • 2003
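  • To effectively screen late-stage laryngeal cancer voices whose periodicity is so severely degraded that they cannot be analyzed with the multi-dimensional voice program (MDVP), a voice analysis method for differentiating laryngeal diseases, we carried out discrimination experiments using various feature parameters, including several cepstrum parameters, periodicity and its degree of perturbation, zero-crossing rate (ZCR), and spectral centroid (SC). For the experiments we used 50 voice samples from normal males, 50 voice samples from males with benign laryngeal disease, and 105 voice samples from male laryngeal cancer patients, all collected at the Department of Otolaryngology, Pusan National University Hospital. Only sustained phonation of the vowel /a/ was used. Of the voice data from normal subjects, patients with benign laryngeal disease, and laryngeal cancer patients whose voices could be analyzed with MDVP, two-thirds were used for training and the remaining one-third for the discrimination experiments. A Gaussian Mixture Model (GMM) classifier was used for laryngeal cancer discrimination; the number of mixtures, which expresses the model complexity, was varied from 1 to 10 and set to the value that gave the best performance. The order of the cepstral analysis was fixed at 12 in all experiments. (abridged)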

Implementation of Quad Variable Rates ADPCM Speech CODEC on C6000 DSP considering the Environmental Noise (배경잡음을 고려한 4배 가변 압축률을 갖는 ADPCM의 C6000 DSP 실시간 구현)

  • Kim Dae-Sung;Han Kyong-ho
    • Proceedings of the KIPE Conference
    • /
    • 2002.07a
    • /
    • pp.727-729
    • /
    • 2002
  • In this paper, we propose a quad variable-rate ADPCM coding method and its implementation on a C6000 DSP. The method is modified from the standard ITU G.726 ADPCM to improve speech quality in the presence of environmental noise. Four coding rates, 16 kbps, 24 kbps, 32 kbps, and 40 kbps, are used for speech window samples, and the rate decision threshold is determined by the environmental noise level. The objective of the proposed method is to reduce the coding rate while retaining speech quality: under environmental noise, the speech quality is close to that of a 40 kbps single-rate coder, while the coding rate is close to that of a 16 kbps single-rate coder. The environmental noise level affects the coding rate and is calculated for every speech window. At high noise levels, more samples are coded at higher rates to enhance quality; at low noise levels, only large speech signals are coded at higher rates and more speech samples are coded at lower rates to reduce the overall coding rate. The influence of noise on the speech signal is considerably higher for small signals, and small signals have a higher ZCR (zero crossing rate). The method is simulated on a PC and is to be implemented on a C6000 floating-point DSP board for real-time operation.

Intelligent Adaptive Active Noise Control in Non-stationary Noise Environments (비정상 잡음환경에서의 지능형 적응 능동소음제어)

  • Mu, Xiangbin;Ko, JinSeok;Rheem, JaeYeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.5
    • /
    • pp.408-414
    • /
    • 2013
  • The well-known filtered-x least mean square (FxLMS) algorithm for active noise control (ANC) systems may become unstable in non-stationary noise environments. To solve this problem, Sun's algorithm and Akhtar's algorithm were developed by modifying the reference signal in the FxLMS update, but these two algorithms show unsatisfactory stability when dealing with sustained impulsive noise. The proposed algorithm uses probability estimation and zero-crossing rate (ZCR) control to improve stability and performance, together with optimal parameter selection based on a fuzzy system. Computer simulation results show that the proposed algorithm has faster convergence and better stability in non-stationary noise environments.

The EMG Measurement of Simple and Iterative Worker's Muscle Fatigue (단순반복 근로자의 근육피로도에 관한 EMG분석)

  • 서승록;임완희
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.6 no.3
    • /
    • pp.79-86
    • /
    • 2001
  • Cumulative trauma disorder (CTD), a newer kind of occupational disease, occurs mainly among workers on handling lines in highly specialized industrial environments. This study examined workers' exposure to CTD by using an EMG system to test muscle fatigue during simple, repetitive tasks. The findings were as follows. AEMG analysis showed that worker fatigue tended to increase continually with the elapsed time of the task, whereas after the task ended, fatigue tended to fall below the level measured before refractory brick lifting. The changes in MF (Median Frequency) and MPF (Mean Power Frequency) showed a highly significant difference with respect to muscle fatigue and elapsed work time. In particular, fatigue of the erector spinae and multifidus during firebrick lifting increased over time. The change in ZCR (Zero Crossing Rate) also showed a considerably significant difference with respect to muscle fatigue and elapsed work time. In short, muscle fatigue increased gradually as the work went on. It can therefore be concluded that fatigue of the erector spinae and multifidus gradually increases during simple, repetitive tasks.

Estimation of Concrete Strength Based on Artificial Intelligence Techniques (인공지능 기법에 의한 콘크리트 강도 추정)

  • 김세동;신동환;이영석;노승용;김성환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.7
    • /
    • pp.101-111
    • /
    • 1999
  • This paper presents a concrete pattern recognition method that identifies the strength of concrete by evidence accumulation with multiple parameters, based on artificial intelligence techniques. First, variance (VAR), zero-crossing (ZCR), mean frequency (MEANF), autoregressive model coefficients (ARC), and linear cepstrum coefficients (LCC) are extracted as feature parameters from the ultrasonic signal of concrete. Pattern recognition is carried out through the evidence accumulation procedure, using distances measured against reference parameters. A fuzzy mapping function is designed to transform the distances for use in the evidence accumulation method. Results (a 92% successful pattern recognition rate) are presented to support the feasibility of the suggested approach for concrete pattern recognition.

The Comparison of Sensitivity of Numerical Parameters for Quantification of Electromyographic (EMG) Signal (근전도의 정량적 분석시 사용되는 수리적 파라미터의 민감도 비교)

  • Kim, Jung-Yong;Jung, Myung-Chul
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.25 no.3
    • /
    • pp.330-335
    • /
    • 1999
  • The goal of this study is to determine the most sensitive parameter for representing the degree of muscle force and fatigue. Various numerical parameters were tested: the first coefficient of the Autoregressive (AR) Model, Root Mean Square (RMS), Zero Crossing Rate (ZCR), Mean Power Frequency (MPF), and Median Frequency (MF). Ten healthy male subjects participated in the experiment. They were asked to extend their trunk using the right and left erector spinae muscles during a sustained isometric contraction for twenty seconds. The force levels were 15%, 30%, 45%, 60%, and 75% of Maximal Voluntary Contraction (MVC), and the order of trials was randomized. The results showed that RMS was the best parameter for measuring the force level of the muscle, and that the first coefficient of the AR model was a relatively sensitive parameter for fatigue measurement below 60% MVC. At 75% MVC, however, both MPF and the first coefficient of the AR Model showed the best performance in quantifying muscle fatigue. Therefore, the sensitivity of measurement can be improved by properly selecting the parameter based on the level of force during a sustained isometric condition.
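
The parameters compared in the abstract above have standard textbook definitions. The sketch below computes RMS, ZCR, MPF, and MF for one signal window in plain Python. It is only an illustration under assumptions: the function names are hypothetical, the spectrum comes from a naive DFT, and the abstract does not say how the authors computed these values. The first AR coefficient is left out because the model order is not given.

```python
import cmath
import math


def rms(x):
    """Root mean square amplitude of the window."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(x) < 2:
        return 0.0
    return sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0)) / (len(x) - 1)


def power_spectrum(x, fs):
    """One-sided power spectrum via a naive O(n^2) DFT (short windows only)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def mean_power_frequency(freqs, power):
    """Power-weighted average frequency (spectral centroid of the power)."""
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(f * p for f, p in zip(freqs, power)) / total


def median_frequency(freqs, power):
    """Lowest frequency at which cumulative power reaches half the total."""
    half = sum(power) / 2.0
    running = 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]
```

For a pure 50 Hz tone sampled at 1000 Hz over 200 samples, both MPF and MF land on 50 Hz. During fatigue the EMG spectrum typically shifts toward lower frequencies, which is why MPF and MF are tracked over time.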

A Digital Audio Watermark Using Wavelet Transform and Masking Effect (웨이브릿과 마스킹 효과를 이용한 디지털 오디오 워터마킹)

  • Hwang, Won-Young;Kang, Hwan-Il;Han, Seung-Soo;Kim, Kab-Il;Kang, Hwan-Soo
    • Proceedings of the IEEK Conference
    • /
    • 2003.11b
    • /
    • pp.243-246
    • /
    • 2003
  • In this paper, we propose a new digital audio watermarking technique based on the wavelet transform. The watermark is embedded by eliminating information in the audio signal that is unnecessary to the human auditory system (HAS). The algorithm does not require any of the original audio information in the watermark extraction process. The masking effect used for watermarking is the post-temporal masking effect. We construct a window with the synchronization signal and extract the best frame in the window using the zero-crossing rate (ZCR) and the energy of the audio signal. The watermark can be extracted using the correlation between the watermark signal and the portion of the frame. Experimental results show good robustness against MPEG1-layer3 compression and other common signal processing manipulations. All attacks are made after D/A/D conversion.

Emotion Recognition in Arabic Speech from Saudi Dialect Corpus Using Machine Learning and Deep Learning Algorithms

  • Hanaa Alamri;Hanan S. Alshanbari
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.9-16
    • /
    • 2023
  • Speech can actively elicit feelings and attitudes through words. It is important for researchers to identify the emotional content of speech signals as well as the type of emotion resulting from the speech. In this study, we examined an emotion recognition system using an Arabic database, specifically in the Saudi dialect, drawn from a YouTube channel called Telfaz11. The four emotions examined were anger, happiness, sadness, and neutral. In our experiments, we extracted features from audio signals, such as Mel Frequency Cepstral Coefficients (MFCC) and Zero-Crossing Rate (ZCR), and then classified emotions using machine learning algorithms (Support Vector Machine (SVM) and K-Nearest Neighbor (KNN)) and deep learning algorithms (Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM)). Our experiments showed that the MFCC feature extraction method combined with the CNN model achieved the best accuracy, 95%, demonstrating the effectiveness of this classification system in recognizing emotions in spoken Arabic.

A Study on the Improvement of DTW with Speech Silence Detection (음성의 묵음구간 검출을 통한 DTW의 성능개선에 관한 연구)

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.117-124
    • /
    • 2003
  • Speaker recognition is a technology that confirms a speaker's identity from the characteristics of speech. It is classified into speaker identification and speaker verification: the former discriminates the speaker from a preregistered group and recognizes the word, while the latter verifies the speaker who claims an identity. Extracting speaker information from speech to confirm individual identity has become one of the most efficient technologies as services over the telephone network have become popular. Some problems, however, must be solved for real applications. First, a safe method is needed to reject impostors, because recognition is not performed only for preregistered customers. Second, the characteristics of speech change over time, which severely degrades the recognition rate and inconveniences users as the number of times they must utter the text increases. Finally, characteristics common among speakers cause incorrect recognition results. Silence parts in the middle of speech also decrease the identification rate. In this paper, we propose that the identification rate can be improved by removing the silence parts before running the identification algorithm. Speech regions are detected with two methods, zero crossing rate and signal energy, which find the start and end points of the speech, and the DTW algorithm is then applied. As a result, the proposed method achieves a recognition rate about 3% higher than that of conventional methods.
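
The idea described above, dropping silent frames before DTW matching, can be sketched as follows. This is an illustration only: the frame rule, the threshold values, and the function names are assumptions, and DTW is run on raw 1-D samples for brevity, whereas practical systems align sequences of feature vectors.

```python
def is_speech(frame, energy_thr=1e-3, zcr_thr=0.25):
    """Hypothetical rule: loud frames are speech; quieter frames still count
    when a high ZCR suggests an unvoiced consonant rather than silence."""
    energy = sum(v * v for v in frame) / len(frame)
    pairs = max(len(frame) - 1, 1)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / pairs
    return energy > energy_thr or (energy > 0.1 * energy_thr and zcr > zcr_thr)


def remove_silence(signal, frame_len):
    """Concatenate only the frames judged to contain speech."""
    kept = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        if is_speech(frame):
            kept.extend(frame)
    return kept


def dtw_distance(a, b):
    """Classic dynamic time warping with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

With a template made of two speech bursts and an utterance that has a pause between the same bursts, `dtw_distance(remove_silence(utterance, frame_len), template)` drops to zero, while the untrimmed utterance keeps a positive distance because every silent sample must still be aligned to some template sample.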
