• Title/Summary/Keyword: Perceptual signal analysis

Online blind source separation and dereverberation of speech based on a joint diagonalizability constraint (공동 행렬대각화 조건 기반 온라인 음원 신호 분리 및 잔향제거)

  • Yu, Ho-Gun;Kim, Do-Hui;Song, Min-Hwan;Park, Hyung-Min
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.503-514 / 2021
  • Reverberation in speech signals tends to significantly degrade the performance of blind source separation (BSS) systems, and the degradation is especially severe in online systems. Methods based on joint diagonalizability constraints have recently been developed to tackle the problem. To improve the quality of separated speech in reverberant environments, this paper adds a dereverberation method to the online BSS algorithm based on these constraints. In experiments on the WSJCAM0 corpus, the proposed method was compared with the existing online BSS algorithm. Evaluation by Signal-to-Distortion Ratio (SDR) and Perceptual Evaluation of Speech Quality (PESQ) showed that, on average, SDR improved from 1.23 dB to 3.76 dB and PESQ improved from 1.15 to 2.12.

Food quality management using sensory discrimination method based on signal detection theory and its application to drinking water (식품 품질관리를 위한 신호탐지이론(SDT) 감각차이식별분석 이론과 생수 품질관리에의 활용)

  • Kim, Min-A;Sim, Hye-Min;Lee, Hye-Seong
    • Food Science and Industry / v.52 no.1 / pp.20-31 / 2019
  • Sensory perception of food and beverage products is one of the most important quality factors determining consumer acceptability, and sensory discrimination methodology has therefore been a vital tool for quality management. Signal detection theory (SDT) and Thurstonian modeling provide the most advanced psychometric approach to modeling various discrimination methods. These theories account for perceptual and cognitive decisional factors, so that a fundamental measure of sensory difference (d') can be computed independently of the test method used. This paper introduces sensory discrimination analysis based on SDT and Thurstonian modeling for more accurate and systematic application of sensory and hedonic quality management in industry. Ways to realize in practice the statistical power and relative sensitivity of sensory discrimination methods predicted by SDT and Thurstonian modeling are also discussed, using a case study of the Nongshim quality management program for drinking water in which the SDT A-Not A test methodology was further optimized.
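
As a rough illustration of the sensory difference measure d' mentioned in this abstract, the sketch below computes d' = z(H) - z(F) from hit and false-alarm counts in an A-Not A style test. It assumes the simple equal-variance Gaussian SDT model and a log-linear correction for extreme rates; the function name `d_prime` and the counts are hypothetical and are not taken from the paper or from the Nongshim program.

```python
from statistics import NormalDist


def d_prime(hits, signal_trials, false_alarms, noise_trials):
    """Estimate d' = z(H) - z(F) under an equal-variance Gaussian model (assumption)."""
    # Log-linear correction: add 0.5 to each count and 1 to each total so that
    # neither rate is exactly 0 or 1, where the z-transform would be infinite.
    hit_rate = (hits + 0.5) / (signal_trials + 1)
    fa_rate = (false_alarms + 0.5) / (noise_trials + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical panel data: 80 "A" responses to 100 A samples,
# 20 "A" responses to 100 Not-A samples.
print(round(d_prime(80, 100, 20, 100), 2))
```

A larger d' means the two products are easier to tell apart; a value near zero means the panel responds to both products in the same way.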

Sinusoidal Modeling of Polyphonic Audio Signals Using Dynamic Segmentation Method (동적 세그멘테이션을 이용한 폴리포닉 오디오 신호의 정현파 모델링)

  • 장호근;박주성
    • The Journal of the Acoustical Society of Korea / v.19 no.4 / pp.58-68 / 2000
  • This paper proposes a sinusoidal modeling method for polyphonic audio signals. Sinusoidal modeling, which has been applied successfully to speech and monophonic signals, cannot be applied directly to polyphonic signals because no single window size for sinusoidal analysis suits the entire signal. In addition, for a high-quality synthesized signal, transient parts such as attacks, which determine the timbre of a musical instrument, should be preserved. In this paper, a multiresolution filter bank is designed that splits the input signal into six octave-spaced subbands without aliasing, and sinusoidal modeling is applied to each subband signal. To alleviate the smearing of transients in sinusoidal modeling, a dynamic segmentation method is applied to the subbands; it adaptively determines the analysis-synthesis frame size to fit the time-frequency characteristics of each subband signal. An improved dynamic segmentation method is also proposed that handles transients better and requires less computation. Simulation results for various polyphonic audio signals show that the proposed sinusoidal modeling can model polyphonic audio signals without loss of perceptual quality.

An Enhancement of the MPEG-2 Audio Encoder Using General DSPs (범용 DSP를 이용한 MPEG-2 오디오 부호화기의 성능 개선)

  • 오현오;김성윤;윤대희;차일환;이준용
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 1997.11a / pp.63-67 / 1997
  • The ISO (International Organization for Standardization) has standardized MPEG-2 audio. The MPEG-2 audio compression algorithm is based on subband analysis and exploits human auditory characteristics to achieve a low bit rate with minimal perceptual loss of audio quality. This paper presents an enhanced MPEG-2 audio encoder using multiple TMS320C30 general-purpose DSPs. The developed system consists of five slave boards and one master board. Each slave board performs subband analysis and psychoacoustic parameter calculation for one channel, and the master board manages bit allocation, quantization, and bit-stream formatting for all channels. Parallel processing and pipelining are used in the hardware structure, and fast algorithms are applied in each subroutine to achieve real-time processing. The implemented system supports multichannel audio up to 5.1 and various bit rates.

An Emotion Recognition Technique using Speech Signals (음성신호를 이용한 감정인식)

  • Jung, Byung-Wook;Cheun, Seung-Pyo;Kim, Youn-Tae;Kim, Sung-Shin
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.4 / pp.494-500 / 2008
  • In human interface technology, interaction between humans and machines is important, and research on emotion recognition supports this interaction. This paper presents an algorithm for emotion recognition based on personalized speech signals. The proposed approach extracts the characteristics of the speech signal for emotion recognition using PLP (perceptual linear prediction) analysis. PLP analysis was originally designed to suppress speaker-dependent components in features used for automatic speech recognition, but later experiments demonstrated that it is also effective for speaker recognition tasks. This paper therefore proposes an algorithm that can easily evaluate personal emotion from speech signals in real time using personalized emotion patterns built by PLP analysis. The experimental results show that the maximum recognition rate for the speaker-dependent system is above 90%, whereas the average recognition rate is 75%. The proposed system has a simple structure yet is efficient enough to be used in real time.

A Study on the Speech Signal Processing for Cochlear Implant using the PLP Analysis (청각보철을 위한 PLP방식의 음성신호처리에 관한 연구)

  • Kim, Young-Sun;Choi, Doo-Il;Park, Sang-Hui;Beack, Seung-Hwa
    • Proceedings of the KOSOMBE Conference / v.1992 no.05 / pp.167-170 / 1992
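  • In this paper, an auditory prosthesis (cochlear implant) device is configured so that people with sensorineural hearing loss can perceive speech much as people with normal hearing do. PLP (Perceptual Linear Prediction) analysis is used to extract the formants of speech, and an autocorrelation method with a three-level clipping function is used for pitch extraction. In addition, using a multichannel, multi-electrode scheme, a simulation is performed in which signals are applied through 17 electrodes inserted at the hair cells of the inner ear. The experimental data are the vowels /a/, /e/, /i/, /o/, and /u/; the difference between front and back vowels is distinguished, and the change in the second formant and the formant integration theory are verified.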
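
The pitch extraction step named in this abstract (autocorrelation with a three-level clipping function) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the clipping threshold (a fixed fraction of the frame peak), the 60 to 400 Hz search range, and the function names are all assumptions.

```python
import math


def three_level_clip(frame, ratio=0.6):
    """Map each sample to +1, 0, or -1 using a threshold of ratio * peak amplitude."""
    peak = max(abs(x) for x in frame)
    threshold = ratio * peak  # assumed threshold rule; other rules are possible
    return [1 if x > threshold else -1 if x < -threshold else 0 for x in frame]


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return the pitch in Hz at the autocorrelation peak of the clipped frame, or None."""
    clipped = three_level_clip(frame)
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = None, 0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation of the clipped signal needs only additions of +1, 0, -1.
        value = sum(clipped[n] * clipped[n + lag] for n in range(len(clipped) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


# Synthetic test frame: 50 ms of a 100 Hz sine sampled at 8 kHz.
rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / rate) for n in range(400)]
print(estimate_pitch(frame, rate))
```

Clipping to three levels removes most formant structure from the waveform and reduces the autocorrelation to sign comparisons, which is why the approach suited early real-time hardware.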

The Acoustic Analysis of Korean Read Speech - with respect to the prosodic phrasing - (한국어 낭독체 문장의 음향분석 -바람과 햇님의 운율구 생성을 중심으로-)

  • Sung Chuljae
    • Proceedings of the KSPS conference / 1996.02a / pp.157-172 / 1996
  • This study aims to suggest a theoretical methodology for analyzing prosodic patterns in Korean read speech. Engineering efforts related to phonetic study have focused on the importance of prosodic phrasing, which may play a major role in analyzing a phonetic database. Before establishing the prosodic phrase as the prosodic unit, the features of the boundary signals in a target sentence should be described. With this in mind, the general characteristics of read speech and ToBI (Tones and Break Indices), a currently popular prosodic labeling system, are presented as a first step. The concrete analysis was carried out on the Korean version of the fable 'North Wind and the Sun', in which about 25 prosodic units were identified perceptually for 5 subjects. Once the various kinds of information that can be used to decide a boundary position systematically are established, the next step, acoustic analysis of the prosodic unit, can proceed. For improving the naturalness of synthetic speech, the most important task to study first may be detecting the boundary signals in the speech file and re-establishing them accordingly in the raw text.

Video Coding Method Using Visual Perception Model based on Motion Analysis (움직임 분석 기반의 시각인지 모델을 이용한 비디오 코딩 방법)

  • Oh, Hyung-Suk;Kim, Won-Ha
    • Journal of Broadcast Engineering / v.17 no.2 / pp.223-236 / 2012
  • We develop a video processing method that enables more advanced human-perception-oriented video coding. The proposed method reflects both rate-distortion-based optimization and human visual perception, which is affected by visual saliency, limited space-time resolution, and regional motion history. To reflect human perceptual effects, we devise an online moving pattern classifier using the Hedge algorithm. We then embed existing visual saliency into the proposed moving patterns to establish a human visual perception model. To realize this model, we extend the conventional foveation filtering method. Whereas the conventional foveation filter only smooths less stimulating video signals, the developed foveation filter can locally smooth and enhance signals according to human visual perception without causing artifacts. Owing to signal enhancement, the developed filter more efficiently transfers the bandwidth saved on smoothed signals to the enhanced signals. Performance evaluation verifies that the proposed video processing method maintains overall video quality while improving perceptual quality by 12% to 44%.
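
For readers unfamiliar with the Hedge algorithm named in this abstract, the sketch below shows its core multiplicative-weights update: each expert's weight is multiplied by beta raised to that expert's loss, and the normalized weights give the distribution over experts. How the paper maps moving patterns onto experts and losses is not stated in the abstract, so the three "experts", their losses, and the value of `beta` here are purely hypothetical.

```python
def hedge_update(weights, losses, beta=0.8):
    """One Hedge round: multiply each expert's weight by beta ** loss, loss in [0, 1]."""
    return [w * beta ** loss for w, loss in zip(weights, losses)]


def to_distribution(weights):
    """Normalize weights into the probability distribution over experts."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical per-round losses for three candidate motion-pattern experts.
rounds = [
    [0.1, 0.7, 0.9],
    [0.2, 0.6, 0.8],
    [0.0, 0.9, 0.5],
]
weights = [1.0, 1.0, 1.0]  # start with uniform trust in every expert
for losses in rounds:
    weights = hedge_update(weights, losses)

# The expert with the lowest cumulative loss ends up with the largest share.
print(to_distribution(weights))
```

Because the update needs only the latest losses, it runs online, one round per observation, which fits the online classifier the abstract describes.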

A Study on the Automatic Speech Control System Using DMS model on Real-Time Windows Environment (실시간 윈도우 환경에서 DMS모델을 이용한 자동 음성 제어 시스템에 관한 연구)

  • 이정기;남동선;양진우;김순협
    • The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.51-56 / 2000
  • In this paper, we study an automatic speech control system for a real-time Windows environment using voice recognition. The reference pattern is the variable DMS model, proposed to speed up execution, and the one-stage DP algorithm using this model serves as the recognition algorithm. The recognition vocabulary consists of control command words frequently used in the Windows environment. An automatic speech period detection algorithm for online voice processing in the Windows environment is implemented. The proposed variable DMS model uses a variable number of sections depending on the duration of the input signal. Unnecessary recognition target words are sometimes generated, so the model is reconstructed online to handle them efficiently. Perceptual Linear Predictive analysis is applied to generate feature vectors from the extracted voice features. According to the experimental results, recognition is faster with the proposed model because of its small computational load. The multi-speaker-independent and multi-speaker-dependent recognition rates are 99.08% and 99.39%, respectively. In a noisy environment, the recognition rate is 96.25%.

Image Adaptive Block DCT-Based Perceptual Digital Watermarking (영상 특성에 적응적인 블록 DCT 기반 지각적 디지털 워터마킹)

  • 최윤희;최태선
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.221-229 / 2004
  • We present a new digital watermarking scheme that embeds a watermark according to the characteristics of the image or video. The scheme is compatible with established image compression standards. We define a weighting function using a parent-child structure of the DCT coefficients in a block to embed a maximum watermark. This structure achieves spatio-frequency localization of the DCT coefficients. In the detection stage, we present an optimum a posteriori threshold for a given false detection error probability, based on statistical analysis. Simulation results show that the proposed algorithm is efficient and robust against various signal processing techniques. In particular, it is robust against widely used coding standards such as JPEG and MPEG.