• 제목/요약/키워드: Speech Signals

Search Result 499, Processing Time 0.03 seconds

A Study on Audio/Voice Color Processing Technique (오디오/음성 컬러 처리 기술 연구)

  • Kim Kwangki;Kim Sang-Jin;Son BeakKwon;Hahn Minsoo
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.153-156
    • /
    • 2003
  • In this paper, we studied advanced audio/ voice information processing techniques, and trying to introduce more human friendly audio/voice. It is just in the beginning stage. Firstly, we approached in well-known time-domain methods such as moving average, differentiation, interpolation, and decimation. Moreover, some variation of them and envelope contour modification are utilized. We also suggested the MOS test to evaluate subjective listening factors. In the long term viewpoint, user's preference, mood, and environmental conditions will be considered and according to them, we hope our future technique can adapt speech and audio signals automatically.

  • PDF

Sinusoidal Modeling of Polyphonic Audio Signals Using Dynamic Segmentation Method (동적 세그멘테이션을 이용한 폴리포닉 오디오 신호의 정현파 모델링)

  • 장호근;박주성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.4
    • /
    • pp.58-68
    • /
    • 2000
  • This paper proposes a sinusoidal modeling of polyphonic audio signals. Sinusoidal modeling which has been applied well to speech and monophonic signals cannot be applied directly to polyphonic signals because a window size for sinusoidal analysis cannot be determined over the entire signal. In addition, for high quality synthesized signal transient parts like attacks should be preserved which determines timbre of musical instrument. In this paper, a multiresolution filter bank is designed which splits the input signal into six octave-spaced subbands without aliasing and sinusoidal modeling is applied to each subband signal. To alleviate smearing of transients in sinusoidal modeling a dynamic segmentation method is applied to subbands which determines the analysis-synthesis frame size adaptively to fit time-frequency characteristics of the subband signal. The improved dynamic segmentation is proposed which shows better performance about transients and reduced computation. For various polyphonic audio signals the result of simulation shows the suggested sinusoidal modeling can model polyphonic audio signals without loss of perceptual quality.

  • PDF

Audio Resource Adaptation (오디오 신호의 적응 방법)

  • 오은미
    • Proceedings of the IEEK Conference
    • /
    • 2003.07d
    • /
    • pp.1419-1422
    • /
    • 2003
  • Multimedia contents what we call Digital Items include various types of resources such as music, speech, text, video, graphics, and so on. The current Adaptation QoS described in the ISO/IEC 21000-7 CD-Part 7: Digital Item Adaptation, however, lacks adaptation methods for audio signals. The goal of this paper is to provide adaptation methods that are necessary to deal with audio signals. Two operations are introduced in order to adapt audio items. One method is to make use of the functionality of Fine Grain Scalability, and the other is intended to drop the channel of audio output channel. This paper provides a DIA description tool that associates the operators with the corresponding values of the constraint and the utility. Furthermore, the operations are evaluated and compared to alternative solutions.

  • PDF

CONCERT HALL ACOUSTICS - Physics, Physiology and Psychology fusing Music and Hall - (콘서트홀 음향 - 음악과 홀을 융합시키는 물리학, 생리학, 심리학 -)

  • 안도요이찌
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1992.06a
    • /
    • pp.3-8
    • /
    • 1992
  • The theory of subjective preference with temporal and spatial factors which include sound signals arriving at both ears is described. Then, auditory evoked potentials which may relate to a primitive subjective response namely subjective preference are discussed. According to such fundamental phenomena, a workable model of human auditory-brain system is proposed. For eample, important subjective attributes, such as loudness, coloration, threshold of preception of a reflection and echo distrubance as well as subjective preference in relation to the initial time delay gap between the direct sound and the first reflection, and the subsequent reverberation time are well described by the autocorrelation function of source signals. Speech clarity, subjective diffuseness as well as subjective preference are related to the magnitude of inter-aural crosscorrelation function (IACC). Even the caktail party effects may be eplained by spatialization of human brain, i.e., independence of temporal and spatial factors.

  • PDF

Frequency Domain Blind Source Seperation Using Cross-Correlation of Input Signals (입력신호 상호상관을 이용한 주파수 영역 블라인드 음원 분리)

  • Sung Chang Sook;Park Jang Sik;Son Kyung Sik;Park Keun-Soo
    • Journal of Korea Multimedia Society
    • /
    • v.8 no.3
    • /
    • pp.328-335
    • /
    • 2005
  • This paper proposes a frequency domain independent component analysis (ICA) algorithm to separate the mixed speech signals using a multiple microphone array By estimating the delay timings using a input cross-correlation, even in the delayed mixture case, we propose a good initial value setting method which leads to optimal convergence. To reduce the calculation, separation process is performed at frequency domain. The results of simulations confirms the better performances of the proposed algorithm.

  • PDF

Discrimination of Emotional States In Voice and Facial Expression

  • Kim, Sung-Ill;Yasunari Yoshitomi;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.2E
    • /
    • pp.98-104
    • /
    • 2002
  • The present study describes a combination method to recognize the human affective states such as anger, happiness, sadness, or surprise. For this, we extracted emotional features from voice signals and facial expressions, and then trained them to recognize emotional states using hidden Markov model (HMM) and neural network (NN). For voices, we used prosodic parameters such as pitch signals, energy, and their derivatives, which were then trained by HMM for recognition. For facial expressions, on the other hands, we used feature parameters extracted from thermal and visible images, and these feature parameters were then trained by NN for recognition. The recognition rates for the combined parameters obtained from voice and facial expressions showed better performance than any of two isolated sets of parameters. The simulation results were also compared with human questionnaire results.

Development of a Cryptographic Dongle for Secure Voice Encryption over GSM Voice Channel

  • Kim, Tae-Yong;Jang, Won-Tae;Lee, Hoon-Jae
    • Journal of information and communication convergence engineering
    • /
    • v.7 no.4
    • /
    • pp.561-564
    • /
    • 2009
  • A cryptographic dongle, which is capable of transmitting encrypted voice signals over the CDMA/GSM voice channel, was designed and implemented. The dongle used PIC microcontroller for signals processing including analog to digital conversion and digital to analog conversion, encryption and communicating with the smart phone. A smart phone was used to provide power to the dongle as well as passing the encrypted speech to the smart phone which then transmits the signal to the network. A number of tests were conducted to check the efficiency of the dongle, the firmware programming, the encryption algorithms, and the secret key management system, the interface between the smart phone and the dongle and the noise level.

Sensibility Classification Algorithm of EEGs using Multi-template Method (다중 템플릿 방법을 이용한 뇌파의 감성 분류 알고리즘)

  • Kim Dong-Jun
    • The Transactions of the Korean Institute of Electrical Engineers D
    • /
    • v.53 no.12
    • /
    • pp.834-838
    • /
    • 2004
  • This paper proposes an algorithm for EEG pattern classification using the Multi-template method, which is a kind of speaker adaptation method for speech signal processing. 10-channel EEG signals are collected in various environments. The linear prediction coefficients of the EEGs are extracted as the feature parameter of human sensibility. The human sensibility classification algorithm is developed using neural networks. Using EEGs of comfortable or uncomfortable seats, the proposed algorithm showed about 75% of classification performance in subject-independent test. In the tests using EEG signals according to room temperature and humidity variations, the proposed algorithm showed good performance in tracking of pleasantness changes and the subject-independent tests produced similar performances with subject-dependent ones.

Design of Intelligent Emotion Recognition Model (지능형 감정인식 모델설계)

  • 김이곤;김서영;하종필
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2001.12a
    • /
    • pp.46-50
    • /
    • 2001
  • Voice is one of the most efficient communication media and it includes several kinds of factors about speaker, context emotion and so on. Human emotion is expressed in the speech, the gesture, the physiological phenomena (the breath, the beating of the pulse, etc). In this paper, the method to have cognizance of emotion from anyone's voice signals is presented and simulated by using neuro-fuzzy model.

  • PDF

Classification of Pathological Speech Signals Using Wavelet Transform and Neural Network (Wavelet 변환과 신경회로망을 이용한 후두의 양성종양의 식별에 관한 연구)

  • 김대현
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.06e
    • /
    • pp.395-398
    • /
    • 1998
  • 본 논문에서는 웨이브렛 변환에서 구해진 파라미터와 신경회로망을 이용하여 후두의 양성종양과 정상상태를 구분하는 실험을 행하였다. 식별 파라미터로는 웨이브렛변환으로부터 도출된 ECS 파라미터와 jitter, shimmer를 이용하였으며 신경회로망은 한 개의 은닉층을 갖는 다층구조 신경망을 이용하였다. 신경망의 입력으로는 세가지 파라미터의 조합을 두 개 또는 세 개를 입력하여 각각의 경우의 식별율을 조사하였다. 실험결과 75%에서 93%에 이르는 식별율을 얻었다.

  • PDF