• Title/Summary/Keyword: Simulation speech

Analysis and Synthesis of Audio Signals using a Sinusoidal Model with Psychoacoustic Criteria (정현파 모델을 이용한 오디오 신호의 심리음향적 분석 및 합성)

  • 남승현;강경옥;홍진우
    • The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.77-82 / 1999
  • The sinusoidal model has been widely used in the analysis and synthesis of speech and audio signals, and has become a promising candidate for high-quality, low-bit-rate audio coders. One of the crucial steps in analysis and synthesis with a sinusoidal model is the detection of tonal components. This paper proposes an efficient method for the analysis and synthesis of audio signals using a sinusoidal model that applies psychoacoustic criteria such as the masking effect, the masking index, and JNDf (Just Noticeable Difference in Frequency). Simulation results show that the proposed method significantly reduces the number of sinusoids without degrading the quality of the synthesized audio signals.

A Study on Speaker Recognition Algorithm Through Wire/Wireless Telephone (유무선 전화를 통한 화자인식 알고리즘에 관한 연구)

  • 김정호;정희석;강철호;김선희
    • The Journal of the Acoustical Society of Korea / v.22 no.3 / pp.182-187 / 2003
  • In this paper, we propose an algorithm that improves speaker verification performance by mapping feature parameters with an RBF neural network. Feature vectors from the same speaker differ considerably between wired and wireless channels. To build wired and wireless speaker models, the speaker verification system must distinguish the wired channel from the wireless one on the basis of a speech recognition system. The feature vectors of the untrained channel model are then mapped onto the feature vectors (LPC cepstrum) of the trained channel model using the RBF neural network. Simulation results show that the proposed algorithm improves performance by 0.6% to 10.5% compared with conventional methods such as cepstral mean subtraction.
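
The abstract above names cepstral mean subtraction as its conventional baseline. As a rough illustration of that baseline only (not the proposed RBF mapping), the sketch below removes the per-coefficient mean from a sequence of cepstral frames. The function name and the toy frame values are assumptions made for this example; a constant channel offset in the cepstral domain is the idealized case CMS is meant to cancel.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames (hypothetical helper)."""
    if not frames:
        raise ValueError("at least one frame is required")
    num_frames = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    mean = [sum(frame[i] for frame in frames) / num_frames for i in range(dim)]
    return [[frame[i] - mean[i] for i in range(dim)] for frame in frames]


# Toy data: the same frames seen through two channels that differ by a
# constant cepstral offset (an idealized assumption).
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
offset = [0.5, -1.0]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

clean_cms = cepstral_mean_subtraction(clean)
shifted_cms = cepstral_mean_subtraction(shifted)
```

In this idealized case the two normalized sequences coincide, which is why CMS serves as a simple channel-compensation reference.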

An approximated implementation of affine projection algorithm using Gram-Schmidt orthogonalization (Gram-Schmidt 직교화를 이용한 affine projection 알고리즘의 근사적 구현)

  • 김은숙;정양원;박선준;박영철;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences / v.24 no.9B / pp.1785-1794 / 1999
  • The affine projection algorithm is known to require less computation than RLS while converging much faster than NLMS for speech-like input signals. However, it is still far more computationally demanding than the LMS algorithm because it requires a matrix inversion. In this paper, we show that the affine projection algorithm can be realized with the Gram-Schmidt orthogonalization of the input vectors. Using the derived relation, we propose an approximate but much more efficient implementation of the affine projection algorithm. Simulation results show that the proposed algorithm achieves a convergence speed comparable to that of the affine projection algorithm with only slightly more computation than NLMS.
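
To make the matrix inversion mentioned above concrete, here is a minimal sketch of the standard (non-approximated) affine projection update, with the projection order fixed at 2 so that the regularized Gram matrix is 2x2 and can be inverted in closed form. It is not the Gram-Schmidt approximation proposed in the paper. The function name, step size `mu`, regularization `delta`, and the toy system-identification setup are all assumptions made for this example.

```python
import random


def apa_order2(x, d, num_taps, mu=0.5, delta=1e-6):
    """Order-2 affine projection update for FIR system identification (sketch)."""
    w = [0.0] * num_taps
    for n in range(num_taps, len(x)):
        # Two most recent input vectors, newest sample first.
        u0 = [x[n - k] for k in range(num_taps)]
        u1 = [x[n - 1 - k] for k in range(num_taps)]
        # A priori errors for the two most recent desired samples.
        e0 = d[n] - sum(wi * ui for wi, ui in zip(w, u0))
        e1 = d[n - 1] - sum(wi * ui for wi, ui in zip(w, u1))
        # Regularized 2x2 Gram matrix [[a, b], [b, c]] = X^T X + delta * I.
        a = sum(v * v for v in u0) + delta
        b = sum(p * q for p, q in zip(u0, u1))
        c = sum(v * v for v in u1) + delta
        det = a * c - b * b
        # g = (X^T X + delta * I)^{-1} e, using the closed-form 2x2 inverse.
        g0 = (c * e0 - b * e1) / det
        g1 = (a * e1 - b * e0) / det
        # Weight update along the span of the two input vectors.
        w = [wi + mu * (g0 * p + g1 * q) for wi, p, q in zip(w, u0, u1)]
    return w


# Toy experiment: identify a known 4-tap filter from noise-free data.
random.seed(0)
true_w = [0.5, -0.3, 0.2, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(true_w[k] * x[n - k] for k in range(4) if n - k >= 0)
     for n in range(len(x))]
w_hat = apa_order2(x, d, num_taps=4)
```

For a general projection order P the same step needs a P x P solve at every sample, which is the cost an approximate implementation aims to avoid.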

Analyzing the Acoustic Elements and Emotion Recognition from Speech Signal Based on DRNN (음향적 요소분석과 DRNN을 이용한 음성신호의 감성 인식)

  • Sim, Kwee-Bo;Park, Chang-Hyun;Joo, Young-Hoon
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.1 / pp.45-50 / 2003
  • Robot technology has advanced remarkably in recent years, and emotion recognition is necessary to build a robot that interacts closely with people. This paper presents a simulator, and its simulation results, that recognizes or classifies emotions by learning pitch patterns. Because pitch alone is not sufficient for recognizing emotion, we add further acoustic elements and analyze the relation between emotion and these acoustic elements. The simulator consists of a DRNN (Dynamic Recurrent Neural Network) and a feature extraction module. The DRNN is used to learn the pitch patterns.

Optimal Design of a MEMS-type Piezoelectric Microphone (MEMS 구조 압전 마이크로폰의 최적구조 설계)

  • Kwon, Min-Hyeong;Ra, Yong-Ho;Jeon, Dae-Woo;Lee, Young-Jin
    • Journal of Sensor Science and Technology / v.27 no.4 / pp.269-274 / 2018
  • Microphones with high sensitivity and a high signal-to-noise ratio (SNR) are essential for a broad range of automatic speech recognition applications. Piezoelectric microphones have several advantages over conventional capacitor microphones, including high stiffness and high SNR. In this study, we designed a new piezoelectric membrane structure using the finite element method (FEM) and an optimization technique to improve the sensitivity of the transducer, which uses a high-quality AlN piezoelectric thin film. The simulation demonstrated that the sensitivity depends critically on the inner radius of the top electrode, the outer radius of the membrane, and the thickness of the piezoelectric film in the microphone. The optimized piezoelectric transducer structure showed a much higher sensitivity than the conventional piezoelectric transducer structure. This study provides a viable path toward micro-scale, high-sensitivity piezoelectric microphones with a simple manufacturing process, a wide frequency range, and a low DC bias voltage.

Quality Assessment and Predistortion Evaluation of the Multi-channel Audio Codec according to the bitrate changing (압축율 변화에 따른 멀티채널 오디오의 품질 및 Predistortion 의 영향 평가)

  • Cha, Kyung-Hwan;Jang, Dae-Young;Kim, Sung-Han;Kim, Chun-Duck
    • The Journal of the Acoustical Society of Korea / v.15 no.2 / pp.55-60 / 1996
  • This paper describes a subjective assessment of multi-channel audio quality as the bitrate changes, and evaluates the effect of predistortion in avoiding unmasked noise after the matrixing/dematrixing process in the transmission and regeneration of multi-channel audio. The simulation uses perceptual coding based on the MPEG-2 Audio Layer II algorithm. We evaluate the quality improvement with and without predistortion at 384, 320, 256, and 128 kbps. In the double-blind subjective assessment, the score on the five-grade impairment scale stays within minus one down to 320 kbps, so the audio quality in the 3/2-channel configuration is rated "perceptible, but not annoying". Predistortion improves quality by one grade at 128 kbps, and the speech test material improves more than the music test materials.

An adaptive time-delay recurrent neural network for temporal learning and prediction (시계열패턴의 학습과 예측을 위한 적응 시간지연 회귀 신경회로망)

  • 김성식
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.2 / pp.534-540 / 1996
  • This paper presents an Adaptive Time-Delay Recurrent Neural Network (ATRN) for learning and recognizing temporal correlations in temporal patterns. The ATRN employs adaptive time-delays and recurrent connections, both inspired by neurobiology. The adaptive time-delays let the ATRN choose optimal delay values for the temporal location of the important information in the input patterns, and the recurrent connections enable the network to encode and integrate temporal information from sequences with arbitrary time intervals and arbitrary lengths of temporal context. The ATRN described in this paper, the ATNN proposed by Lin, and the TDNN introduced by Waibel were simulated and applied to chaotic time-series prediction of the Mackey-Glass delay-differential equation. The simulation results show that the normalized mean square error (NMSE) of the ATRN is 0.0026, while the NMSE values of the ATNN and TDNN are 0.014 and 0.0117, respectively, and that in temporal learning, employing recurrent links in the network is more effective than putting multiple time-delays into the neurons. The best performance is attained by the ATRN. The ATRN should be well suited to temporally continuous domains such as speech recognition, moving object recognition, motor control, and time-series prediction.
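
The comparison above is reported as a normalized mean square error. The abstract does not state which normalization the authors used, so the sketch below assumes one common convention: the sum of squared prediction errors divided by the sum of squared deviations of the targets from their mean. The function name and the toy values are hypothetical.

```python
def nmse(targets, predictions):
    """Squared prediction error normalized by the target variance (assumed form)."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and equal in length")
    mean_t = sum(targets) / len(targets)
    error = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    spread = sum((t - mean_t) ** 2 for t in targets)
    if spread == 0.0:
        raise ValueError("targets must not be constant")
    return error / spread


series = [1.0, 2.0, 3.0, 4.0]
perfect_score = nmse(series, series)       # predictions equal the targets
baseline_score = nmse(series, [2.5] * 4)   # always predict the target mean
```

Under this convention a score of 0 means perfect prediction and a score of 1 matches a predictor that always outputs the target mean, which gives a scale for reading values such as 0.0026 or 0.014.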

On the Use of a Parallel-Branch Subunit Model in Continuous HMM for improved Word Recognition (연속분포 HMM에서 평행분기 음성단위를 사용한 단어인식율 향상연구)

  • Park, Yong-Kyuo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.14 no.2E / pp.25-32 / 1995
  • In this paper, we propose a parallel-branch subunit model for improved word recognition. The model is obtained by splitting each subunit into branches based on the mixture components of a continuous hidden Markov model (continuous HMM). According to simulation results, the proposed model yields a higher recognition rate than the single-branch subunit model or the parallel-branch subunit model proposed by Rabiner et al. [1]. We show that a proper combination of the number of mixture components and the number of branches for each subunit increases the recognition rate. The recognition performance of the proposed algorithms was evaluated on speech material consisting of a vocabulary of 1036 Korean words.

Signal Subspace-based Voice Activity Detection Using Generalized Gaussian Distribution (일반화된 가우시안 분포를 이용한 신호 준공간 기반의 음성검출기법)

  • Um, Yong-Sub;Chang, Joon-Hyuk;Kim, Dong Kook
    • The Journal of the Acoustical Society of Korea / v.32 no.2 / pp.131-137 / 2013
  • In this paper, we propose an improved voice activity detection (VAD) algorithm that uses statistical models in the signal subspace domain. An uncorrelated signal subspace is generated using an embedded prewhitening technique, and the statistical characteristics of the noisy speech and the noise are investigated in this domain. Based on the characteristics of the signals in the signal subspace, a new statistical VAD method using the GGD (Generalized Gaussian Distribution) is proposed. Experimental results show that the proposed GGD-based approach outperforms the Gaussian-based signal subspace method under simulation conditions of 0-15 dB SNR.
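
For readers unfamiliar with the generalized Gaussian distribution named above, the sketch below evaluates one common zero-mean parameterization, f(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x| / alpha)^beta), with scale `alpha` and shape `beta`. The abstract does not give the parameterization used in the paper, so this form, the function name, and the example values are assumptions.

```python
import math


def ggd_pdf(x, alpha, beta):
    """Zero-mean generalized Gaussian density with scale alpha and shape beta."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x) / alpha) ** beta))


# beta = 2 with alpha = sqrt(2) reproduces the standard normal density;
# beta = 1 with alpha = 1 reproduces the unit-scale Laplacian density.
gaussian_peak = ggd_pdf(0.0, alpha=math.sqrt(2.0), beta=2.0)
laplacian_peak = ggd_pdf(0.0, alpha=1.0, beta=1.0)
```

The shape parameter lets one family cover both Gaussian and heavier-tailed Laplacian-like statistics, which is the usual motivation for preferring it over a fixed Gaussian model.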

VR-simulated Sailor Training Platform for Emergency (긴급상황에 대한 가상현실 선원 훈련 플랫폼)

  • Park, Chur-Woong;Jung, Jinki;Yang, Hyun-Seung
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2015.10a / pp.175-178 / 2015
  • This paper presents a VR-simulated sailor training platform for emergencies, intended to prevent the human errors that cause 60% to 80% of domestic and international marine accidents. Using virtual reality technology, the proposed platform provides an interaction method for building proficiency in emergency procedures and a crowd control method for controlling crowd agents in a virtual ship environment. The interaction method uses speech recognition and gesture recognition to enhance the immersiveness and efficiency of the training. The crowd control method provides natural simulations of crowd agents by applying a behavior model that reflects human social behavior. To examine the efficiency of the proposed platform, a prototype whose virtual training scenario depicts the outbreak of a fire on a ship was implemented as a standalone system.
