Search | Korea Science

Sample selection approach using moving window for acoustic analysis of pathological sustained vowels according to signal typing

Lee, Ji-Yeoun
- Phonetics and Speech Sciences / v.3 no.3 / pp.99-108 / 2011
Perturbation parameters such as jitter, shimmer, and signal-to-noise ratio (SNR) are usually estimated either from a subjectively chosen segment or from the whole of a given pathological voice signal, although many other regions of the signal could be analyzed. In this paper, pathological voice signals were classified as type 1, 2, 3, or 4 according to their narrow-band spectrograms, and the differences between the perturbation parameter values extracted from the subjectively chosen segment and from the entire signal tended to grow from type 1 to type 4. A sample selection method based on a moving window is therefore proposed for analyzing type 2 and 3 signals as well as type 1 signals. Although type 3 signals cannot normally be analyzed with perturbation analysis, they were analyzed here by using the moving window to select the samples whose error count is less than 10. At present, there is no method for analyzing type 4 signals. Future research will endeavor to determine the best way to evaluate such voices.
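The moving-window selection described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's implementation: it assumes the signal has already been reduced to a list of glottal cycle lengths, uses local jitter as the only criterion, and simply picks the least perturbed window. The abstract's "error count" criterion is not defined there, so it is not reproduced. The function names and the example cycle lengths are hypothetical.

```python
def local_jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    cycle lengths, divided by the mean cycle length, in percent."""
    if len(periods) < 2:
        raise ValueError("need at least two cycle lengths")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


def select_stable_window(periods, window, hop=1):
    """Slide a window over the cycle lengths and return
    (start_index, jitter_percent) of the least perturbed window.
    Ties are resolved in favor of the earliest window."""
    if window < 2 or window > len(periods):
        raise ValueError("window must be between 2 and len(periods)")
    best = None
    for start in range(0, len(periods) - window + 1, hop):
        jitter = local_jitter_percent(periods[start:start + window])
        if best is None or jitter < best[1]:
            best = (start, jitter)
    return best


# Hypothetical cycle lengths in ms: irregular onset, steady middle, irregular end.
periods_ms = [5.0, 6.1, 4.2, 5.9, 5.0, 5.0, 5.0, 5.0, 5.4, 6.3]
start, jitter = select_stable_window(periods_ms, window=4)
```

With these made-up values the selected window starts at index 4, where four consecutive cycles have identical length. A fuller analysis would presumably combine several criteria (shimmer, SNR, tracking errors) rather than jitter alone.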

Study on the Improvement of Speech Recognizer by Using Time Scale Modification (시간축 변환을 이용한 음성 인식기의 성능 향상에 관한 연구)

이기승
- The Journal of the Acoustical Society of Korea / v.23 no.6 / pp.462-472 / 2004
In this paper, a method is proposed for compensating for the performance degradation of automatic speech recognition (ASR) that is mainly caused by speaking rate variation. Before the new method is proposed, a quantitative analysis of the performance of an HMM-based ASR system according to speaking rate is first performed. This analysis showed that significant performance degradation often occurs for rapidly spoken speech signals. A quantitative measure that represents speaking rate is then introduced. Time scale modification (TSM) is employed to compensate for the speaking rate difference between input speech signals and training speech signals. Finally, a method for compensating for the performance degradation caused by speaking rate variation is proposed, in which TSM is selectively employed according to speaking rate. ASR experiments on 10-digit mobile phone numbers confirmed that the error rate was reduced by 15.5% when the proposed method was applied to speech signals with a high speaking rate.

Analysis of Speech Signals by linear prediction and It's Application (선형 예측법에 의한 음성신호의 분석과 그 응용 방안)

김명규
- Journal of the Korean Institute of Telematics and Electronics / v.18 no.4 / pp.27-33 / 1981
In this paper, the effect of tone variation in speech signals is discussed by showing the variations of the linear prediction model spectra and the estimated vocal tract shape for Korean vowels. As an application of the analysis results, a speech synthesis scheme based on the combination of phonemes is also discussed on the basis of experimental results.
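As background for the linear prediction analysis mentioned in this abstract, the sketch below solves for predictor coefficients with the Levinson-Durbin recursion. The abstract does not say which solution method the author used, so the autocorrelation method is an assumption here, and the function names are hypothetical. The recursion also yields reflection coefficients, which are the quantities that lossless-tube estimates of vocal tract shape are commonly derived from.

```python
def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (coeffs, reflection, error) where the predictor is
    x[n] ~ sum(coeffs[k-1] * x[n-k] for k in 1..order).
    """
    if r[0] <= 0:
        raise ValueError("r[0] must be positive")
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)         # prediction error shrinks at each order
        reflection.append(k)
    return a[1:], reflection, err


# Autocorrelation of an ideal first-order process with coefficient 0.5.
coeffs, reflection, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this toy autocorrelation sequence the second-order solution collapses to a first-order predictor: the first coefficient is 0.5 and the second is 0. In practice `autocorrelation` would be applied to a windowed speech frame first.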

Tree Coding of Speech Signals (음성신호에 대한 트리 코우딩)

김경수;이상욱
- Proceedings of the Korean Institute of Communication Sciences Conference / 1984.04a / pp.18-21 / 1984
In this paper, tree coding using the (M, L) multi-path search algorithm has been investigated. A hybrid adaptation scheme that employs block adaptation as well as sequential adaptation is described for application to the quantization and compression of speech signals. Simulation results with the hybrid adaptation scheme indicate that relatively good speech quality can be obtained at a rate of about 8 kbps. All necessary parameters, such as M, L, and the filter order, were found from simulation, and these parameters turned out to be a good compromise between complexity and overall performance.

Sub-Nyquist Nonuniform Sampling and Perfect Reconstruction of Speech Signals (음성신호의 Sub-Nyquist 비균일 표준화 및 완전 복구에 관한 연구)

Lee, He-Young
- Speech Sciences / v.12 no.2 / pp.153-170 / 2005
Sub-Nyquist nonuniform sampling (SNNS) and a perfect reconstruction (PR) formula are proposed for the development of a systematic method to obtain a minimal representation of a speech signal. In the proposed method, the instantaneous sampling frequency (ISF) varies depending on the least upper bound of the spectral support of a speech signal in the time-frequency domain (TFD). A definition is given of the instantaneous bandwidth (IB), which determines the ISF and is used to generate the set of samples that represent continuous-time signals perfectly. The spectral characteristics of the sampled data generated by the sub-Nyquist nonuniform sampling method are also analyzed. The proposed method does not generate the redundant samples caused by the time-varying instantaneous bandwidth of a speech signal.

Robust speech recognition in car environment with echo canceller (반향제거기를 갖는 자동차 실내 환경에서의 음성인식)

Park, Chul-Ho;Heo, Won-Chul;Bae, Keun-Sung
- Proceedings of the KSPS conference / 2005.11a / pp.147-150 / 2005
The performance of speech recognition in a car environment is severely degraded when music or news is playing from a radio or a CD player. Since reference signals are available from the audio unit in the car, it is possible to remove them with an adaptive filter. In this paper, we present experimental results of speech recognition in a car environment using an echo canceller. For this, we generate test speech signals by adding music or news to the noisy car speech from the Aurora2 DB. An HTK-based continuous HMM system is constructed as the recognition system. In addition, the MMSE-STSA method is applied to the output of the echo canceller to further remove residual noise.
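The idea of removing a known reference signal with an adaptive filter can be illustrated with a normalized LMS (NLMS) canceller. The abstract does not name the adaptation algorithm, so NLMS is an assumption made for this sketch, and the function name, echo path, step size, and filter length are all hypothetical.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `ref` from `mic`.

    Returns (residual, weights). The residual is the microphone signal
    with the estimated echo of the reference removed.
    """
    w = [0.0] * taps
    buf = [0.0] * taps               # recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        echo_estimate = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_estimate
        norm = sum(b * b for b in buf) + eps
        # Normalized update: step size scaled by the reference energy.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w


# Synthetic check: the microphone picks up only an echo of the reference.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
echo_path = [0.6, -0.3, 0.1]         # hypothetical room response
mic = [sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(ref))]
residual, weights = nlms_echo_cancel(mic, ref)
tail_power = sum(e * e for e in residual[-500:]) / 500
```

In this noise-free toy case the filter weights converge to the echo path and the residual power in the tail falls to nearly zero. With real speech present in the microphone signal, the residual would contain the speech plus whatever echo the filter fails to model, which is presumably why the abstract adds a further noise reduction stage.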

Automatic speech recognition using acoustic doppler signal (초음파 도플러를 이용한 음성 인식)

Lee, Ki-Seung
- The Journal of the Acoustical Society of Korea / v.35 no.1 / pp.74-82 / 2016
In this paper, a new automatic speech recognition (ASR) method is proposed in which ultrasonic Doppler signals are used instead of conventional speech signals. The proposed method has advantages over conventional speech-based and non-speech-based ASR, including robustness against acoustic noise and the user comfort associated with a non-contact sensor. In the proposed method, a 40 kHz ultrasonic signal is radiated toward the mouth and the reflected ultrasonic signals are received. The frequency shift caused by the Doppler effect is used to implement ASR. The proposed method employs multi-channel ultrasonic signals acquired from various locations, unlike the previous method, in which a single-channel ultrasonic signal was employed. PCA (principal component analysis) coefficients are used as the ASR features, and a hidden Markov model (HMM) with a left-right model is adopted. To verify the feasibility of the proposed ASR, a speech recognition experiment was carried out on 60 Korean isolated words obtained from six speakers. The experimental results showed that the overall word recognition rates were comparable with those of conventional speech-based ASR methods and that the proposed method outperformed the conventional single-channel ASR method. In particular, an average recognition rate of 90% was maintained in noisy environments.
https://doi.org/10.7776/ASK.2016.35.1.074
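To give a feel for the size of the frequency shift this abstract relies on, the sketch below evaluates the standard two-way Doppler approximation for a reflecting surface. Only the 40 kHz carrier comes from the abstract. The co-located emitter and receiver, the 343 m/s speed of sound, and the 0.1 m/s surface velocity are assumptions chosen for illustration, and the function name is hypothetical.

```python
def doppler_shift_hz(velocity_mps, carrier_hz=40_000.0, sound_speed_mps=343.0):
    """Approximate two-way Doppler shift for a reflector moving toward a
    co-located emitter and receiver at `velocity_mps`.

    Uses delta_f = 2 * v * f0 / c, which holds when v is much smaller
    than the speed of sound.
    """
    return 2.0 * velocity_mps * carrier_hz / sound_speed_mps


# A surface moving at 0.1 m/s toward the sensor (hypothetical value).
shift = doppler_shift_hz(0.1)   # about 23.3 Hz
```

A shift of a few tens of hertz around a 40 kHz carrier is small in relative terms, which suggests why such systems analyze the narrow band around the carrier rather than the full spectrum.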

A Study on the Reconstruction of a Frame Based Speech Signal through Dictionary Learning and Adaptive Compressed Sensing (Adaptive Compressed Sensing과 Dictionary Learning을 이용한 프레임 기반 음성신호의 복원에 대한 연구)

Jeong, Seongmoon;Lim, Dongmin
- The Journal of Korean Institute of Communications and Information Sciences / v.37A no.12 / pp.1122-1132 / 2012
Compressed sensing has been applied to many fields, such as images, speech signals, and radar. It has mainly been applied to stationary signals, and the reconstruction error can grow as the compression ratio is increased by reducing the number of measurements. To resolve this problem, speech signals are divided into frames and processed in parallel. The frames are made sparse by dictionary learning, and adaptive compressed sensing is applied, which designs the compressed sensing reconstruction matrix adaptively by using the difference between the sparse coefficient vector and its reconstruction. The proposed method shows that fast and accurate reconstruction of non-stationary signals is possible with compressed sensing.
https://doi.org/10.7840/kics.2012.37A.12.1122

Speech Enhancement Using Nonnegative Matrix Factorization with Temporal Continuity (시간 연속성을 갖는 비음수 행렬 분해를 이용한 음질 개선)

Nam, Seung-Hyon
- The Journal of the Acoustical Society of Korea / v.34 no.3 / pp.240-246 / 2015
In this paper, speech enhancement using nonnegative matrix factorization (NMF) with temporal continuity is addressed. Speech and noise signals are modeled as Poisson distributions, and the basis vectors and gain vectors of NMF are modeled as Gamma distributions. Temporal continuity of the gain vector is known to be critical to the quality of enhanced speech signals. Here, temporal continuity is implemented by adopting Gamma-Markov chain priors for the noise gain vectors during the separation phase. Simulation results show that the Gamma-Markov chain models the temporal continuity of noise signals and tracks changes in noise effectively.
https://doi.org/10.7776/ASK.2015.34.3.240
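For readers unfamiliar with NMF under a Poisson observation model, the sketch below shows the plain multiplicative-update algorithm that minimizes the generalized Kullback-Leibler divergence, which is the maximum-likelihood counterpart of that model. It deliberately omits the Gamma priors and the Gamma-Markov chain temporal continuity that the abstract describes, so it is a baseline illustration only. The function names and the small example matrix are hypothetical.

```python
import math
import random


def reconstruct(W, H):
    """Matrix product WH as nested lists."""
    return [[sum(W[i][k] * H[k][j] for k in range(len(H)))
             for j in range(len(H[0]))] for i in range(len(W))]


def kl_divergence(V, W, H, eps=1e-12):
    """Generalized KL divergence between V and its approximation WH."""
    total = 0.0
    for v_row, wh_row in zip(V, reconstruct(W, H)):
        for v, wh in zip(v_row, wh_row):
            wh += eps
            if v > 0:
                total += v * math.log(v / wh)
            total += wh - v
    return total


def nmf_kl(V, rank, iters=100, seed=0, eps=1e-12):
    """Factor V (n x m, nonnegative) into W (n x rank) and H (rank x m)
    with multiplicative updates for the KL divergence."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update the gains H using the element-wise ratio V / (WH).
        WH = reconstruct(W, H)
        R = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        H = [[H[k][j] * sum(W[i][k] * R[i][j] for i in range(n))
              / (sum(W[i][k] for i in range(n)) + eps)
              for j in range(m)] for k in range(rank)]
        # Update the basis W with the refreshed ratio.
        WH = reconstruct(W, H)
        R = [[V[i][j] / (WH[i][j] + eps) for j in range(m)] for i in range(n)]
        W = [[W[i][k] * sum(R[i][j] * H[k][j] for j in range(m))
              / (sum(H[k][j] for j in range(m)) + eps)
              for k in range(rank)] for i in range(n)]
    return W, H


# Toy "spectrogram": rows are frequency bins, columns are frames.
V = [[4.0, 1.0, 0.0, 2.0],
     [8.0, 2.0, 0.0, 4.0],
     [0.0, 3.0, 6.0, 1.0],
     [1.0, 2.0, 3.0, 1.0]]
W0, H0 = nmf_kl(V, rank=2, iters=0)    # random initialization only
W, H = nmf_kl(V, rank=2)               # same initialization, then updates
```

Comparing `kl_divergence(V, W0, H0)` with `kl_divergence(V, W, H)` shows how far the updates reduce the divergence from the shared random start. In an enhancement setting, each column of `H` would be a gain vector for one frame, and a temporal continuity prior of the kind the abstract describes would couple neighboring columns.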

Classification of Sasang Constitution Taeumin by Comparative of Speech Signals Analysis (음성 분석 정보값 비교를 통한 사상체질 태음인의 분류)

Kim, Bong-Hyun;Lee, Se-Hwan;Cho, Dong-Uk
- The KIPS Transactions:PartB / v.15B no.1 / pp.17-24 / 2008
This paper proposes a Sasang constitution classification based on the comparison of speech signal analysis values. As a first step in configuring a whole system that provides objective indices of Sasang constitution, it proposes a Taeumin classification method that uses the output values of speech signal analysis and connects with the Soeumin classification process based on skin diagnosis. First, phonetic elements that show the clear features of each Sasang constitution group are extracted from the voice characteristics. Taeumin are then classified through the differences and similarities between constitution groups on the basis of the resulting values. Finally, the effectiveness of this method is verified through experiments.
https://doi.org/10.3745/KIPSTB.2008.15-B.1.17
