Search | Korea Science

Adaptive Noise Reduction of Speech Using Wavelet Transform (웨이브렛 변환을 이용한 음성의 적응 잡음 제거)

Lee, Chang-Ki;Kim, Dae-Ik
- The Journal of the Korea institute of electronic communication sciences, v.4 no.3, pp.190-196, 2009
A new time-adapted threshold is proposed that uses the standard deviations of wavelet coefficients obtained by applying the wavelet transform frame by frame. The threshold is set to the sum of the standard deviations of the wavelet coefficients in the level 3 approximation and the weighted level 1 detail. Level 3 approximation coefficients represent low-frequency voiced sound, and level 1 detail coefficients represent high-frequency unvoiced sound. After noise is reduced by soft thresholding with the proposed time-adapted threshold, residual noise remains in silent intervals. To reduce this residual noise, an algorithm for detecting silent intervals is proposed. Simulation results show that the proposed algorithm achieves better SNR and MSE than the wavelet transform and the wavelet packet transform.
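The thresholding step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Haar wavelet, the `detail_weight` value, the choice to threshold only the detail coefficients, and all function names are assumptions, since the abstract does not specify them. The silent-interval detection step is not sketched because the abstract gives no details.

```python
import math
import statistics

SCALE = 1.0 / math.sqrt(2.0)

def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) * SCALE for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * SCALE for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving the reconstructed samples."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SCALE)
        out.append((a - d) * SCALE)
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero those with |c| <= t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise_frame(frame, detail_weight=0.5):
    """Denoise one frame with a per-frame (time-adapted) threshold.

    detail_weight is a hypothetical value; the abstract only says the
    level 1 detail term is weighted.
    """
    if len(frame) % 8 != 0:
        raise ValueError("frame length must be a multiple of 8 for 3 levels")
    a1, d1 = haar_step(frame)
    a2, d2 = haar_step(a1)
    a3, d3 = haar_step(a2)
    # Threshold = std of level 3 approximation + weighted std of level 1 detail.
    t = statistics.pstdev(a3) + detail_weight * statistics.pstdev(d1)
    # Assumption: only the detail coefficients are thresholded.
    d1, d2, d3 = (soft_threshold(d, t) for d in (d1, d2, d3))
    a2 = haar_inverse_step(a3, d3)
    a1 = haar_inverse_step(a2, d2)
    return haar_inverse_step(a1, d1)
```

Because the threshold is recomputed for every frame, it follows the local signal level rather than using one global value for the whole utterance.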

Unsupervised Word Grouping Algorithm for real-time implementation of Medium vocabulary recognition (중규모급 단어 인식기의 실시간 구현을 위한 무감독 단어집단화 알고리듬)

Lim Dong Sik;Kim Jin Young;Baek Seong Joon
- Proceedings of the Acoustical Society of Korea Conference, autumn, pp.81-84, 1999
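This paper proposes an unsupervised word grouping algorithm for the real-time implementation of a medium-vocabulary word recognizer. Unsupervised word grouping acts as a preprocessor that reduces the number of candidate words in large-vocabulary speech recognition systems. For unsupervised grouping, five feature parameters that reflect the inherent characteristics of the voiced and unvoiced sounds of each word are applied to a classification and regression tree (CART), which is widely used in pattern recognition and regression analysis. Words are grouped in two ways: by dividing the frames of each word into a fixed number n of segments and growing a single tree, and by growing one tree per segment and taking the intersection across segments. In the experiments, the number of candidate words fell from 22 to an average of 2.21, so that recognition requires searching only about 10% of the total vocabulary.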
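The per-segment intersection idea can be illustrated with a small sketch. It assumes that a tree has already been grown for each segment and that each tree leaf maps to a set of words; the CART training itself is not shown, and the function name, leaf ids, and example words are hypothetical.

```python
def candidates_by_intersection(segment_groups, observed_leaves):
    """Intersect the word groups selected by each segment's tree.

    segment_groups[k] maps a leaf id of segment k's tree to the set of
    words that fall into that leaf; observed_leaves[k] is the leaf the
    input utterance reaches in segment k's tree.
    """
    candidates = None
    for groups, leaf in zip(segment_groups, observed_leaves):
        words = groups.get(leaf, set())
        candidates = set(words) if candidates is None else candidates & words
    return candidates if candidates is not None else set()

# Hypothetical two-segment example with four words.
segment_groups = [
    {0: {"seoul", "busan", "daegu"}, 1: {"incheon"}},
    {0: {"seoul", "incheon"}, 1: {"busan", "daegu"}},
]
remaining = candidates_by_intersection(segment_groups, [0, 1])

# Reported reduction: 2.21 of 22 words, roughly 10% of the vocabulary.
search_fraction = 2.21 / 22
```

Only the words that survive every segment's grouping are passed on to the recognizer, which is what shrinks the search space.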

Subband Based Spectrum Subtraction Algorithm (서브밴드에 기반한 스펙트럼 차감 알고리즘)

Choi, Jae-Seung
- The Journal of the Korea institute of electronic communication sciences, v.8 no.4, pp.555-560, 2013
This paper first proposes a classification algorithm that detects voiced, unvoiced, and silence signals in each frame using a distance measure, logarithmic power, and root mean square methods, and then proposes a spectrum subtraction algorithm based on a subband filter. The proposed algorithm subtracts the spectra of white noise and street noise from the noisy signal in each frame, based on the subband filter. The proposed spectrum subtraction algorithm is evaluated using the speech and noise data of the Aurora-2 database. Based on measurements of the signal-to-noise ratio (SNR), the experiments confirm that the proposed algorithm is effective for speech contaminated by noise. The output SNR improved by approximately 2.1 dB for white noise and 1.91 dB for street noise.
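For orientation, the basic per-frame subtraction can be sketched as below. This is plain full-band magnitude spectral subtraction, not the subband-filter variant the paper proposes, and it omits the voiced/unvoiced/silence classification. The function names, the `floor` parameter, and the assumption that a noise magnitude estimate per frequency bin is already available are all illustrative.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def spectral_subtract(noisy_frame, noise_magnitude, floor=0.0):
    """Subtract an estimated noise magnitude from each frequency bin.

    The noisy phase is kept unchanged; magnitudes are clamped at `floor`
    so that they never become negative.
    """
    cleaned = []
    for bin_value, noise_mag in zip(dft(noisy_frame), noise_magnitude):
        magnitude = max(abs(bin_value) - noise_mag, floor)
        cleaned.append(cmath.rect(magnitude, cmath.phase(bin_value)))
    return idft(cleaned)
```

A subband version would apply the same subtraction separately within each band of a filter bank, which allows a different noise estimate per band.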
https://doi.org/10.13067/JKIECS.2013.8.4.555

Adaptive Noise Reduction of Speech using Wavelet Transform (웨이브렛 변환을 이용한 음성의 적응 잡음 제거)

Im Hyung-kyu;Kim Cheol-su
- Journal of the Korea Computer Industry Society, v.6 no.2, pp.271-278, 2005
This paper proposes a new time-adapted threshold that uses the standard deviations of wavelet coefficients obtained by applying the wavelet transform frame by frame. The threshold is set to the sum of the standard deviations of the wavelet coefficients in the level 3 approximation and the weighted level 1 detail. Level 3 approximation coefficients represent low-frequency voiced sound, and level 1 detail coefficients represent high-frequency unvoiced sound. After noise is reduced by soft thresholding with the proposed time-adapted threshold, residual noise remains in silent intervals. To reduce this residual noise, an algorithm for detecting silent intervals is proposed. Simulation results demonstrate that the proposed algorithm improves SNR and MSE more than the wavelet transform and the wavelet packet transform do.
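The two evaluation measures named here can be computed as in the following sketch. It uses the common textbook definitions of MSE and output SNR against a clean reference; the abstract does not state the exact formulas the authors used, so this is an assumption, as are the function names.

```python
import math

def mse(clean, estimate):
    """Mean squared error between the clean reference and the estimate."""
    return sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)

def snr_db(clean, estimate):
    """Output SNR in dB: clean signal power over residual error power."""
    signal_power = sum(c * c for c in clean)
    error_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / error_power)

clean = [1.0, -1.0, 1.0, -1.0]
denoised = [0.9, -0.9, 0.9, -0.9]
print(round(mse(clean, denoised), 4), round(snr_db(clean, denoised), 2))
```

A better denoiser lowers the MSE and raises the SNR, which is the sense in which the abstract reports an improvement.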

Segmentation of continuous Korean Speech Based on Boundaries of Voiced and Unvoiced Sounds (유성음과 무성음의 경계를 이용한 연속 음성의 세그먼테이션)

Yu, Gang-Ju;Sin, Uk-Geun
- The Transactions of the Korea Information Processing Society, v.7 no.7, pp.2246-2253, 2000
In this paper, we show that the performance of blind segmentation of phoneme boundaries can be enhanced by using knowledge of Korean syllabic structure and of the regions of voiced and unvoiced sounds. The proposed method consists of three processes: extracting candidate phoneme boundaries, detecting the boundaries of voiced/unvoiced sounds, and selecting the final phoneme boundaries. The candidate phoneme boundaries are extracted by a clustering method based on the similarity between two adjacent clusters. The similarity measure employed in this process is the ratio of the probability densities of adjacent clusters. To detect the boundaries of voiced/unvoiced sounds, we first compute the power density spectrum of the speech signal in the 0 to 400 Hz frequency band. The points where the variation of this power density spectrum is greater than a threshold are then chosen as the boundaries of voiced/unvoiced sounds. The final phoneme boundaries consist of all the candidate phoneme boundaries in voiced regions and a limited number of candidate phoneme boundaries in unvoiced regions. The experimental results showed about a 40% decrease in the insertion rate compared to the blind segmentation method we adopted.
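The voiced/unvoiced boundary step can be sketched as follows, under several assumptions the abstract leaves open: power is computed per frame with a naive DFT, "variation" is taken as the absolute difference between consecutive frames, and the threshold is supplied by the caller. Function names and parameters are hypothetical.

```python
import cmath
import math

def low_band_power(frame, sample_rate, cutoff_hz=400.0):
    """Power of one frame in the band from 0 Hz up to cutoff_hz."""
    n = len(frame)
    power = 0.0
    for k in range(n // 2 + 1):
        if k * sample_rate / n > cutoff_hz:
            break
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power += abs(coeff) ** 2
    return power / n

def voiced_unvoiced_boundaries(band_power, threshold):
    """Frame indices where low-band power changes by more than threshold."""
    return [i for i in range(1, len(band_power))
            if abs(band_power[i] - band_power[i - 1]) > threshold]
```

Voiced sounds concentrate energy at low frequencies, so a large jump in low-band power between neighbouring frames is a plausible marker of a transition between voiced and unvoiced regions.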

An Adaptive Speech Enhancement System Using Lateral Inhibition and Time-Delay Neural Network (상호억제와 시간지연 신경회로망을 사용한 적응적인 음성강조시스템)

Choi, Jae-Seung
- Journal of the Institute of Electronics Engineers of Korea SP, v.45 no.2, pp.95-102, 2008
This paper proposes an adaptive speech enhancement system based on an auditory system to enhance speech that is degraded by various background noises. The proposed system detects voiced and unvoiced sections, adaptively adjusts the coefficients for both the lateral inhibition and the amplitude component according to the detected sections for each input frame, and then reduces the noise signal using a time-delay neural network. Based on measurements of the signal-to-noise ratio, experiments confirm that the proposed system is effective for speech degraded by various noises.

The Speaker Identification Using Incremental Learning (Incremental Learning을 이용한 화자 인식)

Sim, Kwee-Bo;Heo, Kwang-Seung;Park, Chang-Hyun;Lee, Dong-Wook
- Journal of the Korean Institute of Intelligent Systems, v.13 no.5, pp.576-581, 2003
Speech signals carry the features of their speakers. In this paper, we propose a speaker identification system that uses incremental learning based on a neural network. The speech signal recorded through a microphone passes through endpoint detection and is divided into voiced and unvoiced signals. The extracted 12th-order cepstrum is used as the input data for the neural network. Incremental learning is a learning algorithm in which the learned weights are retained and only the new weights, created when a new speaker is added, are trained. The architecture of the neural network is extended with the number of speakers, so the system can learn without a restriction on the number of speakers.
https://doi.org/10.5391/JKIIS.2003.13.5.576

A Study on the Acoustic Characteristics Parameter of Resonance Cavity and Phonation in Liver Diseases (간 질환이 공명강과 발성에 미치는 음성분석학적 특징 요소 연구)

Lim, Soon-Yong;Lim, Sung-Su;Youn, Yong-Heum;Min, Ji-Sun;Song, Han-Sol;Kim, Bong-Hyun;Ka, Min-Kyoung;Cho, Dong-Uk
- Proceedings of the Korea Information Processing Society Conference, 2011.04a, pp.1093-1096, 2011
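In modern medicine, not only the diagnosis and treatment of disease but also management and maintenance for disease prevention and health promotion are becoming increasingly important. That is, early detection and diagnosis make prevention and management part of daily life and point the way to a higher level of health, creating more opportunities for health promotion. In this paper, to study the effect of liver disease on the voice, we conducted an experiment that measures changes in the resonance cavity and phonation of patients with liver disease. To this end, patients with liver disease formed the subject group, and their voices were collected both at hospital admission for liver disease and at discharge after treatment. Among the voice analysis parameters, the third formant frequency bandwidth and the number of extracted unvoiced-sound patterns were measured to analyze the effect of liver disease on the resonance cavity and phonation.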
https://doi.org/10.3745/PKIPS.y2011m04a.1093

A Study on ACFBD-MPC in 8kbps (8kbps에 있어서 ACFBD-MPC에 관한 연구)

Lee, See-Woo
- Journal of the Korea Academia-Industrial cooperation Society, v.17 no.7, pp.49-53, 2016
Recently, the use of signal compression methods to improve the efficiency of wireless networks has increased. In particular, MPC systems have used a pitch extraction method and voiced and unvoiced excitation sources to reduce the bit rate. In general, an MPC system that uses voiced and unvoiced excitation sources distorts the synthesized speech waveform when voiced and unvoiced consonants occur in the same frame. This is caused by normalization of the synthesized speech waveform in the process of restoring the multi-pulses of the representation segment. This paper presents ACFBD-MPC (Amplitude Compensation Frequency Band Division-Multi Pulse Coding), which applies amplitude compensation to the multi-pulses in each pitch interval and at a specific frequency to reduce the distortion of the synthesized speech waveform. The experiments were performed with 16 sentences of male and female voices. The voice signal was A/D converted at 10 kHz with 12 bits. The ACFBD-MPC system was implemented, and its SNR was estimated under an 8 kbps coding condition. As a result, the SNR of ACFBD-MPC was 13.6 dB for the female voice and 14.2 dB for the male voice. ACFBD-MPC improved the male and female voices by 1 dB and 0.9 dB, respectively, compared to traditional MPC. This method is expected to be used in cellular telephones and smartphones that use an excitation source with a low bit rate.
https://doi.org/10.5762/KAIS.2016.17.7.49

A study on the voiceless plosives from the English and Korean spontaneous speech corpus (영어와 한국어 자연발화 음성 코퍼스에서의 무성 파열음 연구)

Yoon, Kyuchul
- Phonetics and Speech Sciences, v.11 no.4, pp.45-53, 2019
The purpose of this work was to examine the factors affecting the identities of the voiceless plosives, i.e. English [p, t, k] and Korean [p^h, t^h, k^h], from the spontaneous speech corpora. The factors were automatically extracted by a Praat script and the percent correctness of the discriminant analyses was incrementally assessed by increasing the number of factors used in predicting the identities of the plosives. The factors included the spectral moments and tilts of the plosive release bursts, the post-burst aspirations and the vowel onsets, the durations such as the closure durations and the voice onset times (VOTs), the locations within words and utterances and the identities of the following vowels. The results showed that as the number of factors increased up to five, so did the percent correctness of the analyses, resulting in 74.6% for English and 66.4% for Korean. However, the optimal number of factors for the maximum percent correctness was four, i.e. the spectral moments and tilts of the release bursts and the following vowels, the closure durations and the VOTs. This suggests that the identities of the voiceless plosives are mostly determined by their internal and vowel onset cues.
https://doi.org/10.13064/KSSS.2019.11.4.045

