• Title/Summary/Keyword: Speech Processing

Noise Reduction for Korean Connected Digit Recognition through Telephone Channel (전화망 환경에서 한국어 숫자음 인식을 위한 잡음처리)

  • Kim Kyuhong;Kim Hoirin
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.211-214
    • /
    • 2003
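  • In general, speech recognition performance degrades under the influence of noise. Korean connected digit recognition over the telephone network is a difficult area of speech recognition, because coarticulation lowers the recognition rate and the telephone channel distorts the spectral envelope and limits the bandwidth of the speech signal. In this paper, to reduce the effect of noise, we used 2WF (2-stage Wiener Filter), SWP (SNR-dependent Waveform Processing), and CMN (Cepstrum Mean Normalization). 2WF reduces not only overall additive noise but also dynamic additive noise while causing little distortion to the formant structure of the speech signal. SWP can improve the overall SNR by emphasizing the portions of the speech waveform where the SNR is relatively high. In addition, CMN improves speech recognition performance by normalizing the effect of channel noise in the feature vectors. Experiments with these methods on a telephone-network Korean connected digit DB showed that digit recognition performance over the telephone network could be improved by reducing the effect of noise while minimizing distortion of the speech signal.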
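
The CMN step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cepstral_mean_normalize` and the list-of-lists frame layout are assumptions made for illustration. The idea shown is the standard one, that a stationary channel appears as a constant offset on every cepstral frame, so subtracting the per-utterance mean of each coefficient removes it.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    `frames` is a list of cepstral vectors (lists of floats), one per frame.
    This layout is an assumption made for the sketch.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    # Remove the mean, which also removes any constant channel offset.
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy data: the same utterance with and without a constant channel offset.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
channel = [0.5, -1.0]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

norm_clean = cepstral_mean_normalize(clean)
norm_distorted = cepstral_mean_normalize(distorted)
```

After normalization the clean and channel-distorted versions coincide, which is why the technique helps with telephone-channel mismatch.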

Improvement of Prosody Transplantation Technology for English Prosody Education and Its Application (운율교육을 위한 운율이식기술 개선 방안 연구)

  • Yi, So-Pae
    • MALSORI
    • /
    • no.61
    • /
    • pp.49-62
    • /
    • 2007
  • This study focused on improving prosody transplantation technology for use in effective prosody education. Issues that make the technology a less acceptable tool for prosody education were addressed. Instead of merely copying the target pitch onto a learner's utterances, the target pitch was rescaled in semitones before the transplantation. In so doing, distortion of the signal was minimized, and the transplanted utterance could retain a sound quality no different from that of the learner's own utterances. Instead of manual transplantation, an automatic procedure was proposed to increase the reliability and consistency of the outcome and to enable real-time processing. The perceptual performance of the automatic transplantation was evaluated in a perception experiment, which showed that the automatic transplantation was as good as the manual process.
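
One way to read "rescaled in semitones" is sketched below. The abstract does not give the exact procedure, so this is an assumption: the target contour is expressed as semitone offsets from the target speaker's geometric-mean F0, and those offsets are re-applied around the learner's geometric-mean F0. All function names are hypothetical, and only voiced frames (F0 > 0) are assumed to be passed in.

```python
import math


def hz_to_semitones(f0_hz, ref_hz):
    """Distance from ref_hz to f0_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def semitones_to_hz(semitones, ref_hz):
    """Inverse of hz_to_semitones."""
    return ref_hz * 2.0 ** (semitones / 12.0)


def geometric_mean(values):
    """Mean in the log domain, a natural reference for pitch."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def rescale_target_pitch(target_f0, learner_f0):
    """Move the target contour into the learner's pitch register.

    Assumption for this sketch: both inputs contain voiced frames only.
    """
    target_ref = geometric_mean(target_f0)
    learner_ref = geometric_mean(learner_f0)
    return [
        semitones_to_hz(hz_to_semitones(f, target_ref), learner_ref)
        for f in target_f0
    ]


# A one-octave rise by a low-pitched model speaker, moved to a learner
# whose voice sits an octave higher.
rescaled = rescale_target_pitch([100.0, 200.0], [200.0, 400.0])
```

The shape of the contour (a one-octave rise) is preserved while its register follows the learner, which is the property that keeps the transplanted utterance close to the learner's own voice.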

A study on the design of new floating resistor and it′s application (새로운 CMOS Floating저항의 설계와 그 응용에 대한연구)

  • 이영훈
    • Journal of the Korea Society of Computer and Information
    • /
    • v.5 no.3
    • /
    • pp.76-83
    • /
    • 2000
  • Continuous-time signal systems have been receiving considerable attention with the development of CMOS technology. In this paper, a low-pass filter using a new CMOS floating resistor is designed with a cutoff frequency suited to speech signal processing. In particular, a new floating resistor consisting entirely of CMOS devices in saturation has been developed. Linearity within $\pm$0.04% is achieved through nonlinearity cancellation via current mirrors over an applied range of $\pm$1 V. The frequency response exceeds 10 MHz, and the resistors are expected to be useful in implementing integrated-circuit active RC filters. The low-pass filter designed using this method has a simpler structure than a switched-capacitor filter, which reduces the chip area. The characteristics of the designed low-pass filter are simulated with the PSpice program.

Out-Of-Domain Detection Using Hierarchical Dirichlet Process

  • Jeong, Young-Seob
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.1
    • /
    • pp.17-24
    • /
    • 2018
  • With the improvement of speech recognition and natural language processing, dialog systems have recently been adapted to various service domains. It has become possible to obtain desired services through conversation with a dialog system, but separate modules such as domain detection, intention detection, named entity recognition, and out-of-domain detection still need to improve in order to offer stable service. When the system misclassifies an in-domain sentence as out-of-domain, the result is poor customer satisfaction and, ultimately, lost business. As there have been relatively few studies on out-of-domain detection, in this paper we introduce a new method using a hierarchical Dirichlet process and demonstrate its effectiveness with experimental results on a Korean dataset.

A Study of the Trend of Deep Learning Technology of China (중국의 딥러닝 기술 동향에 관한 연구)

  • Fu, Yumei;Kim, Minyoung;Park, Geunho;Jang, Jongwook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2019.05a
    • /
    • pp.385-388
    • /
    • 2019
  • In recent years, China has faced unprecedented intelligent reforms. Artificial intelligence has become a hot topic in society. The deep learning framework is the core of artificial intelligence industrialization, and it has also attracted the attention of all parties. Among them, deep learning has been applied in the fields of computer vision, speech recognition, and language technology processing. This paper will introduce China's development status and future challenges in technology, talent, and market applications.

Subtitle Automatic Generation System using Speech to Text (음성인식을 이용한 자막 자동생성 시스템)

  • Son, Won-Seob;Kim, Eung-Kon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.16 no.1
    • /
    • pp.81-88
    • /
    • 2021
  • Recently, many videos, such as online lecture videos prompted by COVID-19, have been produced. However, because of limited working hours and budget, only some of these videos have subtitles. This is emerging as an obstacle to information access for deaf people. In this paper, we develop a system that automatically generates subtitles using speech recognition and separates them into sentences using sentence endings and timing, in order to reduce the time and labor required for subtitle generation.
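
The idea of separating sentences "using the ending and time" might look roughly like the sketch below. It is an illustration only, not the authors' system: the input format (word, start time, end time), the suffix list `endings`, and the pause threshold `max_gap` are all assumptions. A subtitle is closed when a word ends with an assumed sentence-final ending or when the pause before the next word is long.

```python
def split_subtitles(words, endings=("다", "요", "까"), max_gap=0.7):
    """Group recognized words into subtitle segments.

    `words` is a list of (text, start_sec, end_sec) tuples, a hypothetical
    recognizer output format. `endings` and `max_gap` are illustrative values.
    Returns a list of (subtitle_text, start_sec, end_sec).
    """
    segments, current = [], []
    for i, (text, start, end) in enumerate(words):
        current.append((text, start, end))
        is_last = i == len(words) - 1
        # Pause between this word and the next one (0 for the last word).
        gap = 0.0 if is_last else words[i + 1][1] - end
        if is_last or text.endswith(endings) or gap > max_gap:
            subtitle = " ".join(t for t, _, _ in current)
            segments.append((subtitle, current[0][1], current[-1][2]))
            current = []
    return segments


words = [
    ("오늘은", 0.0, 0.4), ("수업을", 0.5, 0.9), ("시작합니다", 1.0, 1.6),
    ("먼저", 2.5, 2.8), ("출석을", 2.9, 3.3), ("확인할게요", 3.4, 4.0),
]
subtitles = split_subtitles(words)
```

Each returned tuple carries the start time of its first word and the end time of its last word, which is the information a subtitle file needs for display timing.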

Artificial Intelligence Applications on Mobile Telecommunication Systems (AI의 이동통신시스템 적용)

  • Yeh, C.I.;Chang, K.S.;Ko, Y.J.
    • Electronics and Telecommunications Trends
    • /
    • v.37 no.4
    • /
    • pp.60-69
    • /
    • 2022
  • So far, artificial intelligence (AI)/machine learning (ML) has produced impressive results in speech recognition, computer vision, and natural language processing. AI/ML has recently begun to show promise as a viable means for improving the performance of 5G mobile telecommunication systems. This paper investigates standardization activities in 3GPP and the O-RAN Alliance regarding AI/ML applications in mobile telecommunication systems. Future trends in AI/ML technologies are also summarized. As an overarching technology in 6G, there appears to be no doubt that AI/ML could contribute to every part of mobile systems, including core, RAN, and air-interface, in terms of performance enhancement, automation, cost reduction, and energy consumption reduction.

Variational autoencoder for prosody-based speaker recognition

  • Starlet Ben Alex;Leena Mary
    • ETRI Journal
    • /
    • v.45 no.4
    • /
    • pp.678-689
    • /
    • 2023
  • This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAE) in learning the speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F0), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. The initial comparative studies on VAEs and traditional autoencoders (AE) suggest that the former can efficiently learn speaker representations. Investigations on the impact of gender information in speaker recognition also point out that gender-dependent impostor banks lead to higher accuracies. Finally, the evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.

Automatic Error Correction System for Erroneous SMS Strings (SMS 변형된 문자열의 자동 오류 교정 시스템)

  • Kang, Seung-Shik;Chang, Du-Seong
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.6
    • /
    • pp.386-391
    • /
    • 2008
  • Spoken-word errors that violate grammatical or writing rules occur frequently in communication environments such as mobile phones and messengers. These unexpected errors cause problems for language processing systems in many applications, such as speech recognition and text-to-speech translation. In this paper, we propose and implement an automatic correction system for ill-formed words and word-spacing errors in SMS sentences, which have been the major causes of poor accuracy. We experimented with three methods of constructing the word correction dictionary and evaluated their results. They are (1) manual construction of error words from the vocabulary list of ill-formed communication languages, (2) automatic construction of an error dictionary from the manually constructed corpus, and (3) a context-dependent method of automatic construction of an error dictionary.
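
Method (2), building an error dictionary automatically from a manually corrected corpus, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the corpus is assumed to be a list of (noisy sentence, corrected sentence) pairs, tokens are assumed to align one-to-one (so word-spacing errors are not handled), and the English SMS examples and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_error_dictionary(parallel_corpus):
    """Learn ill-formed word -> most frequent correction from aligned pairs."""
    counts = defaultdict(Counter)
    for noisy, clean in parallel_corpus:
        noisy_tokens, clean_tokens = noisy.split(), clean.split()
        if len(noisy_tokens) != len(clean_tokens):
            continue  # this simple aligner only handles one-to-one tokens
        for n, c in zip(noisy_tokens, clean_tokens):
            if n != c:
                counts[n][c] += 1
    # Keep the most frequently observed correction for each ill-formed word.
    return {n: c.most_common(1)[0][0] for n, c in counts.items()}


def correct(sentence, error_dict):
    """Replace every token found in the dictionary; leave others unchanged."""
    return " ".join(error_dict.get(tok, tok) for tok in sentence.split())


corpus = [
    ("c u 2moro", "see you tomorrow"),
    ("thx c u", "thanks see you"),
]
error_dict = build_error_dictionary(corpus)
fixed = correct("thx 2moro", error_dict)
```

A context-dependent variant, as in method (3), would key the dictionary on the neighboring tokens as well as the ill-formed word itself; that extension is not shown here.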

A Study on the Channel Normalized Pitch Synchronous Cepstrum for Speaker Recognition (채널에 강인한 화자 인식을 위한 채널 정규화 피치 동기 켑스트럼에 관한 연구)

  • 김유진;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.1
    • /
    • pp.61-74
    • /
    • 2004
  • In this paper, a contort- and speaker-dependent cepstrum extraction method and a channel normalization method for minimizing the loss of speaker characteristics in the cepstrum are proposed for a speaker recognition system that is robust to the channel. The proposed extraction method creates a cepstrum based on pitch synchronous analysis using the inherent pitch of the speaker. The resulting cepstrum, called the "pitch synchronous cepstrum" (PSC), therefore represents the impulse response of the vocal tract more accurately in voiced speech. The PSC can also compensate for channel distortion, because the pitch is more robust in a channel environment than the spectrum of speech. The proposed channel normalization method, the "formant-broadened pitch synchronous CMS" (FBPSCMS), applies the Formant-Broadened CMS to the PSC and improves the accuracy of the intraframe processing. We compared text-independent closed-set speaker identification on 56 females and 112 males using the TIMIT and NTIMIT databases, respectively. The results show that the pitch synchronous cepstrum improves the error reduction rate by up to 7.7% in comparison with the conventional short-time cepstrum, and the error rates of the FBPSCMS are more stable and lower than those of pole-filtered CMS.