• Title/Summary/Keyword: 음성기반 (speech-based)


ETRI New Technology: Korean Speech Synthesizer Technology Based on Extended Synthesis Units (ETRI신기술-확장 합성단위 기반 한국어 음성합성기 기술)

  • Electronics and Telecommunications Research Institute
    • Electronics and Telecommunications Trends
    • /
    • v.14 no.3 s.57
    • /
    • pp.127-128
    • /
    • 1999
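  • The Korean speech synthesizer based on extended synthesis units is a system that automatically synthesizes speech by machine from text written in ordinary characters, much as a person reads aloud. It is one of the research outcomes of the project "Development of Conversational Speech Translation Communication Technology in a Multimedia Environment," under way since 1995. The system was developed in 1997, and the technologies for a trainable automatic synthesis-unit generator and a domain-dependent speech synthesizer are scheduled to be transferred.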


A Study on EVRC-based Speech Enhancement by Reinforcement Learning (강화학습을 적용한 EVRC 기반의 음성향상기법에 대한 연구)

  • Kim, Sohyeon;Chang, Joon-Hyuk
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.05a
    • /
    • pp.340-341
    • /
    • 2018
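  • In this paper, we propose a speech enhancement technique that applies deep neural network-based reinforcement learning, with the aim of improving speech recognition performance by removing noise from the speech signal. Noise is first removed with EVRC, reinforcement learning is then applied, and the resulting performance is compared, with the goal of implementing a model that outperforms existing speech enhancement techniques.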

A Study on the Use of Speech Recognition Technology for Content-based Video Indexing and Retrieval (내용기반 비디오 색인 및 검색을 위한 음성인식기술 이용에 관한 연구)

  • 손종목;배건성;강경옥;김재곤
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.2
    • /
    • pp.16-20
    • /
    • 2001
  • An important aspect of video program indexing and retrieval is the ability to divide a video program into meaningful segments, that is, content-based video program segmentation. This paper proposes a new approach to content-based video program segmentation that uses speech recognition to synchronize closed captions with the speech signal. Experimental results indicate that the proposed scheme is promising for content-based video program segmentation.


Korean speech recognition based on grapheme (문자소 기반의 한국어 음성인식)

  • Lee, Mun-hak;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.5
    • /
    • pp.601-606
    • /
    • 2019
  • This paper studies Korean speech recognition using grapheme units (Cho-sung [onset], Jung-sung [nucleus], Jong-sung [coda]). We build an ASR (automatic speech recognition) system without a G2P (grapheme-to-phoneme) process and show that deep learning-based ASR systems can learn Korean pronunciation rules without G2P. The proposed model is shown to reduce the word error rate when sufficient training data is available.
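
The grapheme units named in this abstract can be derived from precomposed Hangul syllables with Unicode arithmetic. The following is a minimal sketch, not the authors' tokenizer: the function name `to_graphemes` and the choice of compatibility jamo as output symbols are assumptions made for illustration, and the paper's actual token inventory is not given in the abstract.

```python
# Jamo inventories in the order used by Unicode precomposed Hangul syllables
# (U+AC00..U+D7A3): 19 onsets, 21 nuclei, 28 codas (including "no coda").
ONSETS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
NUCLEI = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
CODAS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3


def to_graphemes(text: str) -> list[str]:
    """Split each Hangul syllable into onset, nucleus, and optional coda.

    Characters outside the precomposed Hangul block are passed through as-is.
    (Hypothetical helper for illustration only.)
    """
    units = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            onset, rest = divmod(index, len(NUCLEI) * len(CODAS))
            nucleus, coda = divmod(rest, len(CODAS))
            units.append(ONSETS[onset])
            units.append(NUCLEI[nucleus])
            if CODAS[coda]:
                units.append(CODAS[coda])
        else:
            units.append(ch)
    return units


print(to_graphemes("한글"))
```

Note that compatibility jamo do not distinguish an onset ㄱ from a coda ㄱ; a real system might tag the position of each unit, which this sketch does not do.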

Korean Pause Prediction Model based on Dialogue Context (대화 맥락에 기반한 한국어 휴지 예측 모델)

  • Joung Lee;Jeongho Na;Jeongbeom Jeong;Maengsik Choi;Chunghee Lee;Seung-Hoon Na
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.404-408
    • /
    • 2023
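  • As demand for voice user interfaces (VUIs) grows, inserting pauses at appropriate positions has become a central task for speech synthesis systems that aim to imitate natural spoken utterances. Given the continuity of dialogue, building a natural voice-based interface requires understanding the dialogue context and placing pauses accordingly. This study therefore proposes a Long-Input Transformer-based pause prediction model that inserts pauses at appropriate positions based on dialogue context, and presents validation results on a Korean dialogue dataset.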


Performance Analysis of Speech Recognition Model based on Neuromorphic Architecture of Speech Data Preprocessing Technique (음성 데이터 전처리 기법에 따른 뉴로모픽 아키텍처 기반 음성 인식 모델의 성능 분석)

  • Cho, Jinsung;Kim, Bongjae
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.3
    • /
    • pp.69-74
    • /
    • 2022
  • The SNN (Spiking Neural Network), which operates on a neuromorphic architecture, was created by mimicking human neural networks. Neuromorphic computing based on this architecture requires relatively less power than typical GPU-based deep learning techniques. For this reason, research on supporting various artificial intelligence models with neuromorphic architectures is actively under way. This paper analyzes the performance of a speech recognition model based on a neuromorphic architecture as a function of the speech data preprocessing technique. In the experiments, the model achieved up to 84% speech recognition accuracy when the speech data was preprocessed with the Fourier transform. The results indicate that a speech recognition service based on a neuromorphic architecture can be used effectively.

A Statistical Model-Based Voice Activity Detection Employing the Conditional MAP Criterion with Spectral Deviation (조건 사후 최대 확률과 음성 스펙트럼 변이 조건을 이용한 통계적 모델 기반의 음성 검출기)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.6
    • /
    • pp.324-329
    • /
    • 2011
  • In this paper, we propose a novel approach to improve the performance of statistical model-based voice activity detection (VAD), based on the conditional maximum a posteriori (CMAP) criterion with spectral deviation. In our approach, the VAD decision rule is expressed as the geometric mean of likelihood ratios (LRs), compared against a threshold that is adapted according to the speech presence probability conditioned on both the speech activity decision and the spectral deviation in the previous frame. Experimental results show that the proposed approach yields better results than the CMAP-based VAD using the LR test.
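
The decision rule described in this abstract can be illustrated roughly as follows. This is a simplified sketch under stated assumptions, not the authors' algorithm: the per-bin likelihood ratios are taken as given, and the threshold table keyed by the previous frame's decision and a binary "high deviation" flag is a hypothetical stand-in for the adapted threshold, whose actual form and values are not given in the abstract.

```python
import math


def log_geometric_mean(likelihood_ratios: list[float]) -> float:
    """Log of the geometric mean of per-bin likelihood ratios.

    Equal to the arithmetic mean of the log likelihood ratios.
    """
    return sum(math.log(lr) for lr in likelihood_ratios) / len(likelihood_ratios)


def vad_decision(
    likelihood_ratios: list[float],
    prev_speech: bool,
    prev_deviation_high: bool,
    thresholds: dict[tuple[bool, bool], float],
) -> bool:
    """Declare speech if the geometric-mean LR exceeds a conditional threshold.

    `thresholds` maps (previous decision, previous deviation flag) to a
    threshold; both the keys and the values are illustrative assumptions.
    """
    eta = thresholds[(prev_speech, prev_deviation_high)]
    return log_geometric_mean(likelihood_ratios) > math.log(eta)


# Hypothetical thresholds: a lower bar once speech was already detected.
example_thresholds = {
    (True, True): 1.5,
    (True, False): 2.0,
    (False, True): 3.0,
    (False, False): 5.0,
}

print(vad_decision([2.0, 8.0], True, False, example_thresholds))
```

Working in the log domain avoids numerical underflow when many small likelihood ratios are multiplied together.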

Open API-based Conversational Voice Interaction Scheme for Intelligent IoT Applications for the Digital Underprivileged (디지털 소외계층을 위한 지능형 IoT 애플리케이션의 공개 API 기반 대화형 음성 상호작용 기법)

  • Joonhyouk, Jang
    • Smart Media Journal
    • /
    • v.11 no.10
    • /
    • pp.22-29
    • /
    • 2022
  • Voice interactions are particularly effective in applications targeting the digitally underprivileged, who are not proficient in the use of smart devices. However, applications based on open APIs use voice signals only for short, fragmentary input and output because of the limitations of existing touchscreen-oriented UIs and the APIs provided. In this paper, we design a conversational voice interaction model for interactions between users and intelligent mobile/IoT applications and propose a keyword detection algorithm based on the edit distance. The proposed model and scheme were implemented in an Android environment, and the edit distance-based keyword detection algorithm showed a higher recognition rate than the existing algorithm for keywords that speech recognition had transcribed incorrectly.
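
Keyword detection based on edit distance can be sketched as below. This is an illustrative example using the standard Levenshtein distance, not the paper's algorithm: the function names, the sample keywords, and the `max_distance` cutoff are all assumptions, since the abstract does not describe how candidates are scored or rejected.

```python
from typing import Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def detect_keyword(recognized: str, keywords: list[str],
                   max_distance: int = 2) -> Optional[str]:
    """Return the keyword closest to the recognized text, or None if too far.

    `max_distance` is a hypothetical rejection cutoff chosen for illustration.
    """
    if not keywords:
        return None
    best = min(keywords, key=lambda k: edit_distance(recognized, k))
    return best if edit_distance(recognized, best) <= max_distance else None


commands = ["light on", "light off", "play music"]
print(detect_keyword("ligt on", commands))   # misrecognized "light on"
print(detect_keyword("weather", commands))   # no keyword is close enough
```

Matching against a small, fixed keyword list lets the application recover from minor transcription errors without retraining the recognizer.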

Speech Estimators Based on Generalized Gamma Distribution and Spectral Gain Floor Applied to an Automatic Speech Recognition (잡음에 강인한 음성인식을 위한 Generalized Gamma 분포기반과 Spectral Gain Floor를 결합한 음성향상기법)

  • Kim, Hyoung-Gook;Shin, Dong;Lee, Jin-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.8 no.3
    • /
    • pp.64-70
    • /
    • 2009
  • This paper presents a speech enhancement technique based on the generalized Gamma distribution, with the goal of robust speech recognition performance. For robust speech enhancement, noise estimation based on recursive averaging of spectral values, controlled by a spectral noise floor, is combined with speech estimation under the generalized Gamma distribution and a spectral gain floor. The proposed technique is formulated for three quantities: the spectral component, the spectral amplitude, and the log spectral amplitude. The performance of the three resulting methods is measured by the recognition accuracy of automatic speech recognition (ASR).
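
The role of a spectral gain floor can be shown with a small example. This sketch deliberately swaps in a simple Wiener-style gain in place of the paper's generalized Gamma estimators, which the abstract does not specify; the function names, the maximum-likelihood a priori SNR estimate, and the floor value of 0.1 are assumptions for illustration only.

```python
def wiener_gain(snr_prior: float) -> float:
    """Wiener-style gain from the a priori SNR (stand-in for the paper's estimators)."""
    return snr_prior / (1.0 + snr_prior)


def enhance_frame(noisy_mag: list[float], noise_psd: list[float],
                  gain_floor: float = 0.1) -> list[float]:
    """Apply a per-bin gain, never letting it drop below `gain_floor`.

    noisy_mag: magnitude spectrum of one noisy frame.
    noise_psd: estimated noise power per bin (assumed strictly positive).
    """
    enhanced = []
    for y, n in zip(noisy_mag, noise_psd):
        snr_post = (y * y) / n                # a posteriori SNR
        snr_prior = max(snr_post - 1.0, 0.0)  # maximum-likelihood a priori SNR
        gain = max(wiener_gain(snr_prior), gain_floor)
        enhanced.append(gain * y)
    return enhanced


print(enhance_frame([2.0, 1.0], [1.0, 4.0], gain_floor=0.1))
```

Without the floor, bins dominated by noise would be scaled to zero, which tends to produce isolated spectral peaks (musical noise) that can hurt a downstream recognizer.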


Voice-based Control System Using Standard-based IoT Platforms (표준 사물인터넷 플랫폼을 활용한 음성 제어 시스템)

  • Jeong, Isu;Baek, Seungwoo;Lee, Sungchan;Yun, Jaeseok
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.01a
    • /
    • pp.454-455
    • /
    • 2019
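  • In this paper, we implement a voice control system using a standards-based Internet of Things (IoT) platform and verify its performance. Using the open-source platform of oneM2M, an IoT industry standard, we implemented a prototype system that controls home devices by voice. For voice-based control, we used Google's Speech-to-Text API and installed the oneM2M platform on open-source hardware, showing that home appliances connected to the server platform can be controlled from anywhere. The implemented system shows that a connected home with scalability and connectivity can be built from a standardized open-source platform and a cloud speech recognition API.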
