• Title/Summary/Keyword: Speech Recognition Technology


A study on the algorithm for speech recognition (음성인식을 위한 알고리즘에 관한 연구)

  • Kim, Sun-Chul;Lee, Jung-Woo;Cho, Kyu-Ok;Park, Jae-Gyun;Oh, Yong Taek
    • Proceedings of the KIEE Conference / 2008.07a / pp.2255-2256 / 2008
  • In designing a speech recognition system, the two representative approaches are LPC (Linear Predictive Coding), which models the characteristics of the human vocal tract, and MFCC (Mel-Frequency Cepstral Coefficients), which reflects the characteristics of human hearing. In this paper, feature parameters are extracted via MFCC, and the operations performed at each stage are displayed as graphs using a MATLAB algorithm. The MFCC extraction process starts from the raw speech signal: a preprocessing stage converts the analog signal to digital, minimizes noise, and emphasizes the speech portion. Windowing then removes discontinuities in the signal, and an FFT converts it from the time domain to the frequency domain. The converted signal passes through a filter bank, which reduces many complex signals to a few simple ones, and finally the feature parameters are obtained through the mel-cepstrum.
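The extraction chain the abstract walks through (pre-emphasis, windowing, FFT, mel filter bank, mel-cepstrum) can be sketched in Python rather than MATLAB. This is a generic textbook MFCC implementation, not the paper's code; every parameter value (16 kHz sampling rate, 26 mel filters, 13 coefficients, frame and hop sizes) is an illustrative assumption.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_ceps=13):
    # Pre-emphasis: boost high frequencies (the "emphasize speech" step).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing + Hamming window to remove discontinuities at frame edges.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # FFT: time domain -> frequency domain (power spectrum).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filter bank simplifies the spectrum to a few bands.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II over log filter-bank energies yields the mel-cepstral coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T
```

For a one-second 16 kHz signal this produces a (98, 13) matrix: one 13-coefficient feature vector per 10 ms hop.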

A Study on Smart Home Controller Utilizing Speech Recognition Technology (음성 인식 기술을 활용한 스마트 홈 컨트롤러)

  • Jung-Wook Moon;Dong-Min Seo;Yeon-Woo Heo;Kyung-Beom Lim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.850-851 / 2023
  • Demand for the convenience and everyday usefulness of smart home technology is increasing. By connecting various hardware devices together with speech recognition technology to implement a smart home system, many inconveniences in daily life can be resolved.

VR-simulated Sailor Training Platform for Emergency (긴급상황에 대한 가상현실 선원 훈련 플랫폼)

  • Park, Chur-Woong;Jung, Jinki;Yang, Hyun-Seung
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2015.10a / pp.175-178 / 2015
  • This paper presents a VR-simulated sailor training platform for emergencies, intended to prevent the human error that causes 60-80% of domestic and overseas marine accidents. Through virtual reality technology, the proposed platform provides an interaction method for practicing emergency procedures and a crowd control method for directing crowd agents in a virtual ship environment. The interaction method uses speech recognition and gesture recognition to enhance the immersiveness and efficiency of the training. The crowd control method produces natural simulations of crowd agents by applying a behavior model that reflects human social behavior. To examine the efficiency of the proposed platform, a prototype whose virtual training scenario describes the outbreak of a fire on a ship was implemented as a standalone system.

The Neighborhood Effect in Korean Visual Word Recognition (한국어 시각단어재인에서 나타나는 이웃효과)

  • Kwon, You-An;Cho, Hyae-Suk;Kim, Choong-Myung;Nam, Ki-Chun
    • MALSORI / no.60 / pp.29-45 / 2006
  • We investigated whether the first syllable plays an important role in lexical access in Korean visual word recognition. To do so, one lexical decision task (LDT) and two form-primed LDT experiments examined the nature of the syllabic neighborhood effect. In Experiment 1, syllabic neighborhood density and syllabic neighborhood frequency were manipulated; lexical decision latencies were influenced only by syllabic neighborhood frequency. Experiment 2 sought to confirm these results with a form-primed LDT. Lexical decision latency was slower in the form-related condition than in the form-unrelated condition, and the effect of syllabic neighborhood density was significant only in the form-related condition, meaning that the first syllable plays an important role in sub-lexical processing. In Experiment 3, we conducted another form-primed LDT, manipulating the number of syllabic neighbors among words with a higher-frequency neighbor. The interaction of syllabic neighborhood density and form relation was significant, confirming that, at the lexical level, words with a higher-frequency neighbor are inhibited more by neighbors sharing the first syllable than words without one. These findings suggest that the first syllable is the unit of neighborhood and that the unit of sub-lexical representation in Korean is the syllable.
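The two manipulated variables, syllabic neighborhood density (how many words share the first syllable) and neighborhood frequency (whether some neighbor is more frequent than the word itself), can be made concrete with a small sketch. The function name and the toy frequency lexicon are hypothetical, not the study's materials.

```python
def syllable_neighbors(word, lexicon):
    """Return (density, has_higher_frequency_neighbor) for a Korean word.

    Two words count as syllabic neighbors when they share the first
    syllable; in a Hangul string each character is one syllable block,
    so word[0] is the first syllable.
    """
    first = word[0]
    neighbors = [w for w in lexicon if w != word and w[0] == first]
    higher = any(lexicon[w] > lexicon[word] for w in neighbors)
    return len(neighbors), higher
```

For example, with a toy lexicon where 사람 is far more frequent than 사과, the word 사과 has two first-syllable neighbors and at least one of higher frequency, the inhibitory configuration Experiment 3 targets.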

Computerized English Pronunciation Testing

  • Lim, Chang-Keun;Kang, Seung-Man
    • Proceedings of the KSPS conference / 2000.07a / pp.241-254 / 2000
  • The past decade has witnessed abundant use of computers in testing language skills such as listening and reading. Compared with these skills, computers have seen little use in testing speaking, including pronunciation, largely because of limitations of current computer technology. One such limitation has been storing and automatically evaluating what the learner utters: the computer simply stores the utterance, and raters evaluate it afterward on a rating continuum. With the advent of voice recognition technology, however, the computer has become able to test pronunciation in a systematic way. This technology enables the computer to identify, visually display, and evaluate the learner's intonation pattern by means of autocorrelation. The evaluation is expressed as the degree to which the learner's intonation pattern overlaps with that of a native speaker of the target language; this degree is displayed numerically on the screen, and the number serves as the score of the learner's utterance under our testing framework.
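The numeric "overlap" score described above can be sketched as follows. The paper does not specify its metric, so here the Pearson correlation between the learner's and the native speaker's pitch contours (resampled to a common length) stands in for the degree of overlap; the function name and the 0-100 scaling are assumptions.

```python
import numpy as np

def intonation_score(learner_f0, native_f0):
    """Score (0-100) for how closely a learner's intonation contour
    overlaps a native speaker's, given two arrays of F0 values."""
    # Resample the learner contour to the native contour's length.
    x = np.interp(np.linspace(0, 1, len(native_f0)),
                  np.linspace(0, 1, len(learner_f0)), learner_f0)
    # Correlation of the two contours; negative correlation scores 0.
    r = np.corrcoef(x, native_f0)[0, 1]
    return round(max(r, 0.0) * 100.0, 1)
```

A learner contour that rises where the native contour rises scores near 100; a contour moving in the opposite direction scores 0.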

A Korean Multi-speaker Text-to-Speech System Using d-vector (d-vector를 이용한 한국어 다화자 TTS 시스템)

  • Kim, Kwang Hyeon;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.8 no.3 / pp.469-475 / 2022
  • Training a deep learning-based single-speaker TTS model requires a speech DB of tens of hours and a long training time, which is inefficient in time and cost for building multi-speaker or personalized TTS models. The voice cloning method instead uses a speaker encoder model to build a TTS model for a new speaker: through the trained speaker encoder, a speaker embedding vector representing the new speaker's timbre is created from a small amount of that speaker's speech, data not used in training. In this paper, we propose a multi-speaker TTS system that applies voice cloning. The proposed system consists of a speaker encoder, a synthesizer, and a vocoder. The speaker encoder applies the d-vector technique used in the speaker recognition field, and the new speaker's timbre is expressed by adding the d-vector derived from the trained speaker encoder as an input to the synthesizer. Experimental results from MOS and timbre-similarity listening tests show that the proposed TTS system performs well.
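The two steps the abstract names, deriving a d-vector and feeding it to the synthesizer, can be sketched with NumPy. In the d-vector technique, a trained network embeds each speech frame and the L2-normalized average of the frame embeddings represents the speaker's timbre; here a single random projection stands in for the trained speaker-encoder network, and the concatenation scheme in `condition_synthesizer` is one common way of "adding the d-vector as an input," not necessarily the paper's exact one.

```python
import numpy as np

rng = np.random.default_rng(0)

def d_vector(frame_features, W):
    """L2-normalized average of per-frame embeddings (the d-vector).
    W stands in for a trained speaker-encoder network."""
    frame_emb = np.tanh(frame_features @ W)   # embed each frame
    d = frame_emb.mean(axis=0)                # average over frames
    return d / np.linalg.norm(d)              # L2-normalize

def condition_synthesizer(text_encodings, d):
    """Tile the d-vector over time and concatenate it to every step
    of the synthesizer's text-encoder output."""
    tiled = np.tile(d, (text_encodings.shape[0], 1))
    return np.concatenate([text_encodings, tiled], axis=1)
```

Because only a short clip of the new speaker is needed to compute the d-vector, no retraining of the synthesizer is required to clone a voice.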

A Study on the Intelligent Man-Machine Interface System: The Experiments of the Recognition of Korean Monophthongs and Cognitive Phenomena of Korean Speech Recognition Using Artificial Neural Net Models (통합 사용자 인터페이스에 관한 연구 : 인공 신경망 모델을 이용한 한국어 단모음 인식 및 음성 인지 실험)

  • Lee, Bong-Ku;Kim, In-Bum;Kim, Ki-Seok;Hwang, Hee-Yeung
    • Annual Conference on Human and Language Technology / 1989.10a / pp.101-106 / 1989
  • As part of an intelligent man-machine interface system for exchanging information with a computer through speech and text, we implemented a system for recognizing Korean monophthongs using an artificial neural network model, and also carried out cognition experiments on the word-recognition module needed at the upper interface of the recognition system. The first, second, and third formants were used as inputs for vowel recognition, and the targets were the eight Korean monophthongs [a, eo, o, u, eu, i, ae, e]. The neural network model was a multilayer perceptron trained with the generalized delta rule; it achieved a recognition rate of about 94% for one male speaker. For the cognition experiments during speech recognition, about 20 words were stored at the lexical level of the network to examine speech distortion, lexical effects on perception, and categorical perception. For these experiments an Interactive Activation and Competition model was used, with hypothetical speech feature data as input.
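The vowel-recognition setup, three formant inputs, a multilayer perceptron, and the generalized delta rule (backpropagation of the error times the sigmoid derivative), can be sketched end to end. The formant centers below are rough illustrative values for the eight monophthongs, not the paper's measured data, and the network size and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Approximate (F1, F2, F3) centers in Hz; illustrative values only.
CENTERS = {
    "a": (800, 1300, 2500), "eo": (600, 1000, 2600), "o": (450, 800, 2500),
    "u": (350, 700, 2400), "eu": (400, 1500, 2500), "i": (300, 2300, 3000),
    "ae": (700, 1800, 2600), "e": (500, 2000, 2700),
}

def make_data(n_per_class=40, jitter=30.0):
    X, y = [], []
    for k, (label, c) in enumerate(CENTERS.items()):
        X.append(rng.normal(c, jitter, size=(n_per_class, 3)))
        y += [k] * n_per_class
    X = np.vstack(X) / 3000.0        # scale formants to roughly [0, 1]
    T = np.eye(len(CENTERS))[y]      # one-hot targets
    return X, T

def train_mlp(X, T, hidden=16, lr=2.0, epochs=5000):
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, T.shape[1])); b2 = np.zeros(T.shape[1])
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        H = sig(X @ W1 + b1); O = sig(H @ W2 + b2)
        # Generalized delta rule: delta = error * sigmoid derivative.
        dO = (T - O) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        W2 += lr * H.T @ dO / len(X); b2 += lr * dO.mean(axis=0)
        W1 += lr * X.T @ dH / len(X); b1 += lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sig(sig(X @ W1 + b1) @ W2 + b2).argmax(axis=1)
```

On this cleanly separated synthetic data the small network reaches high training accuracy, illustrating why three formants suffice to separate monophthongs for a single speaker.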

A Study on subtitle synchronization calibration to enhance hearing-impaired persons' viewing convenience of e-sports contents or game streamer contents (청각장애인의 이스포츠 중계방송 및 게임 스트리머 콘텐츠 시청 편의성 증대를 위한 자막 동기화 보정 연구)

  • Shin, Dong-Hwan;Kim, Jeong-Soo;Kim, Chang-Won
    • Journal of Korea Game Society / v.19 no.1 / pp.73-84 / 2019
  • This study suggests ways to improve the quality of the subtitle service provided for the viewing convenience of deaf people on e-sports broadcast content and game streamer content. Subtitles for broadcast content are generally written on air by stenographers, so a delay of 3 to 5 seconds relative to the original content is inevitable. This study therefore proposes an automatic synchronization calibration system using speech recognition technology. A content-application experiment using this system confirmed that the synchronization error of the subtitle data could be reduced to less than 1 second.
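The calibration idea can be sketched as follows: speech recognition supplies (word, true time) pairs, so matching each stenographer subtitle's first word to the nearest earlier ASR occurrence yields per-line delays, and shifting the whole track by the median delay re-synchronizes it. The function name, the data format, and the median-offset strategy are assumptions for illustration, not the paper's exact system.

```python
from statistics import median

def calibrate_subtitles(subtitles, asr_words):
    """subtitles: list of (text, displayed_time); asr_words: list of
    (word, recognized_time). Returns subtitles shifted earlier by the
    median of the estimated per-line delays."""
    asr_index = {}
    for word, t in asr_words:
        asr_index.setdefault(word, []).append(t)
    delays = []
    for text, t_sub in subtitles:
        first = text.split()[0]
        # Candidate true times: ASR hits for the first word, at or before
        # the (late) subtitle time; the latest one is the best match.
        cands = [t for t in asr_index.get(first, []) if t <= t_sub]
        if cands:
            delays.append(t_sub - max(cands))
    offset = median(delays) if delays else 0.0
    return [(text, t_sub - offset) for text, t_sub in subtitles]
```

Using the median rather than the mean keeps a few mismatched words from skewing the estimated delay.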

A Study on the Recognition of English Pronunciation based on Artificial Intelligence (인공지능 기반 영어 발음 인식에 관한 연구)

  • Lee, Cheol-Seung;Baek, Hye-Jin
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.3 / pp.519-524 / 2021
  • Recently, the fourth industrial revolution has become an area of interest to many countries, mainly the major advanced ones. Artificial intelligence, its core technology, is converging with various fields and is strongly influencing edutech, changing education in innovative ways. This paper builds an experimental environment that applies the DTW (dynamic time warping) speech recognition algorithm and deep learning to data from various native and non-native speakers. Through comparison with CNN algorithms, we measure the similarity of English pronunciation so that non-native speakers' pronunciation can be corrected toward that of native speakers.
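The DTW algorithm named above aligns two utterances of different lengths before comparing them; a lower warped distance means more similar pronunciation. This is a minimal textbook implementation over 1-D feature sequences (for multi-dimensional frames such as MFCCs, the per-frame cost `abs(...)` would become a vector norm), not the paper's exact configuration.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])   # per-frame mismatch
            # Best of insertion, deletion, or match from the neighbors.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the warping path can stretch or compress time, `[1, 2, 3]` matches `[1, 2, 2, 3]` perfectly even though the lengths differ, which is exactly what speaking-rate differences between learners and native speakers require.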

Optimization of Memristor Devices for Reservoir Computing (축적 컴퓨팅을 위한 멤리스터 소자의 최적화)

  • Kyeongwoo Park;HyeonJin Sim;HoBin Oh;Jonghwan Lee
    • Journal of the Semiconductor & Display Technology / v.23 no.1 / pp.1-6 / 2024
  • Recently, artificial neural networks have been playing a crucial role and advancing across various fields. They are typically categorized into feedforward and recurrent networks. Feedforward networks are primarily used for processing static spatial patterns such as image recognition and object detection and are not suitable for handling temporal signals, while recurrent networks face the challenges of complex training procedures and large computational requirements. In this paper, we propose memristors suitable for reservoir computing systems, an advanced form of recurrent neural network that uses a mask processor. Using the characteristic equations of Ti/TiOx/TaOy/Pt, Pt/TiOx/Pt, and Ag/ZnO-NW/Pt memristors, we generated current-voltage curves and verified their memristive behavior by confirming hysteresis. We then trained and evaluated reservoir computing systems built on these memristors with the NIST TI-46 database. Among them, the system based on the Ti/TiOx/TaOy/Pt memristor reached 99% accuracy, confirming that structure's suitability for speech recognition inference.
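The mask-processor scheme the abstract relies on can be sketched abstractly: each scalar input is multiplied by a random mask to create several virtual sub-inputs that drive a single nonlinear node whose state decays between updates, and only a linear readout is trained. The leaky `tanh` node below is a generic stand-in for the memristor conductance dynamics; the actual device equations of Ti/TiOx/TaOy/Pt and the other stacks are not reproduced here, and all sizes and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def reservoir_states(inputs, n_virtual=20, decay=0.6):
    """Drive one nonlinear node with masked copies of each input,
    collecting n_virtual 'virtual node' states per time step."""
    mask = rng.choice([-1.0, 1.0], size=n_virtual)
    states, x = [], 0.0
    for u in inputs:
        nodes = []
        for m in mask:
            # Leaky nonlinear update: stand-in for memristor dynamics.
            x = decay * x + np.tanh(m * u + 0.5 * x)
            nodes.append(x)
        states.append(nodes)
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Only the linear readout is trained (ridge regression)."""
    S = np.hstack([states, np.ones((len(states), 1))])
    return np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ targets)
```

This division of labor is the appeal of reservoir computing for memristor hardware: the device provides the fixed nonlinear dynamics for free, and training reduces to a single linear solve.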
