• Title/Summary/Keyword: Speech Recognition Technology

527 search results

The Status of Speech Recognition Technology and its Prospects on Practical Applications (음성인식기술의 현황과 실용화 전망)

  • 구명완
    • Proceedings of the Acoustical Society of Korea Conference / 1998.06c / pp.17-22 / 1998
  • This paper reviews recent trends in speech recognition technology and introduces practical applications abroad, centered on telecommunications carriers and non-carriers. Among current speech recognition technologies, we introduce utterance verification and search techniques, which have recently drawn attention, and describe foreign commercialization cases separately for the telecommunications and non-telecommunications fields. Finally, we examine the prospects for practical application and draw conclusions.


A Speech Coder for Server-Based Speech Recognition in Mobile Communication (이동통신 환경 하에서의 서버 기반 음성 인식을 위한 음성 부호화 기법)

  • Lee Gil Ho;Yoon Jae Sam;Oh Yoo Rhee;Kim Hong Kook
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.89-92 / 2004
  • The aim of this paper is to develop a technique that performs speech recognition and speech coding simultaneously, without performance degradation, in a mobile communication environment. To achieve this, the speech feature parameters transmitted over the channel are the MFCCs used for speech recognition, instead of the LPC coefficients of a conventional speech coder. Speech recognition performance therefore improves; however, errors arise when converting the MFCCs back to LPC for speech reconstruction, so the speech quality is unsatisfactory relative to the number of transmitted bits. We therefore compensate for this error by adding parameters, improving the speech quality. As a result, we developed a speech coder that shows stable performance in both speech quality and speech recognition.
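The substitution described above — transmitting recognition features (MFCCs) in place of the coder's LPC parameters — rests on the standard MFCC pipeline. Below is a minimal single-frame sketch in Python/NumPy; it is simplified (no pre-emphasis or liftering) and all parameter values are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr=8000, n_filters=20, n_ceps=13):
    """MFCC of one windowed frame: power spectrum -> mel filterbank
    -> log energies -> DCT-II, keeping the first n_ceps coefficients."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # Triangular filters spaced uniformly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    log_e = np.empty(n_filters)
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        up = np.clip((freqs - lo) / (mid - lo), 0.0, 1.0)
        down = np.clip((hi - freqs) / (hi - mid), 0.0, 1.0)
        log_e[i] = np.log(np.sum(power * np.minimum(up, down)) + 1e-10)
    # DCT-II decorrelates the log filterbank energies
    n = np.arange(n_filters)
    ceps = np.array([np.sum(log_e * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters)))
                     for k in range(n_ceps)])
    return ceps

frame = np.sin(2 * np.pi * 440 * np.arange(256) / 8000)  # 440 Hz test tone
print(mfcc_frame(frame).shape)  # (13,)
```

The lossy part the paper addresses is the reverse direction: recovering LPC-like spectral envelopes from these coefficients for resynthesis, which this sketch does not attempt.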


DTW based Utterance Rejection on Broadcasting News Keyword Spotting System (방송뉴스 핵심어 검출 시스템에서의 오인식 거부를 위한 DTW의 적용)

  • Park, Kyung-Mi;Park, Jeong-Sik;Oh, Yung-Hwan
    • Proceedings of the KSPS conference / 2005.11a / pp.155-158 / 2005
  • Keyword spotting is effective for finding keywords in continuously pronounced speech. However, non-keywords may be accepted as keywords when environmental noise occurs or the speaker changes. To overcome this performance degradation, utterance rejection techniques using confidence measures on the recognition result have been developed. In this paper, we apply DTW to an HMM-based broadcasting news keyword spotting system to reject non-keywords. Experimental results show that the false acceptance rate is decreased to 50%.
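The rejection idea above scores a candidate segment against a keyword template and rejects it when the alignment cost is too high. A minimal sketch of the underlying DTW distance (on 1-D features for readability; the thresholding policy and feature choice here are illustrative, not the paper's system):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])              # local distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# A time-stretched copy of a template aligns almost perfectly,
# while an unrelated sequence accumulates a large cost -> reject.
template  = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
other     = [5.0, 5.0, 5.0, 5.0, 5.0]
print(dtw_distance(template, stretched))  # small: accept
print(dtw_distance(template, other))      # large: reject as non-keyword
```

In practice each element would be a frame-level feature vector (e.g. MFCCs) with a vector distance as the local cost, and the rejection threshold would be tuned on held-out data.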


A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

  • Jo, Min Su;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.675-681 / 2021
  • With the recent growth of the AI speaker market, the demand for speech synthesis technology that enables natural conversation with users is increasing. Therefore, there is a need for a multi-speaker speech synthesis system that can generate voices with various tones. Synthesizing natural speech requires training on a large-capacity, high-quality speech DB, but collecting such a database uttered by many speakers is very difficult in terms of recording time and cost. It is therefore necessary to train the speech synthesis system on a speech DB covering a very large number of speakers with only a small amount of training data per speaker, and a technique for naturally expressing the tone and prosody of multiple speakers is required. In this paper, we propose a technique for constructing a speaker encoder by applying the deep-learning-based x-vector technique used in speaker recognition, and for synthesizing a new speaker's tone from a small amount of data through this speaker encoder. In the multi-speaker speech synthesis system, the module that synthesizes a mel-spectrogram from input text is Tacotron2, and the vocoder that generates the synthesized speech is WaveNet with a mixture of logistic distributions. The x-vector extracted from the trained speaker embedding neural network is added to Tacotron2 as an input to express the desired speaker's tone.
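The abstract says the x-vector "is added to Tacotron2 as an input". One common way to do this (a sketch of the general pattern, not necessarily this paper's exact wiring; the dimensions 512 for the encoder and 192 for the x-vector are illustrative assumptions) is to broadcast the utterance-level speaker embedding across time and concatenate it to every encoder frame before attention:

```python
import numpy as np

def condition_on_speaker(encoder_out, x_vector):
    """Broadcast a fixed speaker embedding over time and concatenate it
    to every text-encoder frame, so the decoder sees the target speaker
    identity at every step."""
    t = encoder_out.shape[0]
    tiled = np.tile(x_vector, (t, 1))                     # (T, D_spk)
    return np.concatenate([encoder_out, tiled], axis=1)   # (T, D_enc + D_spk)

enc = np.random.randn(50, 512)   # 50 encoder frames (illustrative dims)
xvec = np.random.randn(192)      # one x-vector for the target speaker
print(condition_on_speaker(enc, xvec).shape)  # (50, 704)
```

Because the x-vector is computed by a separately trained speaker-recognition network, a new speaker's voice can be approximated at inference time from a short enrollment utterance, without retraining Tacotron2.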

A Study on Intelligent Control Algorithm Development for Cooperation Working of Human and Robot (인간과 로봇 협력작업을 위한 로봇 지능제어알고리즘 개발에 관한 연구)

  • Lee, Woo-Song;Jung, Yang-Guen;Park, In-Man;Jung, Jong-Gyu;Kim, Hui-Jin;Kim, Min-Seong;Han, Sung-Hyun
    • Journal of the Korean Society of Industry Convergence / v.20 no.4 / pp.285-297 / 2017
  • This study proposes a new approach to developing an intelligent control algorithm, based on voice recognition, for cooperative working of humans and robots. In speaker verification, a Gaussian mixture model is generally used to model the feature vectors of reference speech signals. On the other hand, dynamic time warping (DTW) based template matching techniques were presented for voice recognition several years ago. We converge these two different concepts in a single method and implement it in a real-time voice recognition system whose reference model satisfies 95% recognition performance. The reliability of the voice recognition is illustrated by simulations and experiments on a humanoid robot with 18 joints.
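The GMM side of the combined method scores incoming feature frames against a reference speaker model. A minimal diagonal-covariance GMM scorer (a generic sketch of the standard computation; the toy model parameters below are assumptions, not the paper's):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Average per-frame log-likelihood of frames x (T, D) under a
    diagonal-covariance Gaussian mixture model."""
    log_probs = []
    for w, mu, var in zip(weights, means, variances):
        ll = -0.5 * (np.sum(np.log(2 * np.pi * var))
                     + np.sum((x - mu) ** 2 / var, axis=1))
        log_probs.append(np.log(w) + ll)
    # log-sum-exp over mixture components, then average over frames
    log_probs = np.stack(log_probs)          # (M, T)
    m = log_probs.max(axis=0)
    frame_ll = m + np.log(np.exp(log_probs - m).sum(axis=0))
    return frame_ll.mean()

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
near = rng.normal(0.0, 1.0, (100, 2))   # resembles the first component
far = rng.normal(10.0, 1.0, (100, 2))   # resembles neither component
print(gmm_log_likelihood(near, weights, means, variances) >
      gmm_log_likelihood(far, weights, means, variances))  # True
```

In a verification setting, the frame-averaged log-likelihood is compared against a threshold (or against a background model's score) to accept or reject the speaker.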

Isolated Word Recognition with the E-MIND II Neurocomputer (E-MIND II를 이용한 고립 단어 인식 시스템의 설계)

  • Kim, Joon-Woo;Jeong, Hong;Kim, Myeong-Won
    • Journal of the Korean Institute of Telematics and Electronics B / v.32B no.11 / pp.1527-1535 / 1995
  • This paper introduces an isolated word recognition system realized on a neurocomputer called the E-MIND II, a 2-D torus wavefront array processor consisting of 256 DNP IIs. The DNP II is an all-digital VLSI unit processor for the E-MIND II featuring emulation of more than a thousand neurons, a 40 MHz clock speed, and on-chip learning. Built from these PEs in a 2-D toroidal mesh architecture, the E-MIND II achieves a computation speed of over 2 Gcps. The advantages of the E-MIND II in computing speed, scalability, computer interface, and learning make it especially suitable for real-time applications such as speech recognition. We show how to map a TDNN structure onto this array and how to code the learning and recognition algorithms for user-independent isolated word recognition. Through hardware simulation, we show that the recognition rate of this system is about 97% for 30 robot-control command words.


Real time instruction classification system

  • Sang-Hoon Lee;Dong-Jin Kwon
    • International Journal of Internet, Broadcasting and Communication / v.16 no.3 / pp.212-220 / 2024
  • With the recent advancement of society, AI technology has made significant strides, especially in the fields of computer vision and voice recognition. This study introduces a system that leverages these technologies to recognize users through a camera and relay commands within a vehicle based on voice commands. The system uses the YOLO (You Only Look Once) machine learning algorithm, widely used for object and entity recognition, to identify specific users. For voice command recognition, a machine learning model based on spectrogram voice analysis is employed to identify specific commands. This design aims to enhance security and convenience by preventing unauthorized access to vehicles and IoT devices by anyone other than registered users. The system converts camera input data into YOLO inputs to determine whether a person is present. Additionally, it collects voice data through a microphone embedded in the device or computer, converting it into spectrogram data to be used as input to the voice recognition machine learning system. The input camera image data and voice data undergo inference through pre-trained models, enabling the recognition of simple commands within a limited space based on the inference results. This study demonstrates the feasibility of constructing a device management system within a confined space that enhances security and user convenience through a simple real-time system model. Finally, our work aims to provide practical solutions in various application fields, such as smart homes and autonomous vehicles.
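The spectrogram front end described above converts the microphone signal into a time-frequency image before classification. A minimal log-magnitude spectrogram sketch (frame length, hop size, and sample rate below are illustrative assumptions):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram: overlapping windowed frames -> FFT
    magnitudes. Rows are time frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    spec = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len] * window
        spec[i] = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
    return spec

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)     # 1 kHz test tone, 1 second
S = spectrogram(tone)
peak_bin = S.mean(axis=0).argmax()
print(peak_bin * sr / 256)              # energy peaks near 1000 Hz
```

The resulting 2-D array is what a CNN-style command classifier would consume as its input image.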

Impostor Detection in Speaker Recognition Using Confusion-Based Confidence Measures

  • Kim, Kyu-Hong;Kim, Hoi-Rin;Hahn, Min-Soo
    • ETRI Journal / v.28 no.6 / pp.811-814 / 2006
  • In this letter, we introduce confusion-based confidence measures for detecting an impostor in speaker recognition, which do not require an alternative hypothesis. Most traditional speaker verification methods are based on a hypothesis test, and their performance depends on the robustness of the alternative hypothesis. Compared with the conventional Gaussian mixture model-universal background model (GMM-UBM) scheme, our confusion-based measures show better performance on noise-corrupted speech. The additional computational requirements of our methods are negligible when used to detect or reject impostors.


A Study on Speech Recognition using GAVQ(Genetic Algorithms Vector Quantization) (GAVQ를 이용한 음성인식에 관한 연구)

  • Lee, Sang-Hee;Lee, Jae-Kon;Jeong, Ho-Kyoun;Kim, Yong-Yun;Nam, Jae-Sung
    • Journal of Industrial Technology / v.19 / pp.209-216 / 1999
  • In this paper, we propose a modified genetic algorithm that minimizes the misclassification rate when determining the codebook. Genetic algorithms are adaptive methods that may be used to solve search and optimization problems, based on the genetic processes of biological organisms, but they generally require a large amount of computational effort. GAVQ chooses optimal individuals via genetic operators, and the positions of the individuals are optimized to improve the recognition rate. A technical merit of this study is that it avoids the local minimum problem, which conventional VQ algorithms cannot escape. Simulations were run in Matlab using phoneme data, and the results show that the recognition rate of GAVQ is improved compared with conventional VQ algorithms.
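The core idea — treating a whole codebook as an individual and evolving a population of codebooks instead of descending locally as LBG/k-means does — can be sketched as follows. This is a toy illustration: the fitness here is plain quantization distortion rather than the paper's misclassification rate, and the selection, crossover, and mutation operators are generic assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def distortion(codebook, data):
    """Mean squared distance from each vector to its nearest codeword."""
    d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def ga_vq(data, n_codewords=4, pop=20, gens=50):
    """Toy genetic search over codebooks: fitness = -distortion,
    truncation selection, per-codeword crossover, Gaussian mutation."""
    population = [data[rng.choice(len(data), n_codewords, replace=False)].copy()
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda cb: distortion(cb, data))
        parents = population[:pop // 2]            # keep the better half
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.choice(len(parents), 2, replace=False)
            mask = rng.random(n_codewords) < 0.5   # crossover per codeword
            child = np.where(mask[:, None], parents[a], parents[b])
            child = child + rng.normal(scale=0.1, size=child.shape)  # mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda cb: distortion(cb, data))

# Four well-separated clusters: the evolved codebook should land near them.
centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]], dtype=float)
data = np.concatenate([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])
best = ga_vq(data)
print(distortion(best, data))  # small residual distortion
```

Because the population explores many codebook configurations at once, a poor local optimum trapping one individual does not trap the whole search — which is the escape-from-local-minima property the abstract claims over conventional VQ.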


Communication Aid System For Dementia Patients (치매환자를 위한 대화 보조 시스템)

  • Sung-Ill Kim;Byoung-Chul Kim
    • Journal of Biomedical Engineering Research / v.23 no.6 / pp.459-465 / 2002
  • The goal of the present research is to improve the quality of life of both elderly patients with dementia and their caregivers. For this purpose, we developed a communication aid system consisting of three modules: a speech recognition engine, a graphical agent, and a database classified by nursing schedule. The system was evaluated in an actual nursing facility environment by introducing it to an elderly male patient with dementia. A comparison study was then carried out with and without the system. Occupational therapists evaluated the subject's reaction to the system by photographing his behaviors. The evaluation results revealed that the proposed system was more responsive in catering to the needs of the subject than professional caregivers. Moreover, we observed that the frequency of the subject's utterances increased after introducing the system.