• Title/Summary/Keyword: Speech Recognition Technology

A review of speech perception: The first step for convergence on speech engineering (말소리지각에 대한 종설: 음성공학과의 융복합을 위한 첫 단계)

  • Lee, Young-lim
    • Journal of Digital Convergence / v.15 no.12 / pp.509-516 / 2017
  • People observe many events in their environment and have no difficulty perceiving them, speech included. As with the perception of biological motion, two main theoretical camps have debated speech perception. The purpose of this review article is to briefly describe speech perception and compare these two theories. Motor theorists claim that speech perception is special to humans because we both produce and perceive articulatory events that are processed by innate neuromotor commands. Direct perception theorists, however, claim that speech perception is no different from nonspeech perception because we only need to detect information directly, as with all other kinds of events. Grasping the fundamental idea of how humans perceive articulatory events is important for convergence with speech engineering. This basic review of speech perception is therefore expected to be useful for AI, voice recognition technology, speech recognition systems, etc.

Reliable Sound Source Localization for Human Robot Interaction

  • Kim, Hyun-Don;Choi, Jong-Suk;Lee, Chang-Hoon;Kim, Mun-Sang
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 2004.08a / pp.1820-1825 / 2004
  • In this paper, we propose a humanoid active audition system that detects the direction of sound and performs speech recognition using just three microphones. Compared with previous research, this system combines a simpler algorithm with a better amplifier system, which increases the detectable distance of a sound signal despite the simple circuit. To verify the system's performance, we installed the proposed active audition system on a home service robot called Hombot II, developed at KIST (Korea Institute of Science and Technology), and confirmed excellent performance through experimental results.
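
The abstract above does not say how the direction of sound is computed. As an illustration only, the sketch below uses a common textbook approach, not confirmed as the authors' method: estimate the time delay between one pair of microphones by cross-correlation, then convert it to a far-field arrival angle. The function names, the sample rate, and the microphone spacing are assumptions introduced here.

```python
import math

def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) of sig_b relative to sig_a
    that maximizes their cross-correlation."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(
            sig_a[n] * sig_b[n + lag]
            for n in range(len(sig_a))
            if 0 <= n + lag < len(sig_b)
        )
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

def delay_to_angle(lag, sample_rate, mic_distance, speed_of_sound=343.0):
    """Convert a sample delay to a far-field arrival angle in degrees.
    mic_distance is in meters; the ratio is clamped so asin stays defined."""
    ratio = lag / sample_rate * speed_of_sound / mic_distance
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))

# Hypothetical usage: the same pulse reaches microphone B three samples late.
pulse = [1.0, 2.0, 3.0, 2.0, 1.0]
mic_a = [0.0] * 10 + pulse + [0.0] * 10
mic_b = [0.0] * 13 + pulse + [0.0] * 7
lag = estimate_delay(mic_a, mic_b, max_lag=5)
angle = delay_to_angle(lag, sample_rate=16000, mic_distance=0.2)
```

With three microphones, the same pairwise estimate could be repeated for each pair to resolve the front/back ambiguity that a single pair leaves open.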

A Study on Smart Tourism Based on Face Recognition Using Smartphone

  • Ryu, Ki-Hwan;Lee, Myoung-Su
    • International Journal of Internet, Broadcasting and Communication / v.8 no.4 / pp.39-47 / 2016
  • This study examines smart tourism based on a face recognition system that manages the personal information of foreign tourists on a smartphone. Face recognition, a form of biometric information, is used for authentication as a technology applied to identity checks, immigration control, etc., and the system is designed so that tourism companies can provide customized service to customers by applying the algorithm on a smartphone. The smart tourism system based on face recognition prepares the reception service by photographing the faces of foreign tourists entering Korea for the first time with glasses attached to the camera and sending the information in real time to the smartphone of the tourist service company's guide. Smart tourism based on face recognition draws on personal information recognition technology, speech recognition technology, sensing technology, artificial intelligence personal information recognition technology, etc. In particular, artificial intelligence personal information recognition technology enables the tourism service company to implement a self-promotion function commemorating the visit of foreign tourists and enables tourists to participate in events and experience them directly. Because smart tourism based on face recognition can use unique facial data and image features, it can benefit service companies that require accurate user authentication and those that prioritize security. However, where government organizations and private companies share information, preemptive measures such as the introduction of security systems should be taken.

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition (음성 인식을 위한 개선된 평균 예측 LMS 필터를 이용한 DNN 기반의 강인한 음성 특징 추출 및 신호 잡음 제거 기법)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.1-6 / 2021
  • In speech recognition, the adoption of DNNs has increased the use of speech recognition, but parallel training requires more computation than the conventional GMM, and overfitting occurs when the amount of data is small. To solve this problem, we propose an efficient method for robust speech feature extraction and speech signal noise removal even when the amount of data is small. Speech feature extraction efficiently extracts speech energy by applying the difference in frame energy for speech together with the zero-crossing ratio and level-crossing ratio, which are affected by the speech signal. In addition, noise in the speech signal is removed with an improved average prediction LMS filter that loses little speech information while maintaining the intrinsic characteristics of speech during speech signal detection. The improved LMS filter processes noise in the input speech signal by adjusting the active parameter threshold for the input signal. Comparing the proposed method with the conventional frame energy method confirmed that the error rate at the start point of speech is 7% and the error rate at the end point is improved by 11%.
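
The frame energy and zero-crossing measures named in the abstract above can be sketched as follows. This is a minimal illustration of the conventional per-frame features and a plain energy-threshold endpoint detector, not the paper's improved method; the frame length, hop size, threshold, and function names are assumptions introduced here.

```python
def frame_features(signal, frame_len=160, hop=80):
    """Return a list of (energy, zero_crossing_rate) tuples, one per frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Count sign changes between consecutive samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((energy, crossings / (frame_len - 1)))
    return feats

def detect_endpoints(feats, energy_threshold):
    """Return (first, last) indices of frames whose energy exceeds
    the threshold, or None if no frame does."""
    active = [i for i, (energy, _) in enumerate(feats) if energy > energy_threshold]
    return (active[0], active[-1]) if active else None
```

A detector of this kind is roughly what the abstract calls the conventional frame energy method; the zero-crossing rate is returned alongside so that a second criterion can be added for low-energy speech sounds.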

Optimized Wiener Filter for Noise Reduction in VoIP Environments (VoIP 환경에서의 잡음제거를 위한 최적화된 위너 필터)

  • Jeong, Sang-Bae;Lee, Sung-Doke;Hahn, Min-Soo
    • MALSORI / no.64 / pp.105-119 / 2007
  • Noise reduction technologies are indispensable for achieving acceptable speech quality in VoIP systems. This paper proposes a Wiener filter optimized to the estimated SNR of noisy speech for noise reduction in VoIP environments. The proposed noise canceller is applied as a pre-processor before speech encoding. The performance of the proposed method is evaluated by PESQ in various noisy conditions. The proposed algorithm is applied to G.711, G.723.1, and G.729A, which are all VoIP speech codecs. The PESQ results show that the proposed noise reduction scheme outperforms the noise suppression in the IS-127 EVRC and in the ETSI standard for the advanced distributed speech recognition front-end.
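
For orientation, the textbook Wiener gain that such a filter builds on can be sketched as below. The abstract does not give the paper's SNR-dependent optimization, so this shows only the standard per-bin gain G = SNR / (1 + SNR), with the SNR estimated by power subtraction; the function name and the gain floor are assumptions introduced here.

```python
def wiener_gain(noisy_power, noise_power, gain_floor=0.05):
    """Return one Wiener gain per frequency bin.

    noisy_power and noise_power are per-bin power estimates of the noisy
    speech and of the noise. The floor limits how far a bin is attenuated.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_n > 0:
            # Estimated SNR: (noisy power - noise power) / noise power.
            snr = max(p_y - p_n, 0.0) / p_n
            gain = snr / (1.0 + snr)
        else:
            # No noise estimated in this bin: pass it through unchanged.
            gain = 1.0
        gains.append(max(gain, gain_floor))
    return gains
```

Each gain would multiply the corresponding spectral bin of the noisy frame before the signal is resynthesized and handed to the speech encoder.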

Implementation of a Speaker-independent Speech Recognizer Using the TMS320F28335 DSP (TMS320F28335 DSP를 이용한 화자독립 음성인식기 구현)

  • Chung, Ik-Joo
    • Journal of Industrial Technology / v.29 no.A / pp.95-100 / 2009
  • In this paper, we implemented a speaker-independent speech recognizer using the TMS320F28335 DSP, which is optimized for control applications. For this implementation, we used a small commercial DSP module and developed a peripheral board including a codec, signal conditioning circuits, and I/O interfaces. The speech signal digitized by the TLV320AIC23 codec is analyzed using the MFCC feature extraction method and recognized using a continuous-density HMM. Thanks to the internal SRAM and flash memory on the TMS320F28335 DSP, we did not need any external memory devices. The internal flash memory contains ADPCM data for voice response as well as HMM data. Since the TMS320F28335 DSP is optimized for control applications, the recognizer may serve well in voice-activated control, in that it integrates speech recognition capability and inherent control functions into a single DSP.

Distant-talking of Speech Interface for Humanoid Robots (휴머노이드 로봇을 위한 원거리 음성 인터페이스 기술 연구)

  • Lee, Hyub-Woo;Yook, Dong-Suk
    • Proceedings of the KSPS conference / 2007.05a / pp.39-40 / 2007
  • For efficient interaction between humans and robots, the speech interface is a core problem, especially in noisy and reverberant conditions. This paper analyzes the main issues of spoken language interfaces for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.

GMM based Speaker Identification using Pitch Information (피치 정보를 이용한 GMM 기반의 화자 식별)

  • Park Taesun;Hahn Minsoo
    • MALSORI / no.47 / pp.121-129 / 2003
  • This paper describes the use of pitch information for speaker identification. The recognition system is GMM-based and uses a speech database of 4 connected Korean digits. The mean of the pitch period in voiced sections of speech is shown to be useful for discriminating between speakers. Using this feature with a Gaussian mixture model in the speaker identification system gave a marked improvement, up to 6% over the baseline Gaussian mixture model.
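
GMM-based identification of the kind described above scores the test features under each enrolled speaker's mixture model and picks the best-scoring speaker. The sketch below illustrates that scoring with pitch as the only feature and one-dimensional mixtures; how the paper combines pitch with its baseline features is not stated in the abstract, and the function names and model layout are assumptions introduced here.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp keeps the sum over components numerically stable.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def identify_speaker(pitch_values, speaker_models):
    """Return the speaker whose GMM gives the highest total log-likelihood.

    speaker_models maps a speaker name to (weights, means, variances).
    """
    scores = {
        name: sum(gmm_log_likelihood(x, *model) for x in pitch_values)
        for name, model in speaker_models.items()
    }
    return max(scores, key=scores.get)
```

In practice the mixture parameters would be trained per speaker, typically with the EM algorithm, on enrollment speech.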

Keyword Retrieval-Based Korean Text Command System Using Morphological Analyzer (형태소 분석기를 이용한 키워드 검색 기반 한국어 텍스트 명령 시스템)

  • Park, Dae-Geun;Lee, Wan-Bok
    • Journal of the Korea Convergence Society / v.10 no.2 / pp.159-165 / 2019
  • Speech recognition based on deep learning has begun to be applied to commercial products, but it remains difficult to use in VR content because there is no easy and efficient way to process the recognized text after the speech recognition module. In this paper, we propose a Korean Language Command System that can efficiently recognize and respond to Korean speech commands. The system consists of two components: a morphological analyzer that analyzes sentence morphemes, and a retrieval-based model of the kind usually used to develop chatbot systems. Experimental results show that the proposed system requires only 16% of the commands to achieve the same level of performance as the conventional string comparison method. Furthermore, when working with the Google Cloud Speech module, it achieved a success rate of 60.1%. These results show that the proposed system is more efficient than the conventional string comparison method.
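
The contrast with exact string comparison can be illustrated with a small keyword-retrieval sketch. It is only an illustration of the idea: a whitespace tokenizer stands in for the Korean morphological analyzer, the overlap score is an assumption, and the command table and function names are hypothetical.

```python
def tokenize(text):
    """Stand-in for a morphological analyzer: lowercase whitespace tokens."""
    return set(text.lower().split())

def retrieve_command(utterance, commands):
    """Return the command whose keyword set is best covered by the
    utterance, or None if no keyword matches at all.

    commands maps a command name to a list of keywords.
    """
    tokens = tokenize(utterance)
    best_name, best_score = None, 0.0
    for name, keywords in commands.items():
        keyword_set = set(keywords)
        # Fraction of this command's keywords found in the utterance.
        score = len(tokens & keyword_set) / len(keyword_set) if keyword_set else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Because matching is done on keywords rather than whole strings, one keyword list covers many phrasings of the same command, which is in line with the abstract's claim that far fewer registered commands are needed.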

An Automatic Data Construction Approach for Korean Speech Command Recognition

  • Lim, Yeonsoo;Seo, Deokjin;Park, Jeong-sik;Jung, Yuchul
    • Journal of the Korea Society of Computer and Information / v.24 no.12 / pp.17-24 / 2019
  • The biggest problem in the AI field, which has become a hot topic in recent years, is how to deal with the lack of training data. Since manual data construction takes a lot of time and effort, it is non-trivial for an individual to build the necessary data. Automatic data construction, on the other hand, must handle data quality issues. In this paper, we introduce a method to automatically extract from the web the data required to develop a Korean speech command recognizer and to automatically select the data that can be used for training. In particular, we propose a modified ResNet model that shows modest performance on the automatically constructed Korean speech command data. We conducted an experiment to show the applicability of the command set for the health and daily life domains. In a series of experiments using only automatically constructed data, accuracy was 89.5% with ResNet15 in the health domain and 82% with ResNet8 in the daily life domain.