• Title/Summary/Keyword: Voice Recognition Technology

Search Result 212, Processing Time 0.028 seconds

A Study on a Non-Voice Section Detection Model among Speech Signals using CNN Algorithm (CNN(Convolutional Neural Network) 알고리즘을 활용한 음성신호 중 비음성 구간 탐지 모델 연구)

  • Lee, Hoo-Young
    • Journal of Convergence for Information Technology
    • /
    • v.11 no.6
    • /
    • pp.33-39
    • /
    • 2021
  • Speech recognition technology is being combined with deep learning and is developing at a rapid pace. In particular, voice recognition services are connected to various devices such as artificial intelligence speakers, vehicle voice recognition, and smartphones, and voice recognition technology is being used in various places, not in specific areas of the industry. In this situation, research to meet high expectations for the technology is also being actively conducted. Among them, in the field of natural language processing (NLP), there is a need for research in the field of removing ambient noise or unnecessary voice signals that have a great influence on the speech recognition recognition rate. Many domestic and foreign companies are already using the latest AI technology for such research. Among them, research using a convolutional neural network algorithm (CNN) is being actively conducted. The purpose of this study is to determine the non-voice section from the user's speech section through the convolutional neural network. It collects the voice files (wav) of 5 speakers to generate learning data, and utilizes the convolutional neural network to determine the speech section and the non-voice section. A classification model for discriminating speech sections was created. Afterwards, an experiment was conducted to detect the non-speech section through the generated model, and as a result, an accuracy of 94% was obtained.

Study on the Situational satisfaction survey of Smart Phone based on voice recognition technology (스마트 폰 음성 인식 서비스의 상황별 만족도 조사)

  • Lee, Yoon-jeong;Kim, Seung-In
    • Journal of Digital Convergence
    • /
    • v.15 no.8
    • /
    • pp.351-357
    • /
    • 2017
  • The purpose of this study is to analyze the relationship between users' expectation and satisfaction through analyzing smart phone voice recognition service and surveying the situational satisfaction of voice recognition service. First, the concept and current status of speech recognition service were investigated through literature research, and the questionnaire survey was carried out through questionnaires based on the second and third principles. As a result, the user is most likely to use the smartphone voice recognition service when making a telephone call. Also, it was found that the service is the most used at home and the service is used most when the hand is not available. Through these results, various methods such as personalization service, condition recognition function, automatic emergency recognition, unlocking by voice could be derived. Based on this research, it is expected that it will be used effectively for improving smartphone voice recognition service and developing wearable device in the future.

A Study on Real-Time Walking Action Control of Biped Robot with Twenty Six Joints Based on Voice Command (음성명령기반 26관절 보행로봇 실시간 작업동작제어에 관한 연구)

  • Jo, Sang Young;Kim, Min Sung;Yang, Jun Suk;Koo, Young Mok;Jung, Yang Geun;Han, Sung Hyun
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.22 no.4
    • /
    • pp.293-300
    • /
    • 2016
  • The Voice recognition is one of convenient methods to communicate between human and robots. This study proposes a speech recognition method using speech recognizers based on Hidden Markov Model (HMM) with a combination of techniques to enhance a biped robot control. In the past, Artificial Neural Networks (ANN) and Dynamic Time Wrapping (DTW) were used, however, currently they are less commonly applied to speech recognition systems. This Research confirms that the HMM, an accepted high-performance technique, can be successfully employed to model speech signals. High recognition accuracy can be obtained by using HMMs. Apart from speech modeling techniques, multiple feature extraction methods have been studied to find speech stresses caused by emotions and the environment to improve speech recognition rates. The procedure consisted of 2 parts: one is recognizing robot commands using multiple HMM recognizers, and the other is sending recognized commands to control a robot. In this paper, a practical voice recognition system which can recognize a lot of task commands is proposed. The proposed system consists of a general purpose microprocessor and a useful voice recognition processor which can recognize a limited number of voice patterns. By simulation and experiment, it was illustrated the reliability of voice recognition rates for application of the manufacturing process.

The Development of Data Capturing Modules by Speech-Voice Recognition (음성인식에 의한 측량자료취득 모듈개발)

  • 조규전;이영진;차득기
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.18 no.3
    • /
    • pp.279-285
    • /
    • 2000
  • Men's desire for the human interface, due to the development of voice processing technology of computer, and the development of intelligent MMI (Man-Machine Interface) computer technology enabled us to operate computers with our voice without using keyboards or other input systems. Especially, by obtaining field data and layout from the complicated surveying environment and applying the voice recognition technology to the actual surveying work, we can save a lot of working hours and costs. According to the result of this study, the real time Geo-Coding and graphic data-coding were possible with only 25 words by connecting the software engine which recognizes 50,000 different words and the voice recognition technology based on the super IC which recognizes 60 different words with the Total-station and the RTK-GPS.

  • PDF

Variation of the Verification Error Rate of Automatic Speaker Recognition System With Voice Conditions (다양한 음성을 이용한 자동화자식별 시스템 성능 확인에 관한 연구)

  • Hong Soo Ki
    • MALSORI
    • /
    • no.43
    • /
    • pp.45-55
    • /
    • 2002
  • High reliability of automatic speaker recognition regardless of voice conditions is necessary for forensic application. Audio recordings in real cases are not consistent in voice conditions, such as duration, time interval of recording, given text or conversational speech, transmission channel, etc. In this study the variation of verification error rate of ASR system with the voice conditions was investigated. As a result in order to decrease both false rejection rate and false acception rate, the various voices should be used for training and the duration of train voices should be longer than the test voices.

  • PDF

A study on Autonomous Travelling Control of Mobile Robot (이동로봇의 자율주행제어에 관한 연구)

  • Lee, Woo-Song;Shim, Hyun-Seok;Ha, Eun-Tae;Kim, Jong-Soo
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.18 no.1
    • /
    • pp.10-17
    • /
    • 2015
  • We describe a research about remote control of mobile robot based on voice command in this paper. Through real-time remote control and wireless network capabilities of an unmanned remote-control experiments and Home Security / exercise with an unmanned robot, remote control and voice recognition and voice transmission are possible to transmit on a PC using a microphone to control a robot to pinpoint of the source. Speech recognition can be controlled robot by using a remote control. In this research, speech recognition speed and direction of self-driving robot were controlled by a wireless remote control in order to verify the performance of mobile robot with two drives.

Implementation of Extracting Specific Information by Sniffing Voice Packet in VoIP

  • Lee, Dong-Geon;Choi, WoongChul
    • International journal of advanced smart convergence
    • /
    • v.9 no.4
    • /
    • pp.209-214
    • /
    • 2020
  • VoIP technology has been widely used for exchanging voice or image data through IP networks. VoIP technology, often called Internet Telephony, sends and receives voice data over the RTP protocol during the session. However, there is an exposition risk in the voice data in VoIP using the RTP protocol, where the RTP protocol does not have a specification for encryption of the original data. We implement programs that can extract meaningful information from the user's dialogue. The meaningful information means the information that the program user wants to obtain. In order to do that, our implementation has two parts. One is the client part, which inputs the keyword of the information that the user wants to obtain, and the other is the server part, which sniffs and performs the speech recognition process. We use the Google Speech API from Google Cloud, which uses machine learning in the speech recognition process. Finally, we discuss the usability and the limitations of the implementation with the example.

Characteristics of Cow´s Voices in Time and Frequency domains for Recognition

  • Ikeda, Yoshio;Ishii, Y.
    • Agricultural and Biosystems Engineering
    • /
    • v.2 no.1
    • /
    • pp.15-23
    • /
    • 2001
  • On the assumption that the voices of the cows are produced by the linear prediction filter, we characterized the cows’voices. The order of this filter was determined by examining the voice characteristics both in time and frequency domains. The proposed order of the linear prediction filter is 15 for modeling voice production of the cow. The characteristics of the amplitude envelope of the voice signal was investigated by analyzing the sequence of the short time variance both in time and frequency domains, and the new parameters were defined. One of the coefficients o the linear prediction filter generating the voice signal, the fundamental frequency, the slope of the straight line regressed from the log-log spectra of the short time variance and the coefficients of the linear prediction filter generating the sequence of the short time variance of the voice signal can differentiate the two cows.

  • PDF

An ASIC implementation of a Dual Channel Acoustic Beamforming for MEMS microphone in 0.18㎛ CMOS technology (0.18㎛ CMOS 공정을 이용한 MEMS 마이크로폰용 이중 채널 음성 빔포밍 ASIC 설계)

  • Jang, Young-Jong;Lee, Jea-Hack;Kim, Dong-Sun;Hwang, Tae-ho
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.13 no.5
    • /
    • pp.949-958
    • /
    • 2018
  • A voice recognition control system is a system for controlling a peripheral device by recognizing a voice. Recently, a voice recognition control system have been applied not only to smart devices but also to various environments ranging from IoT(: Internet of Things), robots, and vehicles. In such a voice recognition control system, the recognition rate is lowered due to the ambient noise in addition to the voice of the user. In this paper, we propose a dual channel acoustic beamforming hardware architecture for MEMS(: Microelectromechanical Systems) microphones to eliminate ambient noise in addition to user's voice. And the proposed hardware architecture is designed as ASIC(: Application-Specific Integrated Circuit) using TowerJazz $0.18{\mu}m$ CMOS(: Complementary Metal-Oxide Semiconductor) technology. The designed dual channel acoustic beamforming ASIC has a die size of $48mm^2$, and the directivity index of the user's voice were measured to be 4.233㏈.

Adaptive Post Processing of Nonlinear Amplified Sound Signal

  • Lee, Jae-Kyu;Choi, Jong-Suk;Seok, Cheong-Gyu;Kim, Mun-Sang
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.872-876
    • /
    • 2005
  • We propose a real-time post processing of nonlinear amplified signal to improve voice recognition in remote talk. In the previous research, we have found the nonlinear amplification has unique advantage for both the voice activity detection and the sound localization in remote talk. However, the original signal becomes distorted due to its nonlinear amplification and, as a result, the rest of sequence such as speech recognition show less satisfactorily results. To remedy this problem, we implement a linearization algorithm to recover the voice signal's linear characteristics after the localization has been done.

  • PDF