• Title/Summary/Keyword: Speech Detection

Reduction Algorithm of Environmental Noise by Multi-band Filter (멀티밴드필터에 의한 환경잡음억압 알고리즘)

  • Choi, Jae-Seung
    • Journal of the Korea Society of Computer and Information / v.17 no.8 / pp.91-97 / 2012
  • This paper first proposes a speech recognition algorithm that detects the speech and noise sections in each frame, then proposes an environmental noise reduction algorithm using a multi-band filter, which removes background noise in each frame according to the detected speech and noise sections. The proposed algorithm reduces background noise in the filter-bank sub-band domain after extracting features from the speech data. The noise reduction algorithm is evaluated frame by frame on speech and noise data. Based on spectral distortion measurements, the experiments confirm that the proposed algorithm is effective for speech corrupted by noise.

Detection and Recognition Method for Emergency and Non-emergency Speech by Gaussian Mixture Model (GMM을 이용한 응급 단어와 비응급 단어의 검출 및 인식 기법)

  • Cho, Young-Im;Lee, Dae-Jong
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.2 / pp.254-259 / 2011
  • For emergency detection in everyday CCTV environments, monitoring by CCTV images alone causes problems, especially in cost and manpower. Therefore, to detect emergency states dynamically through CCTV and to resolve these problems, this paper proposes a detection and recognition method for emergency and non-emergency speech using Gaussian mixture models (GMMs). The proposed method determines whether input speech is emergency or non-emergency speech using a global GMM. If it is emergency speech, a local GMM is applied to classify the type of emergency speech. The proposed method is tested and verified with emergency and non-emergency speech under various environmental conditions.

Implementation of speech interface for windows 95 (Windows95 환경에서의 음성 인터페이스 구현)

  • 한영원;배건성
    • Journal of the Korean Institute of Telematics and Electronics S / v.34S no.5 / pp.86-93 / 1997
  • With the recent development of speech recognition technology and multimedia computer systems, more potential applications of voice will become a reality. In this paper, we implement a speech interface in the Windows 95 environment for practical use of multimedia computers with voice. The speech interface is made up of three modules: a speech input and detection module, a speech recognition module, and an application module. The speech input and detection module handles the low-level audio services of the Win32 API to input speech data in real time. The recognition module processes the incoming speech data and then recognizes the spoken command. The DTW pattern matching method is used for speech recognition. The application module executes the voice command on the PC. Each module of the speech interface is designed and examined in the Windows 95 environment. The implemented speech interface and experimental results are explained and discussed.
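
The DTW pattern matching named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each utterance has already been reduced to a one-dimensional feature sequence, uses absolute difference as the local cost, and the names `dtw_distance`, `recognize`, and the template dictionary are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Assumption: absolute difference is the local cost; a real recognizer
    would compare multi-dimensional feature vectors instead.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Allowed steps: insertion, deletion, or diagonal match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the name of the stored template closest to the utterance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))


# Hypothetical command templates, for illustration only
templates = {"open": [1, 2, 3], "close": [3, 2, 1]}
command = recognize([1, 1, 2, 3, 3], templates)
```

Because DTW stretches or compresses the time axis, the slower utterance `[1, 1, 2, 3, 3]` still aligns to the `"open"` template at zero cost.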

Common Speech Database Collection and Validation for Communications (한국어 공통 음성 DB구축 및 오류 검증)

  • Lee Soo-jong;Kim Sanghun;Lee Youngjik
    • MALSORI / no.46 / pp.145-157 / 2003
  • In this paper, we briefly introduce the Korean common speech database, a project started in 2002 to construct a large-scale speech database. The project aims to support the R&D environment of speech technology for industry. It encourages the domestic speech industry and stimulates the domestic speech technology market. In the first year, the resulting common speech database consists of 25 kinds of databases covering various recording conditions such as telephone, PC, and VoIP. The speech database will be widely used for speech recognition, speech synthesis, and speaker identification. However, although the database was originally corrected manually, it still retains unknown errors and human errors. To minimize the errors in the database, we tried to find errors based on recognition errors and classified them into several kinds. To be more effective than the typical recognition technique, we will develop an automatic error detection method. In the future, we will try to construct new databases reflecting the needs of companies and universities.

Detection of Laryngeal Pathology in Speech Using Multilayer Perceptron Neural Networks (다층 퍼셉트론 신경회로망을 이용한 후두 질환 음성 식별)

  • Kang Hyun Min;Kim Yoo Shin;Kim Hyung Soon
    • Proceedings of the KSPS conference / 2002.11a / pp.115-118 / 2002
  • Neural networks are known to have great discriminative power in pattern classification problems. In this paper, multilayer perceptron neural networks are employed to automatically detect laryngeal pathology in speech. New feature parameters that reflect the periodicity of speech and its perturbation are also introduced. These parameters and cepstral coefficients are used as inputs to the multilayer perceptron neural networks. In an experiment using a Korean disordered speech database, combining the new parameters with cepstral coefficients outperforms using cepstral coefficients alone.

On a Detection of the ZCR-Parameter for Higher Formants of Speech Signals (음성신호의 상위 포만트에 대한 ZCR-파라미터 검출에 관한 연구)

  • 유건수
    • Proceedings of the Acoustical Society of Korea Conference / 1992.06a / pp.49-53 / 1992
  • In many applications such as speech analysis, speech coding, and speech recognition, the voiced-unvoiced decision must be made correctly for efficient processing. One of the parameters used for the voiced-unvoiced decision is the zero-crossing rate. However, the information in the higher formants of speech signals has not been represented as a zero-crossing rate.
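
As a minimal sketch of the conventional zero-crossing rate used for the voiced-unvoiced decision (not the higher-formant ZCR parameter this paper studies), the code below counts sign changes in a frame. The function names and the `threshold` value of 0.25 are hypothetical; a practical threshold depends on the sampling rate and must be tuned on data.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def is_voiced(frame, threshold=0.25):
    """Treat a frame as voiced when its zero-crossing rate is low.

    Assumption: voiced speech is dominated by low-frequency energy and so
    crosses zero rarely; the threshold here is illustrative only.
    """
    return zero_crossing_rate(frame) < threshold
```

A slowly varying frame such as `[1, 1, 1, 1, -1, -1, -1, -1, -1]` is classed as voiced under this rule, while a rapidly alternating frame such as `[1, -1, 1, -1]` is not.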

Scoring Methods for Improvement of Speech Recognizer Detecting Mispronunciation of Foreign Language (외국어 발화오류 검출 음성인식기의 성능 개선을 위한 스코어링 기법)

  • Kang Hyo-Won;Kwon Chul-Hong
    • MALSORI / no.49 / pp.95-105 / 2004
  • An automatic pronunciation correction system provides learners with correction guidelines for each mispronunciation. For this purpose, we develop a speech recognizer that automatically classifies pronunciation errors when Koreans speak a foreign language. To develop methods for automatic assessment of pronunciation quality, we propose a language-model-based score as the machine score in the speech recognizer. Experimental results show that the language-model-based score correlates more highly with human scores than the conventional log-likelihood-based score.

On a Pitch Detection using Low Pass Filter with Variable Bandwidth Preprocessed (전처리된 가변대역폭 LPF에 의한 피치검출법)

  • 한진희
    • Proceedings of the Acoustical Society of Korea Conference / 1995.06a / pp.221-224 / 1995
  • In speech signal processing, it is necessary to detect the pitch accurately. The pitch extraction algorithms proposed so far have difficulty detecting pitch over a wide range of speech signals. In this paper, we therefore propose a new pitch detection algorithm that uses a low-pass filter with variable bandwidth. The method preprocesses each frame by finding the first formant of the speech signal with the FFT, then detects the pitch from the signal low-pass filtered with a cutoff frequency set according to the first formant. Applying the method, we obtained pitch contours with improved pitch detection accuracy in some noisy environments.
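
The idea of low-pass filtering a frame before pitch detection can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the cutoff is passed in as `cutoff_hz` rather than derived from an FFT estimate of the first formant, the filter is a simple one-pole IIR, and the pitch detector is a plain autocorrelation peak search. The function names and the 60 to 400 Hz search range are hypothetical.

```python
import math


def low_pass(signal, cutoff_hz, sample_rate):
    """One-pole IIR low-pass filter (a stand-in for the paper's LPF)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def estimate_pitch(frame, sample_rate, cutoff_hz, fmin=60.0, fmax=400.0):
    """Estimate pitch in Hz from the autocorrelation peak of the filtered frame.

    Assumption: cutoff_hz would be chosen per frame from the first formant;
    here the caller supplies it. Returns 0.0 when no positive peak is found.
    """
    filtered = low_pass(frame, cutoff_hz, sample_rate)
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(filtered) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        score = sum(
            filtered[i] * filtered[i + lag] for i in range(len(filtered) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


# Synthetic 100 Hz tone sampled at 8 kHz, used only to exercise the sketch
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 100.0 * n / sample_rate) for n in range(800)]
pitch = estimate_pitch(tone, sample_rate, cutoff_hz=500.0)
```

Lowering the cutoff toward the first formant suppresses higher-frequency components that would otherwise create competing autocorrelation peaks, which is the motivation the abstract gives for a variable bandwidth.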

Automatic Detection of Korean Accentual Phrase Boundaries

  • Lee, Ki-Yeong;Song, Min-Suck
    • The Journal of the Acoustical Society of Korea / v.18 no.1E / pp.27-31 / 1999
  • Recent linguistic research has brought into focus the relations between prosodic structures and syntactic, semantic, or phonological structures. Most of these studies show that prosodic information is useful for understanding syntactic, semantic, and discourse structures. However, this result has not yet been integrated into recent Korean speech recognition or understanding systems. As part of integrating prosodic information into a speech recognition system, this study proposes an automatic detection technique for Korean accentual phrase boundaries using one-stage DP and the normalized pitch pattern. To produce the normalized pitch pattern, this study proposes a modified normalization method for spoken Korean. The experiment uses 192 sentence utterances spoken by 12 male speakers in standard Korean, containing 720 accentual phrases; 74.4% of the accentual phrase boundaries are correctly detected, with a false detection rate of 14.7%.

Integrated System of Mobile Manipulator with Speech Recognition and Deep Learning-based Object Detection (음성인식과 딥러닝 기반 객체 인식 기술이 접목된 모바일 매니퓰레이터 통합 시스템)

  • Jang, Dongyeol;Yoo, Seungryeol
    • The Journal of Korea Robotics Society / v.16 no.3 / pp.270-275 / 2021
  • Most early cooperative robots were intended to repeat simple tasks in a given space, so they showed no significant difference from industrial robots. However, research on improving workers' productivity and supplementing humans' limited working hours is expanding. There have also been active attempts to use them as service robots by applying AI technology. In line with these social changes, we built a mobile manipulator that can improve workers' efficiency and completely replace one person. First, we combined a cooperative robot with a mobile robot. Second, we applied speech recognition technology and deep learning-based object detection. Finally, we integrated all the systems using ROS (Robot Operating System). The system can communicate with workers by voice, drive autonomously, and perform pick-and-place tasks.