• Title/Abstract/Keywords: Automatic Speech Detection

Search results: 58 items (processing time: 0.031 s)

Automatic detection of speech sound disorder in children using automatic speech recognition and audio classification

  • Selina S. Sung;Jungmin So;Tae-Jin Yoon;Seunghee Ha
    • Phonetics and Speech Sciences / Vol. 16, No. 3 / pp. 87-94 / 2024
  • Children with speech sound disorders (SSDs) face various challenges in producing speech sounds, which often lead to significant social and educational barriers. Detecting and treating SSDs in children is complex due to the variability in disorder severity and diagnostic boundaries. This study aims to develop an automated SSD detection system using deep learning models, leveraging their ability to transcribe audio, efficiently capture sound patterns on a vast scale, and address the limitations of traditional methods involving speech-language pathologists. For this study, we collected audio recordings from 573 children aged two to nine using standardized prompts from the Assessment of Phonology and Articulation for Children. Speech-language pathologists analyzed the recordings and identified 92 children with SSDs. To build an automatic SSD detection system, we used this dataset to train neural network models for automatic speech recognition and audio classification. Five different methods were studied, with the best method achieving 73.9% unweighted average recall. While the results show the potential of using deep learning models for the automatic detection of SSDs in children, further research is needed to improve the reliability of the models before they can be widely used in practice.
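
The unweighted average recall (UAR) reported above is the mean of the per-class recalls, so a small class (here, 92 children with SSDs out of 573) weighs as much as the large one. The following is a minimal sketch of that metric only; the function name `unweighted_average_recall` and the toy labels are illustrative assumptions, not data or code from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    correct = defaultdict(int)  # correctly predicted items per true class
    total = defaultdict(int)    # number of items per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 1 = SSD, 0 = no SSD (imbalanced on purpose).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On this toy data the plain accuracy is higher than the UAR, because missing one of the two minority-class items costs half of that class's recall.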

Pronunciation Network Construction of Speech Recognizer for Mispronunciation Detection of Foreign Language

  • 이상필;권철홍
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 49 / pp. 123-134 / 2004
  • An automatic pronunciation correction system provides learners with correction guidelines for each mispronunciation. In this paper we propose an HMM-based speech recognizer which automatically classifies pronunciation errors when Koreans speak Japanese. We also propose two pronunciation networks for automatic detection of mispronunciation. We evaluate the performance of the networks by computing the correlation between human ratings and the machine scores obtained from the speech recognizer.
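
The evaluation described above compares machine scores with human ratings through a correlation. The abstract does not say which coefficient was used; the sketch below assumes the Pearson coefficient, and the ratings and scores are made-up values for illustration only.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one human rating and one machine score per utterance.
human_ratings = [1, 2, 3, 4, 5]
machine_scores = [-9.1, -7.4, -6.0, -4.2, -3.5]

r = pearson_correlation(human_ratings, machine_scores)
print(f"correlation between human and machine scores: {r:.3f}")
```

A value near 1 would indicate that the recognizer ranks utterances much as the human raters do.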


Performance Analysis of Automatic Mispronunciation Detection Using Speech Recognizer

  • 강효원;이상필;배민영;이재강;권철홍
    • Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Proceedings of the October 2003 Conference / pp. 29-32 / 2003
  • This paper proposes an automatic pronunciation correction system which provides users with correction guidelines for each pronunciation error. For this purpose, we develop an HMM speech recognizer which automatically classifies pronunciation errors when Koreans speak a foreign language. We also collect a speech database of native and nonnative speakers using phonetically balanced word lists. We analyze the types of mispronunciation observed in an automatic mispronunciation detection experiment using the speech recognizer.


A Study on Endpoint Detection and Syllable Segmentation System Using Ramp Edge Detection

  • 유일수;홍광석
    • Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the 2003 Summer Conference, Vol. IV / pp. 2216-2219 / 2003
  • Accurate speech region detection and automatic syllable segmentation are important parts of a speech recognition system. In an automatic speech recognition system, they are needed for accurate recognition and lower computational complexity. In this paper, we propose an improved syllable segmentation method using ramp edge detection and residual signal peak energy. These methods are used to ensure the accuracy and robustness of the endpoint detection and syllable segmentation system. They have an almost invariant response to various background noise levels. In experiments, we obtained 90.7% accuracy in syllable segmentation under the condition of accurate endpoint detection.
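
To make the notion of endpoint detection concrete, the sketch below finds the start and end of speech by thresholding short-time energy. This is a deliberately simpler stand-in, not the ramp edge detection method of the paper; the frame length, threshold, and synthetic signal are all assumptions chosen for illustration.

```python
def short_time_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(samples, frame_len=80, threshold=0.01):
    """Return (start_sample, end_sample) of the span between the first and
    last frame whose energy exceeds the threshold, or None if no frame does."""
    energies = short_time_energy(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: silence, a burst standing in for speech, silence.
signal = [0.0] * 400 + [0.5, -0.5] * 200 + [0.0] * 400

endpoints = find_endpoints(signal)
print(endpoints)
```

A fixed threshold like this one shifts with the background noise level, which is the kind of sensitivity the abstract says its methods avoid.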


Automatic Detection of Intonational and Accentual Phrases in Korean Standard Continuous Speech

  • 이기영;송민석
    • Speech Sciences / Vol. 7, No. 2 / pp. 209-224 / 2000
  • This paper proposes a method for automatically detecting intonational and accentual phrases in standard Korean continuous speech. We use pauses longer than 150 ms to detect intonational phrases, and extract accentual phrases from the intonational phrases by analyzing syllables and pitch contours. The speech data for the experiment consist of seven male voices and two female voices reading the fable 'the ant and the grasshopper' and a newspaper article 'manmulsang' at normal speed in the standard Korean variety. The results of the experiment show that the detection rate of intonational phrases is 95% on average and that of accentual phrases is 73%. These detection rates imply that continuous speech can be segmented into smaller units (i.e., prosodic phrases) using prosodic information, so that the objects of speech recognition can be narrowed down to words or phrases in continuous speech.
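
The pause rule for intonational phrases can be sketched as follows: given a per-frame speech/non-speech decision, start a new phrase whenever the silence between two speech frames exceeds 150 ms. The 10 ms frame step, the function name `split_at_pauses`, and the toy input are assumptions; how the frame decisions are obtained is outside this sketch, and the accentual-phrase step based on syllables and pitch contours is not covered.

```python
def split_at_pauses(is_speech, frame_ms=10, min_pause_ms=150):
    """Group speech frames into phrases separated by pauses longer than
    min_pause_ms. Returns (start_frame, end_frame) pairs, end exclusive.
    Shorter silences stay inside the surrounding phrase."""
    phrases = []
    start = None        # first frame of the phrase being built
    last_speech = None  # index of the most recent speech frame
    for i, flag in enumerate(is_speech):
        if not flag:
            continue
        if start is None:
            start = i
        elif (i - last_speech - 1) * frame_ms > min_pause_ms:
            phrases.append((start, last_speech + 1))
            start = i
        last_speech = i
    if start is not None:
        phrases.append((start, last_speech + 1))
    return phrases


# Toy input: 300 ms speech, 100 ms pause, 200 ms speech, 200 ms pause, 250 ms speech.
frames = [True] * 30 + [False] * 10 + [True] * 20 + [False] * 20 + [True] * 25

phrases = split_at_pauses(frames)
print(phrases)
```

Only the 200 ms pause splits the input; the 100 ms pause is kept inside the first phrase.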


A User-friendly Remote Speech Input Method in Spontaneous Speech Recognition System

  • Suh, Young-Joo;Park, Jun;Lee, Young-Jik
    • The Journal of the Acoustical Society of Korea / Vol. 17, No. 2E / pp. 38-46 / 1998
  • In this paper, we propose a remote speech input device, a new method of user-friendly speech input in a spontaneous speech recognition system. We define user friendliness in terms of hands-free operation and microphone independence in speech recognition applications. Our method adopts two algorithms: automatic speech detection and microphone array delay-and-sum beamforming (DSBF)-based speech enhancement. The automatic speech detection algorithm is composed of two stages: the detection of speech portion candidates, and the classification of each candidate as speech or nonspeech using pitch information. The DSBF algorithm adopts the time-domain cross-correlation method for time delay estimation. In the performance evaluation, the speech detection algorithm shows within-200 ms start point accuracy of 93%, 99% under 15 dB, 20 dB, and 25 dB signal-to-noise ratio (SNR) environments, respectively, and those for the end point are 72%, 89%, and 93% for the corresponding environments. The classification of speech and nonspeech for the region of the input signal where a start point is detected is performed by the pitch information-based method. The percentages of correct classification for speech and nonspeech input are 99% and 90%, respectively. The eight-microphone array-based speech enhancement using the DSBF algorithm shows a maximum SNR gain of 6 dB over a single microphone and an error reduction of more than 15% in the spontaneous speech recognition domain.
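
Delay-and-sum beamforming with cross-correlation delay estimation can be sketched in a few lines: estimate each channel's delay relative to a reference channel, shift it back into alignment, and average. The sketch assumes integer sample delays, uses the first channel as the reference, and runs on a synthetic three-channel signal; none of these choices come from the paper, which used an eight-microphone array.

```python
def estimate_delay(reference, signal, max_lag):
    """Lag in samples at which `signal` best matches `reference`,
    found by maximizing the time-domain cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(signal):
                score += reference[i] * signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, max_lag=8):
    """Align every channel to the first one and average the aligned channels."""
    reference = channels[0]
    output = [0.0] * len(reference)
    for ch in channels:
        lag = estimate_delay(reference, ch, max_lag)
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(ch):
                output[i] += ch[j]
    return [v / len(channels) for v in output]


def delayed(x, d):
    """Copy of x delayed by d samples, zero-padded at the front."""
    return [0.0] * d + x[:len(x) - d]


# Synthetic source pulse reaching three hypothetical microphones at different times.
source = [0.0] * 10 + [1.0, 2.0, 3.0, 2.0, 1.0] + [0.0] * 17
channels = [delayed(source, d) for d in (0, 2, 5)]

enhanced = delay_and_sum(channels)
print(estimate_delay(channels[0], channels[2], 8))
```

Because the aligned speech adds coherently while uncorrelated noise on each microphone does not, averaging more channels raises the SNR, which is consistent with the gain over a single microphone reported in the abstract.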


Scoring Methods for Improvement of Speech Recognizer Detecting Mispronunciation of Foreign Language

  • 강효원;권철홍
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 49 / pp. 95-105 / 2004
  • An automatic pronunciation correction system provides learners with correction guidelines for each mispronunciation. For this purpose we develop a speech recognizer which automatically classifies pronunciation errors when Koreans speak a foreign language. In order to develop methods for automatic assessment of pronunciation quality, we propose a language model-based score as the machine score in the speech recognizer. Experimental results show that the language model-based score has a higher correlation with human scores than the conventional log-likelihood-based score.


Automatic Detection of Mispronunciation Using Phoneme Recognition For Foreign Language Instruction

  • 권철홍;강효원;이상필
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 48 / pp. 127-139 / 2003
  • An automatic pronunciation correction system provides learners with correction guidelines for each mispronunciation. In this paper we propose an HMM-based speech recognizer which automatically classifies pronunciation errors when Koreans speak Japanese. For this purpose we also develop phoneme recognizers for Korean and Japanese. Experimental results show that the machine scores of the proposed recognizer correlate well with expert ratings.


Performance Comparison of Automatic Detection of Laryngeal Diseases by Voice

  • 강현민;김수미;김유신;김형순;조철우;양병곤;왕수건
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 45 / pp. 35-45 / 2003
  • Laryngeal diseases cause significant changes in the quality of speech production. Automatic detection of laryngeal diseases by voice is attractive because of its nonintrusive nature. In this paper, we apply speech recognition techniques to the detection of laryngeal cancer, and investigate which feature parameters and classification methods are appropriate for this purpose. Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC) are examined as feature parameters, and parameters reflecting the periodicity of speech and its perturbation are also considered. As classifiers, multilayer perceptron neural networks and Gaussian Mixture Models (GMM) are employed. According to our experiments, higher-order LPCC combined with the periodic information parameters yields the best performance.


A User friendly Remote Speech Input Unit in Spontaneous Speech Translation System

  • 이광석;김흥준;송진국;추연규
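    • Korea Institute of Information and Communication Engineering: Conference Proceedings / Korea Institute of Maritime Information and Communication Sciences 2008 Spring Conference, A / pp. 784-788 / 2008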
  • In this research, we propose a remote speech input unit, a new method of user-friendly speech input in a speech recognition system. We define user friendliness in terms of hands-free operation and microphone independence in speech recognition applications. Our module adopts two algorithms: automatic speech detection and speech enhancement based on microphone array beamforming. In the performance evaluation of speech detection, the within-200 ms accuracy with respect to the manually detected positions is about 97% in a noise environment with an SNR of 25 dB. The microphone array-based speech enhancement using the delay-and-sum beamforming algorithm shows a maximum SNR gain of about 6 dB over a single microphone and an error reduction rate of more than 12% in speech recognition.
