• Title/Abstract/Keywords: Speech and music discrimination

24 search results

Performance Comparison of Feature Parameters and Classifiers for Speech/Music Discrimination

  • 김형순;김수미
    • Malsori (Journal of the Korean Society of Speech Sciences), No. 46, pp. 37-50, 2003
  • In this paper, we evaluate and compare the performance of speech/music discrimination based on various feature parameters and classifiers. As feature parameters, we consider the High Zero Crossing Rate Ratio (HZCRR), Low Short Time Energy Ratio (LSTER), Spectral Flux (SF), Line Spectral Pair (LSP) distance, entropy, and dynamism. We also examine three classifiers: k-Nearest Neighbor (k-NN), Gaussian Mixture Model (GMM), and Hidden Markov Model (HMM). In our experiments, the LSP distance and the phoneme-recognizer-based feature set (entropy and dynamism) perform well, while the performance differences among classifiers are not significant. When all six feature parameters are used, an average speech/music discrimination accuracy of up to 96.6% is achieved.

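The abstract names HZCRR and LSTER without defining them. The sketch below assumes the definitions commonly used in the speech/music literature: HZCRR is the fraction of frames in a window whose zero-crossing rate exceeds 1.5 times the window's mean ZCR, and LSTER is the fraction of frames whose short-time energy is below 0.5 times the window's mean energy. The thresholds, frame length, and function names (`frame_signal`, `hzcrr`, `lster`) are hypothetical and not taken from the paper.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames: np.ndarray) -> np.ndarray:
    """Fraction of adjacent sample pairs in each frame whose sign differs."""
    neg = np.signbit(frames)
    return np.mean(neg[:, 1:] != neg[:, :-1], axis=1)

def short_time_energy(frames: np.ndarray) -> np.ndarray:
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def hzcrr(frames: np.ndarray, factor: float = 1.5) -> float:
    """Assumed HZCRR: share of frames whose ZCR exceeds factor * mean ZCR."""
    z = zero_crossing_rate(frames)
    return float(np.mean(z > factor * z.mean()))

def lster(frames: np.ndarray, factor: float = 0.5) -> float:
    """Assumed LSTER: share of frames whose energy is below factor * mean energy."""
    e = short_time_energy(frames)
    return float(np.mean(e < factor * e.mean()))

if __name__ == "__main__":
    sr, frame_len = 8000, 200
    n = np.arange(frame_len)
    tone = np.sin(2 * np.pi * 100 * n / sr + 0.3)   # low-ZCR frame
    hiss = np.sin(2 * np.pi * 2000 * n / sr + 0.3)  # high-ZCR frame
    silence = np.zeros(frame_len)                   # pause, as between spoken words
    x = np.concatenate([tone] * 4 + [hiss] * 2 + [silence] * 2)
    frames = frame_signal(x, frame_len, hop=frame_len)
    print(hzcrr(frames), lster(frames))  # → 0.25 0.25
```

Both values are computed per analysis window (often about one second), so a classifier sees one HZCRR/LSTER pair per window rather than per frame.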

Speech/Music Discrimination Using Mel-Cepstrum Modulation Energy

  • 김봉완;최대림;이용주
    • Malsori (Journal of the Korean Society of Speech Sciences), No. 64, pp. 89-103, 2007
  • In this paper, we introduce mel-cepstrum modulation energy (MCME) as a feature for discriminating speech and music data. MCME is a mel-cepstrum-domain extension of modulation energy (ME): MCME is extracted from the time trajectories of mel-frequency cepstral coefficients, whereas ME is based on the spectrum. Because cepstral coefficients are mutually uncorrelated, we expect MCME to perform better than ME. To find the best modulation frequency for MCME, we perform experiments with modulation frequencies from 4 Hz to 20 Hz. To show the effectiveness of the proposed MCME feature, we compare its discrimination accuracy with the results obtained using ME and cepstral flux.

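The abstract describes MCME as modulation energy measured on the time trajectory of each MFCC rather than on spectral bands. A minimal sketch, assuming an MFCC matrix (frames × coefficients) is already available at a known frame rate, and assuming the modulation energy is the DFT power at the bin nearest the target modulation frequency, normalized by each trajectory's total power and averaged over coefficients. The function name `mcme`, the Hann window, and the normalization are hypothetical choices, not the paper's exact formulation.

```python
import numpy as np

def mcme(mfcc: np.ndarray, frame_rate: float, mod_freq: float = 4.0) -> float:
    """Hypothetical mel-cepstrum modulation energy for one analysis segment.

    mfcc: array of shape (n_frames, n_coeffs), one row per analysis frame.
    Returns the mean, over coefficients, of the fraction of each mean-removed
    coefficient trajectory's power found in the bin closest to mod_freq Hz.
    """
    traj = mfcc - mfcc.mean(axis=0, keepdims=True)       # remove DC of each trajectory
    window = np.hanning(traj.shape[0])[:, None]            # taper before the DFT
    spec = np.abs(np.fft.rfft(traj * window, axis=0)) ** 2  # modulation power spectrum
    freqs = np.fft.rfftfreq(traj.shape[0], d=1.0 / frame_rate)
    k = int(np.argmin(np.abs(freqs - mod_freq)))           # bin nearest the target frequency
    total = spec.sum(axis=0) + 1e-12
    return float(np.mean(spec[k] / total))
```

Sweeping `mod_freq` over 4 Hz to 20 Hz reproduces the kind of search the abstract describes; with 100 frames/s and a 1 s segment, the bin spacing is 1 Hz.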

Speech/Music Discrimination Using Multi-dimensional MMCD

  • 최무열;송화전;박슬한;김형순
    • Proceedings of the 2006 Fall Conference of the Korean Society of Speech Sciences, pp. 142-145, 2006
  • Discrimination between speech and music is important in many multimedia applications. We previously proposed a new parameter for speech/music discrimination, the mean of minimum cepstral distances (MMCD), which outperformed conventional parameters. One weakness of MMCD is that its performance depends on the range of candidate frames used to compute the minimum cepstral distance, so the optimal range must be selected experimentally. To alleviate this problem, we propose a multi-dimensional MMCD parameter consisting of multiple MMCDs computed with different ranges of candidate frames. Experimental results show that the multi-dimensional MMCD parameter yields an error rate reduction of 22.5% compared with the optimally chosen one-dimensional MMCD parameter.

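The abstract does not spell out the MMCD formula. One plausible reading of "mean of minimum cepstral distances" is: for each frame, take the smallest cepstral distance to any earlier frame within a candidate lag range, then average these minima over the segment; the multi-dimensional variant stacks MMCDs computed with several lag ranges into one vector. The sketch below encodes that reading only. The Euclidean distance, the lag ranges, and the function names `mmcd` and `multi_dim_mmcd` are hypothetical.

```python
import numpy as np

def mmcd(cep: np.ndarray, min_lag: int, max_lag: int) -> float:
    """Assumed MMCD: mean over frames of the minimum Euclidean distance
    from frame t to frames min_lag..max_lag frames earlier (min_lag >= 1)."""
    minima = []
    for t in range(max_lag, cep.shape[0]):
        candidates = cep[t - max_lag : t - min_lag + 1]  # lags max_lag down to min_lag
        dists = np.linalg.norm(candidates - cep[t], axis=1)
        minima.append(dists.min())
    return float(np.mean(minima))

def multi_dim_mmcd(cep: np.ndarray, ranges) -> np.ndarray:
    """Stack MMCDs computed with different candidate-frame ranges into one vector."""
    return np.array([mmcd(cep, lo, hi) for lo, hi in ranges])
```

Under this reading, a cepstral sequence that repeats within the lag range yields an MMCD near zero, and feeding a vector of several ranges to the classifier removes the need to pick a single "best" range by hand.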

Effect of Digital Noise Reduction of Hearing Aids on Music and Speech Perception

  • Kim, Hyo Jeong;Lee, Jae Hee;Shim, Hyun Joon
    • Journal of Audiology & Otology, Vol. 24, No. 4, pp. 180-190, 2020
  • Background and Objectives: Although many studies have evaluated the effect of the digital noise reduction (DNR) algorithm of hearing aids (HAs) on speech recognition, few have examined its effect on music perception. We therefore aimed to evaluate the effect of DNR on music as well as speech perception, using objective and subjective measurements. Subjects and Methods: Sixteen HA users participated in this study (58.00±10.44 years; 3 males and 13 females). Speech and music perception were assessed objectively with the Korean version of the Clinical Assessment of Music Perception test and with word and sentence recognition scores. For the subjective assessment, quality ratings of speech and music and self-reported HA benefit were evaluated. Results: DNR conferred no improvement on the objective tests of speech and music perception. Pitch discrimination at 262 Hz was better in the DNR-off condition than in the unaided condition (p=0.024); however, the unaided and DNR-on conditions did not differ. In the Korean music background questionnaire, responses regarding ease of communication were better in the DNR-on condition than in the DNR-off condition (p=0.029). Conclusions: Activating DNR improved neither speech and music perception nor sound quality. However, DNR positively influenced the listener's subjective listening comfort. The DNR-off condition in HAs may be beneficial for pitch discrimination at some frequencies.

Performance Comparison of Feature Parameters and Classifiers for Speech/Music Discrimination

  • 김수미;김형순
    • Proceedings of the May 2003 Conference of the Korean Society of Speech Sciences, pp. 149-152, 2003
  • In this paper, we present a performance comparison of feature parameters and classifiers for speech/music discrimination. Experiments were carried out with six feature parameters and three classifiers. The three classifiers turn out to perform similarly. The feature set that captures the temporal and spectral structure of the signal yields good performance, while the phone-based feature set performs relatively poorly.

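Of the three classifiers compared, k-NN is the simplest to illustrate. A minimal sketch of majority-vote k-NN over per-segment feature vectors (for example, an LSTER/HZCRR pair), with z-score standardization so that features on different scales contribute comparably. The function name `knn_predict`, the toy feature values, and k=3 are hypothetical; the paper's feature scaling and choice of k are not stated in the abstract.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x: np.ndarray, train_y: list, query: np.ndarray, k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training vectors."""
    mu = train_x.mean(axis=0)
    sd = train_x.std(axis=0) + 1e-12               # avoid division by zero
    z_train = (train_x - mu) / sd                  # standardize each feature
    z_query = (query - mu) / sd
    dists = np.linalg.norm(z_train - z_query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Hypothetical toy features: [LSTER, HZCRR] per segment.
    train_x = np.array([[0.40, 0.30], [0.45, 0.35], [0.38, 0.28],
                        [0.05, 0.10], [0.08, 0.12], [0.06, 0.09]])
    train_y = ["speech"] * 3 + ["music"] * 3
    print(knn_predict(train_x, train_y, np.array([0.42, 0.31])))
```

With two classes, an odd k avoids voting ties.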

Enhancement of Processing Capabilities of Hippocampus Lobe: A P300 Based Event Related Potential Study

  • Benet, Neelesh;Krishna, Rajalakshmi;Kumar, Vijay
    • Journal of Audiology & Otology (대한청각학회지), Vol. 25, No. 3, pp. 119-123, 2021
  • Background and Objectives: The influence of music training on different areas of the brain has been extensively researched, but the underlying neurobehavioral mechanisms remain unknown. In the present study, the effects of more than three years of training in Carnatic music (an Indian form of music) on the discrimination ability of different areas of the brain were tested using P300 analysis at three electrode placement sites. Subjects and Methods: A total of 27 individuals, including 13 singers aged 16-30 years (mean±standard deviation, 23±3.2 years) and 14 non-singers aged 16-30 years (mean age, 24±2.9 years), participated in this study. The singers had 3-5 years of formal training in Carnatic music. Cortical activity in areas corresponding to attention, discrimination, and memory was tested using P300 analysis, and the tests were performed using the Intelligent Hearing System. Results: The mean P300 amplitude of the singers at the Fz electrode placement site (5.64±1.81) was significantly higher than that of the non-singers (3.85±1.60; t(25)=3.3, p<0.05). The amplitude at the Cz electrode placement site in singers (5.90±2.18) was significantly higher than that in non-singers (3.46±1.40; t(25)=3.3, p<0.05). The amplitude at the Pz electrode placement site in singers (4.94±1.89) was significantly higher than that in non-singers (3.57±1.50; t(25)=3.3, p<0.05). Among singers, the mean P300 amplitude was significantly higher at the Cz site than at the other placement sites, and among non-singers, it was significantly higher at the Fz site than at the other placement sites; that is, music training facilitated enhancement of the P300 amplitude at the Cz site.
Conclusions: The findings of this study suggest that more than three years of training in Carnatic singing can enhance neural coding to discriminate subtle differences, leading to enhanced discrimination abilities of the brain, mainly in the generation site corresponding to Cz electrode placement.

Implementation of Music Signals Discrimination System for FM Broadcasting

  • 강현우
    • KIPS Transactions: Part B (Korea Information Processing Society), Vol. 16B, No. 2, pp. 151-156, 2009
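  • In this study, we apply a GMM-based speech/music discrimination method to implement a system that identifies only pure music segments in FM radio broadcasts. The system aims to detect pure music in broadcast audio programs that mix speech, music, commercial music, and various other sounds, and to save it automatically. To detect the beginning and end of each piece of music more precisely, we added a post-processing step applied to the start and end of segments classified as pure music. Real-time tests of the implemented system on FM radio broadcasts in a PC environment confirmed that it performs well. In addition, with an SoC implementation in mind, we implemented the system in fixed-point arithmetic and obtained the same results as with floating-point arithmetic at a low computational cost of under 3 MIPS.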
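The abstract describes GMM-based frame classification followed by boundary post-processing, but gives no model details. The sketch below assumes two pre-trained diagonal-covariance GMMs (one for music, one for all other sounds), labels each frame by the sign of the log-likelihood ratio, and uses a median filter as a stand-in post-processing step that removes short spurious label flips. The function names, the zero decision threshold, and the median filter are hypothetical and are not the authors' procedure; the sketch also uses floating point, not the fixed-point arithmetic the paper reports.

```python
import numpy as np

def gmm_loglik(x: np.ndarray, weights, means, variances) -> np.ndarray:
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    x: (n_frames, dim); weights: (K,); means, variances: (K, dim)."""
    diff = x[:, None, :] - means[None, :, :]                          # (n, K, dim)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2) + np.log(weights)
    m = log_comp.max(axis=1, keepdims=True)                           # stable log-sum-exp
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def median_smooth(labels: np.ndarray, width: int = 5) -> np.ndarray:
    """Median-filter a 0/1 label sequence (odd width) to remove isolated flips."""
    pad = width // 2
    padded = np.pad(labels, pad, mode="edge")
    return np.array([int(np.median(padded[i : i + width])) for i in range(len(labels))])

def music_frames(x, music_gmm, other_gmm, width: int = 5) -> np.ndarray:
    """1 where a frame is judged music (positive log-likelihood ratio), else 0."""
    llr = gmm_loglik(x, *music_gmm) - gmm_loglik(x, *other_gmm)
    return median_smooth((llr > 0).astype(int), width)
```

Each GMM is passed as a `(weights, means, variances)` tuple of NumPy arrays; in practice these would come from training on labeled broadcast audio.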

Design and Implementation of Speech Music Discrimination System per Block Unit on FM Radio Broadcast

  • 장현종;엄정권;임준식
    • Proceedings of the 2007 Fall Conference of the Korean Institute of Intelligent Systems, pp. 25-28, 2007
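  • This paper proposes a system that classifies the audio signal of FM radio broadcasts into speech and music block by block. To build the speech/music discrimination system, we propose a variety of feature parameters and classification algorithms. The feature parameters are drawn from signal processing (Centroid, Rolloff, Flux, ZCR, Low Energy), speech recognition (LPC, MFCC), and music analysis (MPitch, Beat). The classification algorithms include pattern-recognition methods (GMM, KNN, BP) and a fuzzy neural network (ANFIS), and the Mahalanobis distance is used as the distance measure.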

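Three of the listed signal-processing features (Centroid, Rolloff, Flux) and the Mahalanobis distance can be sketched compactly. This assumes per-frame magnitude spectra, an 85% rolloff point, spectral flux as the squared change between consecutive normalized spectra, and a minimum-Mahalanobis-distance decision against per-class means and covariances. The rolloff percentage, the flux normalization, and all function names are hypothetical and not the paper's settings.

```python
import numpy as np

def spectral_centroid(mag: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Magnitude-weighted mean frequency of each frame; mag is (n_frames, n_bins)."""
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)

def spectral_rolloff(mag: np.ndarray, freqs: np.ndarray, pct: float = 0.85) -> np.ndarray:
    """Lowest frequency at which cumulative magnitude reaches pct of the frame total."""
    cum = np.cumsum(mag, axis=1)
    idx = np.argmax(cum >= pct * cum[:, -1:], axis=1)  # first bin meeting the threshold
    return freqs[idx]

def spectral_flux(mag: np.ndarray) -> np.ndarray:
    """Squared L2 change between consecutive normalized spectra (first frame gets 0)."""
    norm = mag / (mag.sum(axis=1, keepdims=True) + 1e-12)
    diff = np.diff(norm, axis=0)
    return np.concatenate([[0.0], (diff ** 2).sum(axis=1)])

def mahalanobis_classify(x: np.ndarray, class_stats: dict) -> str:
    """Return the label whose (mean, covariance) gives the smallest Mahalanobis distance."""
    best, best_d = None, np.inf
    for label, (mu, cov) in class_stats.items():
        d = x - mu
        dist = float(np.sqrt(d @ np.linalg.solve(cov, d)))  # sqrt(d^T cov^-1 d)
        if dist < best_d:
            best, best_d = label, dist
    return best
```

For block-unit decisions, the per-frame features would first be summarized over each block (for example, by their means and variances) before being compared against the class statistics.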