• Title/Summary/Keyword: Sound Segmentation


Performance Comparison Between the Envelope Peak Detection Method and the HMM Based Method for Heart Sound Segmentation

  • Jang, Hyun-Baek;Chung, Young-Joo
    • The Journal of the Acoustical Society of Korea / v.28 no.2E / pp.72-78 / 2009
  • Segmentation of the heart sound into its components, S1, systole, S2 and diastole, is the first and most important step in the automatic diagnosis of heart sounds. Conventionally, the Shannon energy envelope peak detection method has been widely used because of its superior performance in locating S1 and S2. Recently, the HMM has been shown to be well suited to modeling the heart sound signal, and its use for segmenting heart sounds has been suggested with some success. In this paper, we compare the two methods on a common database. Experiments on four different types of heart sound signals showed that segmentation accuracy relative to manual segmentation was 97.4% for the HMM-based method, higher than the 91.5% of the peak detection method.
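
The Shannon energy envelope used in the peak detection method above can be sketched briefly. The frame length, hop size, and peak threshold below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def shannon_energy_envelope(x, frame_len=64, hop=32):
    """Shannon energy E = -x^2 * log(x^2), averaged per frame.

    frame_len and hop are illustrative values, not from the paper.
    """
    x = x / (np.max(np.abs(x)) + 1e-12)             # normalize to [-1, 1]
    se = -x**2 * np.log(x**2 + 1e-12)               # per-sample Shannon energy
    n_frames = 1 + (len(se) - frame_len) // hop
    env = np.array([se[i*hop:i*hop+frame_len].mean() for i in range(n_frames)])
    return (env - env.mean()) / (env.std() + 1e-12)  # zero-mean, unit-variance

def pick_peaks(env, threshold=0.5):
    """Indices of local maxima above threshold (candidate S1/S2 locations)."""
    return [i for i in range(1, len(env) - 1)
            if env[i] > threshold and env[i] >= env[i-1] and env[i] > env[i+1]]
```

A real system would still have to label which of the returned peaks are S1 and which are S2, e.g. from the duration asymmetry between systole and diastole.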

A Method of Sound Segmentation in Time-Frequency Domain Using Peaks and Valleys in Spectrogram for Speech Separation (음성 분리를 위한 스펙트로그램의 마루와 골을 이용한 시간-주파수 공간에서 소리 분할 기법)

  • Lim, Sung-Kil;Lee, Hyon-Soo
    • The Journal of the Acoustical Society of Korea / v.27 no.8 / pp.418-426 / 2008
  • In this paper, we propose an algorithm for frequency channel segmentation using peaks and valleys in the spectrogram. A frequency channel segment is a local group of channels in the frequency domain that could arise from the same sound source. The proposed algorithm is based on the smoothed spectrum of the input sound: peaks in the smoothed spectrum determine the centers of segments, and valleys determine their boundaries. To evaluate the suitability of the proposed segmentation algorithm before the grouping stage is applied, we compare results synthesized with an ideal mask to those of the proposed algorithm. Simulations are performed on speech signals mixed with narrowband noise, wideband noise, and other speech signals.
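
The center/boundary rule described above (peaks as segment centers, valleys as boundaries) can be sketched as follows; the moving-average smoother and its window size are illustrative assumptions, not the paper's actual smoothing method:

```python
import numpy as np

def segment_channels(spectrum, smooth_win=5):
    """Split a magnitude spectrum into channel segments: smoothed-spectrum
    peaks become segment centers, valleys become segment boundaries."""
    kernel = np.ones(smooth_win) / smooth_win
    s = np.convolve(spectrum, kernel, mode='same')   # moving-average smoothing
    peaks = [i for i in range(1, len(s) - 1)
             if s[i] > s[i-1] and s[i] >= s[i+1]]
    valleys = [i for i in range(1, len(s) - 1)
               if s[i] < s[i-1] and s[i] <= s[i+1]]
    # each segment spans from the valley before its peak to the valley after it
    bounds = [0] + valleys + [len(s) - 1]
    segments = []
    for p in peaks:
        lo = max(b for b in bounds if b <= p)
        hi = min(b for b in bounds if b >= p)
        segments.append((lo, p, hi))
    return segments
```

Each returned triple is (left boundary, center channel, right boundary); adjacent segments share the valley between them.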

Application of Speech Recognition with Closed Caption for Content-Based Video Segmentations

  • Son, Jong-Mok;Bae, Keun-Sung
    • Speech Sciences / v.12 no.1 / pp.135-142 / 2005
  • An important aspect of video indexing is the ability to segment video into meaningful segments, i.e., content-based video segmentation. Since the audio signal in the sound track is synchronized with the image sequences in the video program, a speech signal in the sound track can be used to segment video into meaningful segments. In this paper, we propose a new approach to content-based video segmentation that uses closed captions to construct a recognition network for speech recognition; accurate time information for video segmentation is then obtained from the speech recognition process. In a video segmentation experiment on TV news programs, the proposed method successfully produced 56 video summaries from 57 news stories, demonstrating that the proposed scheme is very promising for content-based video segmentation.


Automatic Classification of Continuous Heart Sound Signals Using the Statistical Modeling Approach (통계적 모델링 기법을 이용한 연속심음신호의 자동분류에 관한 연구)

  • Kim, Hee-Keun;Chung, Yong-Joo
    • The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.144-152 / 2007
  • Conventional research on heart sound classification has mainly used artificial neural networks. However, analysis of the statistical characteristics of the heart sound signal has shown that the HMM is well suited to modeling it. In this paper, we model heart sound signals representing various heart diseases with HMMs and find that the classification rate is strongly affected by the clustering of the heart sound signal. Moreover, heart sound signals acquired in real environments are continuous, with no specified starting and ending points. Hence, for HMM-based classification, the continuous cyclic heart sound signal must normally be segmented manually to obtain isolated cycles. As manual segmentation introduces segmentation errors and is unsuitable for real-time processing, we propose a variant of the ergodic HMM that requires no segmentation procedure. Simulation results show that the proposed method classifies continuous heart sounds with high accuracy.
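
Because an ergodic HMM is fully connected, decoding can enter the cardiac cycle at any state, which is why no manual cycle segmentation is needed. A minimal Viterbi decoding sketch over a hypothetical S1/systole/S2/diastole state set (all probabilities below are illustrative, not trained values):

```python
import numpy as np

STATES = ["S1", "systole", "S2", "diastole"]  # hypothetical state set

def viterbi(log_emit, log_trans, log_init):
    """Most likely state path through an ergodic (fully connected) HMM.

    log_emit: (T, N) per-frame emission log-likelihoods
    log_trans: (N, N) transition log-probabilities, log_trans[i, j] = i -> j
    log_init: (N,) initial-state log-probabilities
    """
    T, N = log_emit.shape
    delta = log_init + log_emit[0]          # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)       # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans         # scores[prev, next]
        psi[t] = np.argmax(scores, axis=0)          # best predecessor per state
        delta = scores[psi[t], np.arange(N)] + log_emit[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):           # backtrack
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

`[STATES[i] for i in path]` then gives the per-frame component labels directly, with no isolated-cycle extraction beforehand.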

Performance Improvement of Cardiac Disorder Classification Based on Automatic Segmentation and Extreme Learning Machine (자동 분할과 ELM을 이용한 심장질환 분류 성능 개선)

  • Kwak, Chul;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea / v.28 no.1 / pp.32-43 / 2009
  • In this paper, we improve the performance of cardiac disorder classification from continuous heart sound signals using automatic segmentation and an extreme learning machine (ELM). The accuracy of conventional cardiac disorder classification systems degrades because murmurs and click sounds contained in abnormal heart sound signals cause incorrect or missing starting points of the first (S1) and second (S2) heart pulses in the automatic segmentation stage. To reduce the performance degradation due to segmentation errors, we find the positions of the S1 and S2 pulses, correct them using the time difference between S1 or S2 pulses, and extract a single period of the heart sound signal. We then obtain a feature vector consisting of the mel-scaled filter bank energy coefficients and the envelopes of uniform-sized sub-segments of the single-period heart sound signal. To classify the heart disorders, we use an ELM with a single hidden layer. In classification experiments with 9 cardiac disorder categories, the proposed method shows a classification accuracy of 81.6%, the highest among the ELM, multi-layer perceptron (MLP), support vector machine (SVM), and hidden Markov model (HMM) classifiers.
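
The appeal of the ELM above is that training reduces to a single least-squares solve: hidden weights are random and fixed, and only the output weights are fitted. A minimal sketch, assuming a sigmoid hidden layer and one-hot targets (the hidden-layer size and activation are illustrative choices, not the paper's settings):

```python
import numpy as np

def train_elm(X, y, n_hidden=64, seed=0):
    """Single-hidden-layer extreme learning machine.

    Hidden weights/biases are random and never updated; output weights
    beta are solved in closed form by least squares.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    W = rng.normal(size=(X.shape[1], n_hidden))    # random input->hidden weights
    b = rng.normal(size=n_hidden)                  # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))         # hidden activations (sigmoid)
    T = np.eye(n_classes)[y]                       # one-hot targets
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)   # output weights, closed form
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```

In the paper's setting, the rows of `X` would be the mel filter bank plus envelope feature vectors and `y` the 9 cardiac disorder labels.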

Effects of the Orthographic Representation on Speech Sound Segmentation in Children Aged 5-6 Years (5~6세 아동의 철자표상이 말소리분절 과제 수행에 미치는 영향)

  • Maeng, Hyeon-Su;Ha, Ji-Wan
    • Journal of Digital Convergence / v.14 no.6 / pp.499-511 / 2016
  • The aim of this study was to examine the effect of orthographic representation on speech sound segmentation performance. Children's performance on the orthographic representation task and the speech sound segmentation task showed a positive correlation for words with phoneme-grapheme correspondence and a negative correlation for words without it. For words with phoneme-grapheme correspondence, there was no difference in performance between the high- and low-level orthographic representation groups, while for words without phoneme-grapheme correspondence, the low-level group performed significantly better than the high-level group. The most frequent errors in both groups were orthographic conversion errors, and such errors were significantly more noticeable in the high-level group. This study suggests that from the time they begin to learn orthographic knowledge, children draw on it when performing phonological awareness tasks.

A Study on the Indoor Sound-field Analysis by Adaptive Triangular Beam Method (적응 삼각형 빔 방법에 의한 실내음장 해석)

  • 조대승;성상경;김진형;최재호;박일권
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.13 no.3 / pp.217-224 / 2003
  • In this study, the adaptive triangular beam method (ATBM), which considers different sound reflection coefficients and angles of a triangular beam on two or more planes as well as diffraction effects, is proposed. The ATBM, which subdivides a traced triangular beam into multiple triangular beams on reflection planes, gives reliable and convergent sound-field analysis results without dependence on the number of initial triangular beam segmentations used to search sound propagation paths from source to receiver. The validity of the method is verified by comparing numerical and experimental results for the energy decay curve and steady-state sound pressure level of rooms having direct, reflective and diffractive sound paths.

Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label (약한 레이블을 이용한 확장 합성곱 신경망과 게이트 선형 유닛 기반 음향 이벤트 검출 및 태깅 알고리즘)

  • Park, Chungho;Kim, Donghyun;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.414-423 / 2020
  • In this paper, we propose a Dilated Convolution Gated Linear Unit (DCGLU) to mitigate the lack of sparsity and the small receptive field caused by the segmentation map extraction process in sound event detection with weak labels. With the advent of deep learning frameworks, segmentation map extraction approaches have shown improved performance in noisy environments. However, these methods must maintain the size of the feature map to extract the segmentation map, since the model is constructed without pooling operations; as a result, their performance deteriorates from a lack of sparsity and a small receptive field. To mitigate these problems, we use GLUs to control the flow of information and dilated convolutional neural networks (DCNNs) to increase the receptive field without additional learning parameters. For the performance evaluation, we employ URBAN-SED and a self-organized bird sound dataset. Experiments show that the proposed DCGLU model outperforms the other baselines. In particular, our method exhibits robustness against natural sound noise at three signal-to-noise ratio (SNR) levels (20 dB, 10 dB and 0 dB).
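
The two ingredients named above compose directly: a dilated convolution widens the receptive field without adding parameters, and a GLU gates its output. A toy 1-D NumPy sketch (the kernels and layer shapes are illustrative, not the paper's architecture):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Same'-padded 1-D convolution with a dilated odd-length kernel w.

    The receptive field is dilation * (len(w) - 1) + 1 samples.
    """
    k = len(w)
    pad = dilation * (k // 2)
    xp = np.pad(x, pad)                     # zero-pad so output length == input
    return np.array([sum(w[j] * xp[i + j * dilation] for j in range(k))
                     for i in range(len(x))])

def glu(a, b):
    """Gated linear unit: a * sigmoid(b); b decides how much of a passes."""
    return a * (1.0 / (1.0 + np.exp(-b)))

def dcglu_layer(x, w_lin, w_gate, dilation):
    """One DCGLU-style block: two dilated convolutions, one gating the other."""
    return glu(dilated_conv1d(x, w_lin, dilation),
               dilated_conv1d(x, w_gate, dilation))
```

Stacking such layers with dilations 1, 2, 4, ... grows the receptive field exponentially while the parameter count grows only linearly, which is the trade-off the abstract appeals to.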

A Harmony in Language and Music (언어와 음악의 상관관계 고찰을 위한 연구)

  • 이재강
    • Lingua Humanitatis / v.2 no.1 / pp.287-301 / 2002
  • In both music and language, sound plays its role by occupying fixed multiple spaces in one's consciousness. Musical space differs from auditory space, whose aim is to perceive the positions and identities of outer things. While auditory space is grounded in interest in outer things, musical space is grounded in indifference to them. We discuss the notion of space because it is where symbols reside. Categorical perception, as in phonemic restoration, describes a listener's ability to use intelligence to recognize and fill in missing sounds; musical perception, by contrast, can be explained as a positive regression that avoids colloquial logic and the danger of segmentation in the course of an infant's auditory experience and phonation acquisition. On the question of whether language sounds are heard differently from other sounds, the auditory mechanism processes language sound in the same way as other types of sound, though other theories claim that the brain processes the former differently from the latter. The function of music has not been clarified as fully as that of language; music carries far more meanings than language does.


Detection of Main Components of Heart Sound Using Third Moment Characteristics of PCG Envelope (심음 포락선의 3차 모멘트를 이용한 심음의 주성분 검출)

  • Quan, Xing-Ri;Bae, Keun-Sung
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.12 / pp.3001-3008 / 2013
  • To diagnose cardiac valve abnormalities through phonocardiogram analysis, accurate detection of the S1 and S2 components is needed first for heart sound segmentation. In this paper, a new method using the third-moment characteristics of the PCG envelope is proposed for accurate detection of the S1 and S2 components of heart sounds with cardiac murmurs. The envelope of the PCG is obtained from the short-time energy profile, and its third-moment profile, together with slope information, is used for accurate time gating of the S1 and S2 components. Experimental results show that the proposed method is superior to the conventional second-moment method for detecting S1 and S2 regions in heart sound signals with cardiac murmurs.
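
The envelope-plus-third-moment pipeline above can be sketched roughly as follows. The frame and window sizes are illustrative assumptions, and the paper additionally uses slope information that this sketch omits:

```python
import numpy as np

def short_time_energy(x, frame_len=128, hop=64):
    """Short-time energy profile of the PCG; frame sizes are illustrative."""
    n = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i*hop:i*hop+frame_len]**2) for i in range(n)])

def third_moment_profile(env, win=9):
    """Sliding third central moment of the envelope.

    Large positive values flag sharp, asymmetric rises in the envelope,
    which is the kind of cue used to gate S1/S2 against softer murmurs.
    """
    half = win // 2
    ep = np.pad(env, half, mode='edge')      # extend edges for a full window
    out = np.empty(len(env))
    for i in range(len(env)):
        w = ep[i:i + win]
        out[i] = np.mean((w - w.mean())**3)  # third central moment
    return out
```

Thresholding the third-moment profile, rather than the energy envelope itself, is the step this sketch illustrates; choosing that threshold and adding the slope test are left out.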