Search | Korea Science

Performance Comparison of Classification Algorithms in Music Recognition using Violin and Cello Sound Files (바이올린과 첼로 연주 데이터를 이용한 분류 알고리즘의 성능 비교)

Kim Jae Chun;Kwak Kyung sup
- The Journal of Korean Institute of Communications and Information Sciences / v.30 no.5C / pp.305-312 / 2005
Three classification algorithms are tested on musical instrument sounds. Several classification algorithms are introduced, and the performance of three of them (Bayes rule, NN, and k-NN) is evaluated. ZCR, mean, variance, and average peak level features are extracted from instrument sample files and used as the data set for the classification system. The instruments used are violin, baroque violin, and baroque cello. The experimental results show that the NN algorithm outperforms the other algorithms in musical instrument classification.
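
The feature extraction and nearest-neighbour (NN) step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the synthetic test tones, and the definition of "average peak level" as the mean of per-frame peak amplitudes are all assumptions.

```python
import math

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)

def average_peak_level(x, frame_len=100):
    """Mean of the per-frame peak absolute amplitude (assumed definition)."""
    frames = [x[i:i + frame_len] for i in range(0, len(x), frame_len)]
    return sum(max(abs(v) for v in f) for f in frames) / len(frames)

def feature_vector(x):
    """Return (ZCR, mean, variance, average peak level) for one sound sample."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return (zero_crossing_rate(x), mean, variance, average_peak_level(x))

def nearest_neighbor(training_set, query):
    """training_set is a list of (feature_vector, label) pairs.
    Returns the label of the training vector closest in Euclidean distance."""
    return min(training_set, key=lambda item: math.dist(item[0], query))[1]

def tone(cycles, n=1000):
    """Hypothetical stand-in for an instrument recording: a pure sine tone."""
    return [math.sin(2 * math.pi * cycles * i / n + 0.3) for i in range(n)]

training = [(feature_vector(tone(5)), "low"),
            (feature_vector(tone(50)), "high")]
print(nearest_neighbor(training, feature_vector(tone(45))))
```

In practice the features would usually be normalised before computing distances, since variance and ZCR live on different scales; the sketch omits this for brevity.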

Design of Voice Activity Detection Algorithm for Variable Rate Speech Coders (가변전송률 음성부호화기 적용을 위한 음성활성도 측정 알고리즘 설계)

김재원
- The Journal of Korean Institute of Communications and Information Sciences / v.26 no.9A / pp.1451-1458 / 2001
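The ultimate goal of voice service, the most frequently used service in digital mobile communication systems, is to provide good speech quality and high spectral efficiency. Speech can be represented as a repetition of short, intermittent bursts of speech energy separated by silent intervals. In an actual voice call, active speech is present about 40% of the time, and the remaining 60% is silence or time spent listening to the other party. Exploiting these silent intervals efficiently yields a spectral gain for the system. In this paper, we design a voice activity detection scheme that operates robustly even in widely varying ambient noise environments, such as those of digital mobile communication systems, and that is applicable to speech coders with a 10 msec frame size. The designed algorithm uses speech energy, spectral distribution, zero-crossing rate, and a peakiness measure of the LPC residual signal.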
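
A frame-based voice activity decision of the kind described here might look like the sketch below. It uses only two of the four measures named in the abstract (energy and zero-crossing rate) and omits the spectral distribution and LPC residual peakiness. The 8 kHz sample rate, the thresholds, and all function names are illustrative assumptions.

```python
def frame_signal(samples, sample_rate=8000, frame_ms=10):
    """Split samples into non-overlapping frames of frame_ms milliseconds."""
    n = sample_rate * frame_ms // 1000
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(v * v for v in frame) / len(frame)

def frame_zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_voice_activity(samples, energy_threshold=0.01, zcr_threshold=0.3,
                          sample_rate=8000, frame_ms=10):
    """Return one boolean per frame: True if the frame is judged active speech.
    Thresholds are placeholder values, not taken from the paper."""
    decisions = []
    for frame in frame_signal(samples, sample_rate, frame_ms):
        energy = frame_energy(frame)
        loud = energy > energy_threshold
        # A weaker but rapidly oscillating frame may be unvoiced speech.
        unvoiced_like = (energy > energy_threshold / 4
                         and frame_zcr(frame) > zcr_threshold)
        decisions.append(loud or unvoiced_like)
    return decisions
```

The fraction of True values in the result gives a rough activity factor for the call, comparable to the 40% figure quoted in the abstract.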

A study on real-time implementation of speech recognition and speech control system using dSPACE board (dSPACE 보드를 이용한 음성인식 명령처리시스템 실시간 구현에 관한 연구)

김재웅;정원용
- Proceedings of the Korea Institute of Convergence Signal Processing / 2000.12a / pp.173-176 / 2000
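Speech is the most convenient means of conveying control commands that humans have, and control through speech would offer people great convenience. In this paper, a simple speech recognition command processing system is built in Matlab using a multi-layer perceptron (MLP) neural network. For the purpose of control through speech recognition, the study targets a speaker-dependent, isolated-word recognizer. Short-time energy and zero-crossing rate (ZCR) are used to detect the start and end points of speech, and 12th-order LPC cepstrum coefficients are used as the recognizer's feature parameters. The neural network has three layers, with 12, 12, and 2 neurons respectively, so that its outputs are activated for the start and stop commands. After training on reference speech patterns, the converted C program is downloaded to a dSPACE real-time processing board operating under the Matlab environment. Speech is then input and recognized, and a DC motor connected to the output of the board's D/A converter is started and stopped accordingly. The implementation of this real-time speech recognition command processing system confirmed that control through voice commands, such as remote control, is feasible.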
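
The 12-12-2 network structure mentioned above can be illustrated with a forward pass like the one below. It is only a sketch: it assumes the three layers are a 12-unit input layer (one unit per LPC cepstrum coefficient), a 12-unit hidden layer, and a 2-unit output layer, uses sigmoid activations, and fills the weights with random placeholder values rather than trained ones.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def make_layer(n_in, n_out, rng):
    """Random placeholder weights; a real system would learn these by training."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(features, layers):
    """Propagate a feature vector through every layer in order."""
    activation = features
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

rng = random.Random(0)
layers = [make_layer(12, 12, rng),   # input (12 cepstrum coefficients) -> hidden
          make_layer(12, 2, rng)]    # hidden -> output (start, stop)
cepstrum = [rng.uniform(-1.0, 1.0) for _ in range(12)]  # hypothetical input frame
start_score, stop_score = forward(cepstrum, layers)
```

With trained weights, the command whose output unit has the larger activation would be the one sent to the motor controller.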

Identification of Underwater Ambient Noise Sources Using Hilbert-Huang Transfer (힐버트-후앙 변환을 이용한 수중소음원의 식별)

Hwang, Do-Jin;Kim, Jea-Soo
- Journal of Ocean Engineering and Technology / v.22 no.1 / pp.30-36 / 2008
Underwater ambient noise originating from geophysical, biological, and man-made acoustic sources contains information on the source and the ocean environment. Such noise affects the performance of sonar equipment. In this paper, three steps are used to identify the ambient noise source: detection, feature extraction, and similarity measurement. First, we use the zero-crossing rate to detect the ambient noise source against the background noise. Then, a set of feature vectors is proposed for the ambient noise source using the Hilbert-Huang transform and the Karhunen-Loeve transform. Finally, the Euclidean distance is used to measure the similarity between the standard feature vector and the feature vector of the unknown ambient noise source. The developed algorithm is applied to observed ocean data, and the results are presented and discussed.

Coding Method of Variable Threshold Dual Rate ADPCM Speech Considering the Background Noise (배경 잡음환경에서 가변 임계값에 의한 Dual Rate ADPCM 음성 부호화 기법)

한경호
- Journal of the Korean Institute of Illuminating and Electrical Installation Engineers / v.17 no.6 / pp.154-159 / 2003
In this paper, we propose a variable-threshold dual-rate ADPCM coding method that switches between two coding rates of the standard ITU G.726 ADPCM to improve speech quality at a comparatively low coding rate. The zero-crossing rate (ZCR) is computed for the speech data. In a noisy environment, noise-dominant regions show a higher ZCR and speech-dominant regions show a lower ZCR. Speech data with the higher ZCR is encoded at the low coding rate to reduce the amount of coded data, and speech data with the lower ZCR is encoded at the high coding rate to improve speech quality. For the coded data, 2 bits are assigned at the low coding rate of 16 kbps and 5 bits are assigned at the high coding rate of 40 kbps. Simulation shows that the proposed variable dual-rate ADPCM coding technique achieves good speech quality at a low coding rate.
https://doi.org/10.5207/JIEIE.2003.17.6.154
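
The rate-selection rule in this abstract, and the arithmetic behind the 16 kbps and 40 kbps figures, can be sketched as below. G.726 samples speech at 8 kHz, so 2 bits per sample gives 16 kbps and 5 bits per sample gives 40 kbps. The sketch uses a fixed ZCR threshold passed in by the caller; how the paper adapts the threshold to the background noise is not described in the abstract and is not reproduced here. Function names and test frames are hypothetical.

```python
SAMPLE_RATE = 8000            # Hz, the G.726 sampling rate
LOW_BITS, HIGH_BITS = 2, 5    # bits per sample: 16 kbps and 40 kbps at 8 kHz

def zcr(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def bits_per_sample(frame, zcr_threshold):
    """High ZCR suggests a noise-dominant frame, so spend fewer bits on it."""
    return LOW_BITS if zcr(frame) > zcr_threshold else HIGH_BITS

def average_bitrate_kbps(frames, zcr_threshold):
    """Average coded bit rate over all frames, in kbps."""
    total_bits = sum(bits_per_sample(f, zcr_threshold) * len(f) for f in frames)
    total_samples = sum(len(f) for f in frames)
    return total_bits / total_samples * SAMPLE_RATE / 1000

noise_like = [1, -1] * 40            # sign flips on every sample
speech_like = [1] * 40 + [-1] * 40   # a single sign change
print(average_bitrate_kbps([noise_like, speech_like], zcr_threshold=0.3))
```

With one frame coded at each rate, the average falls midway between 16 and 40 kbps, which is the saving the dual-rate scheme aims for compared with coding everything at the high rate.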

Analysis of Galvanic Skin Response Signal for High-Arousal Negative Emotion Using Discrete Wavelet Transform (이산 웨이브렛 변환을 이용한 고각성 부정 감성의 GSR 신호 분석)

Lim, Hyun-Jun;Yoo, Sun-Kook;Jang, Won Seuk
- Science of Emotion and Sensibility / v.20 no.3 / pp.13-22 / 2017
Emotion directly influences decision-making, perception, and other processes, and plays an important role in human life. To recognize high-arousal negative emotion conveniently and accurately, this paper designs an analysis algorithm based on a bio-signal. After inducing two emotions using 'normal' and 'fear' videos, we measured the galvanic skin response (GSR) signal, one of the simplest bio-signals. The measured GSR was decomposed into a tonic component and a phasic component, and the phasic component, which is associated with emotional stimulation, was further decomposed into the skin conductance very slow response (SCVSR) and the skin conductance slow response (SCSR). To extract the major features of these components for accurate analysis, we used a discrete wavelet transform, which has excellent time-frequency localization characteristics, instead of the previously used method. The features extracted for distinguishing high-arousal negative emotion are the maximum value of the phasic component, the amplitude of the phasic component, the zero-crossing rate of the SCVSR, and the zero-crossing rate of the SCSR. High-arousal negative emotion exhibited higher values than low-arousal normal emotion in all four features, and the difference between the two emotions was statistically more significant than with the previous analysis method. Accordingly, the results indicate that the GSR may be a useful indicator for measuring high-arousal negative emotion and may contribute to the development of a real-time emotion rating system using the GSR.
https://doi.org/10.14695/KJSOS.2017.20.3.13

Wireless Digital Stethoscope Diagnosis System using Heart Rate (심박수를 이용한 무선 디지털 청진 진단시스템)

Park, Kee-Young;Lee, Jong-Ha;Cho, Sook-Jin;Lee, Chul-Hee;Jung, Eui-Bung
- Journal of the Institute of Electronics and Information Engineers / v.51 no.6 / pp.237-243 / 2014
Heart sounds from a patient's chest can be heard with an analog stethoscope. However, auscultation of a heart sound can lead to different diagnoses depending on the doctor who hears it, so the patient's condition is determined by subjective judgment based on the hearing ability of a physician with years of experience. In this paper, by analyzing the patient's heart sound and heart rate, we define in detail how to diagnose the patient's condition using a wireless digital stethoscope diagnostic system. Applying the LCR (Level Crossing Rate) makes an objective medical diagnosis possible, and the system can show the relationship to a disease.
https://doi.org/10.5573/ieie.2014.51.6.237
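
The level crossing rate (LCR) generalises the zero-crossing rate by counting crossings of an arbitrary level instead of zero. The sketch below counts upward crossings per second. How the paper applies the LCR to heart sounds is not stated in the abstract, so the pulse train, the level, and the idea that one upward crossing corresponds to one beat are assumptions made for illustration.

```python
def level_crossing_rate(signal, level, sample_rate):
    """Number of upward crossings of `level` per second."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a < level <= b)
    duration_s = len(signal) / sample_rate
    return crossings / duration_s

# Hypothetical heart-sound envelope: one short pulse every 0.8 s at 100 Hz.
SAMPLE_RATE = 100
envelope = [1.0 if 40 <= i % 80 < 45 else 0.0 for i in range(800)]

beats_per_minute = 60 * level_crossing_rate(envelope, 0.5, SAMPLE_RATE)
print(beats_per_minute)
```

On real recordings the result depends strongly on the chosen level and on how the envelope is smoothed, since each heartbeat contains more than one sound.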

A Study on the Endpoint Detection Algorithm Based on a Modified Teager Energy (변형된 Teager 에너지에 기초한 음성끝점검출 알고리듬에 관한 연구)

이재한
- Proceedings of the Acoustical Society of Korea Conference / 1998.06e / pp.407-410 / 1998
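This paper proposes an algorithm that detects the endpoints of speech using a modified Teager energy. Most existing methods detect endpoints using the energy and zero-crossing rate of the speech signal, or these parameters together with several others. Algorithms that use several parameters require more computation, whereas the proposed algorithm uses a single parameter and therefore requires less computation than existing algorithms. The modified Teager energy used in this algorithm is a parameter that accounts for not only the amplitude of the speech signal but also its frequency. Fricatives are generally difficult to detect because of their small amplitude. Experiments on such fricatives confirmed that the proposed algorithm outperforms several existing algorithms.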
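
The abstract does not define the paper's modified Teager energy, so the sketch below shows only the standard discrete Teager-Kaiser energy operator, ψ[x(n)] = x(n)² − x(n−1)·x(n+1), to illustrate why such a measure reflects frequency as well as amplitude. For a sinusoid A·cos(Ωn + φ) the operator returns the constant A²·sin²(Ω), which grows with both A and Ω. The function name and test signal are illustrative.

```python
import math

def teager_energy(x):
    """Standard discrete Teager-Kaiser operator (not the paper's modified form).
    Returns one value per interior sample of x."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For A*cos(omega*n + phi), every output equals A^2 * sin(omega)^2.
A, omega = 2.0, 0.3
signal = [A * math.cos(omega * n + 0.7) for n in range(50)]
energy = teager_energy(signal)
expected = A ** 2 * math.sin(omega) ** 2
print(max(abs(e - expected) for e in energy))
```

Because a low-amplitude fricative has a high dominant frequency, the sin²(Ω) factor raises its Teager energy relative to its plain squared amplitude, which is consistent with the motivation given in the abstract.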

A Study on the Word Recognition of Korean Speech using Neural Network - A study on the initial consonant Recognition using composite Neural Network (신경망을 이용한 우리말 음성의 인식에 관한 연구 - 복합 신경망을 이용한 초성자음 인식에 관한 연구)

Kim, Suk-Dong;Lee, Haing-Sei
- The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.14-24 / 1992
This paper studies consonant recognition using neural networks. First, the consonant part was separated from sounds consisting of a vowel and a consonant by means of acoustic parameters. The relation of length versus zero-crossing rate in the consonant sounds was studied by dividing the consonants into several groups. Finally, for consonant recognition, a composite neural network consisting of a control network and several sub-networks is proposed. The control network identifies the group to which the input consonant belongs, and the sub-networks recognize the consonant within each group.

A Study on the Synthesis of Korean Speech by Formant VOCODER (포르만트 VOCODER에 의한 한국어 음성합성에 관한 연구)

허강인;이대영
- The Journal of Korean Institute of Communications and Information Sciences / v.14 no.6 / pp.699-712 / 1989
This paper describes a method of Korean speech synthesis using a formant VOCODER. The speech synthesis parameters are as follows: 1) formants F1, F2, and F3 obtained by the spectrum moment method, and F4 and F5 obtained using the length of the vocal tract; 2) pitch frequencies obtained by the optimum comb method using the AMDF; 3) short-time average energy and short-time mean amplitude; 4) bandwidths determined by the method reported by Fant; 5) voiced/unvoiced discrimination using zero crossings; 6) the excitation wave reported by Rosenberg; 7) Gaussian white noise. The synthesis results are in fairly good agreement with the original speech.
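
Pitch estimation with the average magnitude difference function (AMDF), item 2 in the list above, can be sketched as follows. The AMDF at lag τ is the mean of |x(n) − x(n+τ)|, and it dips towards zero when τ equals the pitch period. This sketch simply picks the lag with the smallest AMDF in a search range; it does not reproduce the paper's optimum comb method, and the function names and test waveform are assumptions.

```python
def amdf(x, lag):
    """Average magnitude difference between x and itself shifted by `lag`."""
    n = len(x) - lag
    return sum(abs(x[i] - x[i + lag]) for i in range(n)) / n

def estimate_pitch_hz(x, sample_rate, min_lag, max_lag):
    """Pick the lag with the smallest AMDF and convert it to a frequency.
    On ties, the shortest lag wins because min() keeps the first minimum."""
    best_lag = min(range(min_lag, max_lag + 1), key=lambda lag: amdf(x, lag))
    return sample_rate / best_lag

# Hypothetical test waveform: a sawtooth with a period of 40 samples.
sawtooth = [(i % 40) - 20 for i in range(400)]
print(estimate_pitch_hz(sawtooth, sample_rate=8000, min_lag=20, max_lag=100))
```

The AMDF also dips at integer multiples of the true period, so on noisy speech a wide search range can produce octave errors; practical estimators add a check for this.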

