Integrated Search | Korea Science

Speech/Music Signal Classification Based on Spectrum Flux and MFCC for Audio Coder

이상길;이인성
- Journal of the Korea Institute of Information, Electronics, and Communication Technology / Vol. 16, No. 5 / pp.239-246 / 2023
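This paper proposes an open-loop algorithm that classifies speech and music signals for an audio coder using a spectral flux parameter and Mel Frequency Cepstral Coefficients (MFCC) parameters. To improve responsiveness, MFCC is used as the short-term feature parameter, and to improve accuracy, spectral flux is used as the long-term feature parameter. The overall speech/music classification decision is made by combining the short-term and long-term classifications. A Gaussian Mixture Model (GMM) is used for pattern recognition, and the optimal GMM parameters are estimated with the Expectation Maximization (EM) algorithm. The proposed combined long-term/short-term classification method showed an average classification error rate of 1.5% across various audio sources, improving the classification error rate by 0.9% over the short-term-only method and by 0.6% over the long-term-only method. Compared with the USAC audio classification method, the proposed method improved the classification error rate by 9.1% on percussive music signals and by 5.8% on speech signals.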
https://doi.org/10.17661/jkiiect.2023.16.5.239
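As a rough illustration of the long-term feature named in the abstract above, the following sketch computes spectral flux between consecutive magnitude spectra and then fuses a short-term and a long-term decision. The abstract does not give the exact flux definition or the fusion rule, so the L2-normalised distance, the function names, and the weighted-average rule in `combine` are all assumptions made for illustration, not the authors' method.

```python
import math

def spectral_flux(prev_mag, cur_mag):
    """Euclidean distance between two L2-normalised magnitude spectra."""
    def normalise(mag):
        norm = math.sqrt(sum(m * m for m in mag)) or 1.0  # avoid dividing by zero
        return [m / norm for m in mag]
    p, c = normalise(prev_mag), normalise(cur_mag)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))

def mean_flux(frames):
    """Average flux over a long window of consecutive magnitude spectra."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    fluxes = [spectral_flux(a, b) for a, b in zip(frames, frames[1:])]
    return sum(fluxes) / len(fluxes)

def combine(short_speech_prob, long_speech_prob, weight=0.5):
    """Hypothetical fusion rule: weighted average of two speech probabilities."""
    score = weight * short_speech_prob + (1.0 - weight) * long_speech_prob
    return "speech" if score >= 0.5 else "music"
```

In a setup like the one described, the two probabilities passed to `combine` would come from GMM classifiers trained on the short-term (MFCC) and long-term (flux) features; here they are simply supplied by the caller.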

Emotion Recognition Based on Frequency Analysis of Speech Signal

Sim, Kwee-Bo;Park, Chang-Hyun;Lee, Dong-Wook;Joo, Young-Hoon
- International Journal of Fuzzy Logic and Intelligent Systems / Vol. 2, No. 2 / pp.122-126 / 2002
In this study, we identify features of three emotions (happiness, anger, surprise) as fundamental research on emotion recognition. Emotional speech has several elements, such as voice quality, pitch, formants, and speech rate. Until now, most researchers have used pitch change, the short-time average power envelope, or Mel-based speech power coefficients. Pitch is a very efficient and informative feature, so we use it in this study. Because pitch is very sensitive to subtle emotion, it changes readily whenever a person is in a different emotional state. We can therefore determine whether the pitch changes steeply, changes with a gentle slope, or does not change. This paper also extracts formant features from emotional speech. For each vowel, the formants appear at similar positions without large differences. Based on this fact, in the case of happiness we extract the features of laughter and, using them, separate out the laughter to simplify the task. We likewise find such features for anger and surprise.
https://doi.org/10.5391/IJFIS.2002.2.2.122
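The abstract above relies on pitch but does not say how it is measured. One common approach is to pick the lag that maximises the autocorrelation of a frame; the sketch below assumes that approach, and the function name, the 50 to 400 Hz search range, and the test signal are illustrative choices rather than details from the paper.

```python
import math

def estimate_pitch(samples, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate pitch in Hz as sample_rate / (lag with the largest autocorrelation)."""
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the frame at this lag.
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Usage: a synthetic 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
pitch_hz = estimate_pitch(tone, 8000)
```

Tracking this estimate frame by frame gives a pitch contour whose slope could then be inspected for steep, gentle, or no change.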

Design of a variable rate speech codec for the W-CDMA system

정우성
- Acoustical Society of Korea: Conference Proceedings / 15th Workshop on Speech Communication and Signal Processing, 1998 (KSCSP 98, Vol. 15, No. 1) / pp.142-147 / 1998
Recently, the 8 kb/s CS-ACELP coder of G.729 was standardized by ITU-T SG15, and it has been reported that the speech quality of G.729 is better than or equal to that of 32 kb/s ADPCM. However, G.729 is a fixed rate speech coder, and it does not consider the property of voice activity in mutual conversation. If we use the voice activity, we can reduce the average bit rate by half without any degradation of the speech quality. In this paper, we propose an efficient variable rate algorithm for G.729. The variable rate algorithm consists of two main subjects, the rate determination algorithm and the design of the sub rate coders. For the rate determination algorithm, we combine the energy-thresholding method, the phonetic segmentation method by integration of various feature parameters obtained through the analysis procedure, and the variable hangover period method. Through the analysis of noise features, the 1 kb/s sub rate coder is designed for coding the background noise signal. We also design the 4 kb/s sub rate coder for the unvoiced parts. The performance of the variable rate algorithm is evaluated by comparing speech quality and average bit rate with G.729. A subjective quality test is also done by MOS test. In conclusion, it is verified that the proposed variable rate CS-ACELP coder produces the same speech quality as G.729 at an average bit rate of 4.4 kb/s.
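The rate determination idea in the abstract above (energy thresholding plus a hangover period) can be sketched as follows. The 1 kb/s and 4 kb/s sub rates come from the abstract; using the full 8 kb/s G.729 rate for voiced frames, the fixed threshold, the fixed hangover length, and the externally supplied voiced/unvoiced flags (a stand-in for the phonetic segmentation) are assumptions for illustration only.

```python
# Rates in kb/s; the 8.0 entry for voiced frames is an assumption.
RATES_KBPS = {"voiced": 8.0, "unvoiced": 4.0, "noise": 1.0}

def choose_rates(frame_energies, frame_voiced, threshold=0.01, hangover=2):
    """Pick a rate class per frame from its energy and a voiced/unvoiced flag."""
    classes = []
    frames_left = 0  # remaining hangover frames after the last active frame
    for energy, voiced in zip(frame_energies, frame_voiced):
        if energy >= threshold:
            frames_left = hangover
            active = True
        elif frames_left > 0:
            frames_left -= 1  # stay active briefly to avoid clipping word endings
            active = True
        else:
            active = False
        if not active:
            classes.append("noise")
        else:
            classes.append("voiced" if voiced else "unvoiced")
    return classes

def average_rate(classes):
    """Average bit rate in kb/s over the classified frames."""
    return sum(RATES_KBPS[c] for c in classes) / len(classes)
```

The abstract describes a variable hangover period; this sketch keeps it fixed for brevity.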

Implementation of a Robust Speech Recognizer in Noisy Car Environment Using a DSP

정익주
- Speech Sciences / Vol. 15, No. 2 / pp.67-77 / 2008
In this paper, we implemented a robust speech recognizer using the TMS320VC33 DSP. For this implementation, we built a speech and noise database suitable for a recognizer that uses the spectral subtraction method for noise removal. The recognizer has an explicit structure in that the speech signal is enhanced through spectral subtraction before endpoint detection and feature extraction. This helps make the operation of the recognizer clear and helps build HMM models with minimum model mismatch. Since the recognizer was developed for controlling car facilities and for voice dialing, it has two recognition engines: a speaker-independent one for controlling car facilities and a speaker-dependent one for voice dialing. We adopted a conventional DTW algorithm for the latter and a continuous HMM for the former. Through various off-line recognition tests, we selected optimal values for several recognition parameters for a resource-limited embedded recognizer, which led to HMM models with three mixtures per state. The car-noise-added speech database is enhanced using spectral subtraction before HMM parameter estimation in order to reduce the model mismatch caused by nonlinear distortion from spectral subtraction. The hardware module developed includes a microcontroller for the host interface, which processes the protocol between the DSP and a host.
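For the speaker-dependent engine, the abstract above mentions a conventional DTW algorithm. A minimal sketch of the textbook dynamic time warping distance is given below; it compares sequences of scalars with an absolute-difference cost for readability, whereas a recognizer would compare feature vectors per frame. The function name and the cost choice are assumptions, not details of the described implementation.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two scalar sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

A voice-dialing engine of this kind would typically store one or more templates per name and pick the template with the smallest distance to the spoken input.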

Investigation of Timbre-related Music Feature Learning using Separated Vocal Signals

이승진
- Journal of Broadcast Engineering / Vol. 24, No. 6 / pp.1024-1034 / 2019
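Preference for music is determined by a variety of factors, and discovering features that explain the reason for a recommendation is important in music recommendation. This paper proposes a method that uses a model trained on a singer identification task to extract the characteristics of the singer's voice from among the elements that reflect various musical characteristics. Recordings that include background accompaniment can also be used, but the accompaniment may prevent the network from fully recognizing the singer's voice. To address this, this study performs a preliminary step that separates the background accompaniment through source separation, and creates a dataset of separated vocals using a model architecture that was presented and validated at SiSEC. Finally, the separated vocals are used to discover timbre-based music features that reflect the artist's voice, and the effect of source separation is examined through comparison with the existing method that uses recordings whose accompaniment has not been separated.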
https://doi.org/10.5909/JBE.2019.24.6.1024

Analysis of the Voice Signal Features of Four Representative Korean Movie Actresses

김봉현;이세환;가민경;조동욱
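- Korea Academia-Industrial Cooperation Society: Conference Proceedings / Proceedings of the 2009 Fall Conference of the Korea Academia-Industrial Cooperation Society / pp.723-726 / 2009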
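The film industry is attracting considerable attention as a cultural industry that responds to the spirit of a modern society in which the quality of life is improving. In this industry, box-office success is regarded as very important because it is tied to the development of the film industry. Many factors determine box-office performance; among them, a good match between the characteristics of the lead actor and the nature of the film can be regarded as one indicator of a successful film. Therefore, this paper analyzes the voices of four actresses who are known as leading box-office draws in Korean cinema, and examines how they correlate with the factors that affect a film's success. To this end, the voices of the representative actresses Kim Hye-soo, Uhm Jung-hwa, Jeon Do-yeon, and Moon So-ri were analyzed and correlated with the box-office performance of the films in which they starred.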

Design of ONU for EPON Based Access Network

김용태;신동범;이형섭
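- Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the 2003 Summer Conference of the Institute of Electronics Engineers of Korea, Vol. I / pp.633-636 / 2003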
An Ethernet passive optical network (EPON) is a point-to-multipoint optical network. EPONs leverage the low cost and high performance curve of Ethernet systems to provide reliable data, voice, and video to end users at bandwidths far exceeding those of current access technologies. In this paper, we propose an economical and flexible structure for an optical network unit (ONU) that converts optical format traffic to the customer's desired format (Ethernet, VDSL, T1, IP multicast, etc.). A unique feature of EPONs is that, in addition to terminating and converting the optical signal, the ONU provides Layer 2-3 switching functionality, which allows internal routing of enterprise traffic at the ONU.

A study on speech training aids for the Deaf

안상필;이재혁;윤태성;박상희
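- Korean Institute of Electrical Engineers: Conference Proceedings / Proceedings of the 1990 Summer Conference of the Korean Institute of Electrical Engineers / pp.47-50 / 1990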
Deaf people cannot speak as clearly as hearing people because they lack auditory feedback of their own pronunciation; therefore, speech training is required. In this study, fundamental frequency, intensity, formant frequencies, the vocal tract graphic, and the vocal tract area function, extracted from the speech signal, are used as feature parameters. An AR model, whose coefficients are extracted using inverse filtering, is used as the speech generation model. To connect the vocal tract graphic with the speech parameters, articulation distances and articulation distance functions in 15 selected intervals are determined from the extracted vocal tract areas and formant frequencies.

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

Liu, Min;Tang, Jun
- Journal of Information Processing Systems / Vol. 17, No. 4 / pp.754-771 / 2021
In continuous dimensional emotion recognition, the parts that highlight emotional expression differ between modalities, and the influence of each modality on the emotional state also differs. Therefore, this paper studies the fusion of the two most important modalities in emotion recognition (voice and visual expression) and proposes a bimodal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, the first step is to use prior knowledge to extract audio features. Then, facial expression features are extracted by the improved AlexNet network. Finally, a multimodal attention mechanism is used to fuse the facial expression features and audio features, and an improved loss function is used to address the missing-modality problem, improving the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance correlation coefficients of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to those of several comparison algorithms.
https://doi.org/10.3745/JIPS.02.0161
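The evaluation metric reported in the abstract above, the concordance correlation coefficient (CCC), has a standard definition: twice the covariance of predictions and labels divided by the sum of their variances plus the squared difference of their means. The sketch below follows that definition using population statistics; the function and argument names are illustrative, and it is not taken from the paper's code.

```python
def concordance_cc(pred, gold):
    """Concordance correlation coefficient between predictions and labels."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    # Penalises both low correlation and a shift in mean or scale.
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)
```

Unlike Pearson correlation, the CCC drops below 1 when predictions are perfectly correlated with the labels but offset from them, which is why it is often used for arousal and valence regression.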

Deep Learning based Raw Audio Signal Bandwidth Extension System

김윤수;석종원
- Journal of IKEEE / Vol. 24, No. 4 / pp.1122-1128 / 2020
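Bandwidth extension refers to restoring and extending a narrowband (NB) signal that has been band-limited or damaged during encoding and decoding, owing to insufficient channel capacity or the characteristics of the codec in a mobile device, and converting it into a wideband (WB) signal. Bandwidth extension research has focused mainly on speech signals; methods such as Spectral Band Replication (SBR) and Intelligent Gap Filling (IGF) transform the high band into the frequency domain, perform a complex feature extraction process, and on that basis restore the missing or damaged high band. This paper proposes a model based on an autoencoder, a type of deep learning model, that uses residual connections between one-dimensional convolutional neural networks (CNNs) to take a fixed-length time-domain signal as input and output a bandwidth-extended audio signal without complex preprocessing. We also confirmed that the damaged high band can be restored even when the model is trained on a dataset that is not limited to speech and includes various types of audio sources, including music.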
https://doi.org/10.7471/ikeee.2020.24.4.1122

52 search results (processing time: 0.022 s)
