• Title/Summary/Keyword: Speech signal processing

Gender Analysis in Elderly Speech Signal Processing (노인음성신호처리에서의 젠더 분석)

  • Lee, JiYeoun
    • Journal of Digital Convergence / v.16 no.10 / pp.351-356 / 2018
  • Age-related changes in the vocal cords can alter the frequency of speech, and the speech signals of the elderly can be automatically distinguished from normal speech signals through various analyses. The purpose of this study is to provide a tool that is easily accessible to the elderly and to people with disabilities, who risk being excluded from a rapidly changing technological society, and to improve voice recognition performance. In the study, the gender of the subjects was reported as part of sex analysis, and equal numbers of female and male voice samples were used. In addition, gender analysis was applied by restricting the data to the voices of the elderly rather than using voices of all ages. Finally, a review methodology of standards and reference models was applied to reduce gender differences. Ten Korean women and ten Korean men aged 70 to 80 took part in the study. When F0 values extracted directly from the waveform were compared with F0 values extracted by the TF32 and WaveSurfer speech analysis programs, WaveSurfer analyzed the F0 of elderly voices better than TF32. Nevertheless, a voice analysis program designed for elderly speakers is still needed. In conclusion, analyzing the voices of the elderly will improve the speech recognition and synthesis capabilities of existing smart medical systems.
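
The abstract does not say how F0 was read from the waveform. One common manual approach is to measure the time between successive cycles, and the sketch below automates that idea with positive-going zero crossings. The function name, the synthetic 120 Hz tone, and the 16 kHz sampling rate are illustrative assumptions, not details from the paper.

```python
import math

def f0_from_zero_crossings(samples, sample_rate):
    """Estimate F0 as the reciprocal of the mean period between
    positive-going zero crossings (only sensible for clean, voiced frames)."""
    crossings = []
    for i in range(1, len(samples)):
        if samples[i - 1] < 0.0 <= samples[i]:
            # Linear interpolation gives a sub-sample crossing time.
            frac = -samples[i - 1] / (samples[i] - samples[i - 1])
            crossings.append((i - 1 + frac) / sample_rate)
    if len(crossings) < 2:
        return None  # not enough cycles to measure a period
    mean_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 1.0 / mean_period

# Hypothetical test signal: a pure 120 Hz tone, 0.1 s long.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 120.0 * n / sample_rate + 0.3) for n in range(1600)]
estimate = f0_from_zero_crossings(tone, sample_rate)
```

Real elderly speech is far less regular than a pure tone, which is one reason tools can disagree on F0; a sketch like this would need voicing detection and smoothing before use on recordings.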

Automated Classification of Audio Genre using Sequential Forward Selection Method

  • Lee Jong Hak;Yoon Won lung;Lee Kang Kyu;Park Kyu Sik
    • Proceedings of the IEEK Conference / 2004.08c / pp.768-771 / 2004
  • In this paper, we propose a content-based audio genre classification algorithm that uses a digital signal processing approach to automatically classify query audio into five genres: Classic, Hiphop, Jazz, Rock, and Speech. From each 20-second query audio file, a 54-dimensional feature vector is extracted, including Spectral Centroid, Rolloff, Flux, LPC, and MFCC. k-NN, Gaussian, and GMM classifiers are used for classification. To choose the optimum features from the 54-dimensional feature vector, the SFS (Sequential Forward Selection) method is applied to obtain a 10-dimensional optimum feature set, which is then used by the genre classification algorithm. The experimental results verify the superior performance of the SFS method, which provides a success rate of nearly 90% for genre classification, an improvement of 10% to 20% over previous methods.
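
Sequential Forward Selection can be sketched as a greedy loop: start with no features and repeatedly add the one that most improves a classifier score. The sketch below assumes leave-one-out accuracy of a 1-nearest-neighbour classifier as the score; the abstract does not state the selection criterion, and the toy data and function names are hypothetical.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)

def sequential_forward_selection(X, y, n_select):
    """Greedily add the feature that maximizes the score of the growing subset."""
    selected = []
    remaining = list(range(len(X[0])))
    while len(selected) < n_select and remaining:
        best = max(remaining, key=lambda f: loo_1nn_accuracy(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: only feature 1 separates the two classes.
X = [
    [0.9, 0.10, 0.50], [0.1, 0.20, 0.40], [0.5, 0.00, 0.90], [0.3, 0.15, 0.10],
    [0.8, 1.00, 0.45], [0.2, 0.90, 0.55], [0.6, 1.10, 0.05], [0.4, 0.95, 0.95],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
chosen = sequential_forward_selection(X, y, 2)
```

In the paper's setting the same loop would run over 54 candidate features and stop at 10; the cost grows with the number of classifier evaluations, so a cheap score matters.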

Speech Signal Processing for Performance Improvement of Text-Based Video Segmentation (문자정보 기반 비디오 분할에서 성능 향상을 위한 음성신호처리)

  • 이용주;손종목;강경옥;배건성
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 1999.11b / pp.187-191 / 1999
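  • Text information contained in the images of a video program can be used for video segmentation in support of content retrieval and indexing. In general, characters within a scene have low resolution and vary in size and shape, so they are difficult to extract and recognize; they are also often unintended background text, which makes them hard to use for content-based retrieval. However, by detecting the start and end frames in which text information appears and segmenting the video program accordingly, content-based summary information can be created and used for content retrieval and indexing. When video is segmented by extracting text information, audio information is generally not considered at all, so when the segmented video is played back the speech signal starts and ends at arbitrary points within a word, phrase, or syllable and sounds unnatural. This paper therefore studies a method for news broadcast video programs that detects the speech signal in the interval corresponding to the start and end frames of the video containing text information and then processes it appropriately, so that the speech sounds more natural when the segmented video is played back.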

Frequency Domain Blind Source Separation Using Cross-Correlation of Input Signals (입력신호 상호상관을 이용한 주파수 영역 블라인드 음원 분리)

  • Sung Chang Sook;Park Jang Sik;Son Kyung Sik;Park Keun-Soo
    • Journal of Korea Multimedia Society / v.8 no.3 / pp.328-335 / 2005
  • This paper proposes a frequency-domain independent component analysis (ICA) algorithm to separate mixed speech signals captured by a multiple-microphone array. By estimating the delay times from the cross-correlation of the input signals, we propose an initial value setting method that leads to optimal convergence even for delayed mixtures. To reduce computation, the separation process is performed in the frequency domain. Simulation results confirm the better performance of the proposed algorithm.
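
The delay-estimation step can be illustrated with a brute-force cross-correlation search: the lag with the largest correlation between two microphone signals is taken as the inter-microphone delay. This is a minimal time-domain sketch under assumed conditions (one source, a pure integer-sample delay, no noise); how the paper turns the estimated delay into ICA initial values is not given in the abstract and is not shown here.

```python
import random

def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) at which y best matches a delayed copy of x."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]  # cross-correlation term at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical two-microphone recording: mic2 hears the source 5 samples late.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
true_delay = 5
mic1 = source
mic2 = [0.0] * true_delay + source[:-true_delay]
estimated = estimate_delay(mic1, mic2, max_lag=20)
```

A positive result means the second signal lags the first. For long signals an FFT-based correlation would be much faster than this double loop.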

A Study on the Speech Signal Processing for Cochlear Implant using the PLP Analysis (청각보철을 위한 PLP방식의 음성신호처리에 관한 연구)

  • Kim, Young-Sun;Choi, Doo-Il;Park, Sang-Hui;Beack, Seung-Hwa
    • Proceedings of the KOSOMBE Conference / v.1992 no.05 / pp.167-170 / 1992
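  • In this paper, a cochlear implant (auditory prosthesis) device was configured so that people with sensorineural hearing loss can perceive speech in a way similar to people with normal hearing. The PLP (Perceptual Linear Prediction) method was used to extract the formants of speech, and the autocorrelation method with a three-level clipping function was used for pitch extraction. In addition, using a multi-channel, multi-electrode scheme, a simulation was performed in which 17 electrodes are inserted at the hair cells of the inner ear and signals are applied to them. The data used in the experiment were the vowels /a/, /e/, /i/, /o/, and /u/; the difference between front and back vowels was distinguished, and the change in the second formant and the formant integration theory were verified.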
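
The pitch step named above, autocorrelation after three-level clipping, can be sketched as follows. Each sample is mapped to +1, 0, or -1 depending on whether it exceeds a clipping threshold, and the autocorrelation peak of the clipped signal gives the pitch period. The threshold rule (a fixed fraction of the frame peak), the search range, and the test signal are assumptions for illustration; the paper's exact settings are not given in the abstract.

```python
import math

def three_level_clip(frame, ratio=0.6):
    """Map samples to +1 / 0 / -1 using a threshold at `ratio` of the frame peak."""
    threshold = ratio * max(abs(s) for s in frame)
    return [1 if s > threshold else -1 if s < -threshold else 0 for s in frame]

def pitch_by_autocorrelation(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return the pitch in Hz from the autocorrelation peak of the clipped frame."""
    clipped = three_level_clip(frame)
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(clipped) - 1)
    best_lag, best_value = None, 0
    for lag in range(lag_min, lag_max + 1):
        value = sum(clipped[n] * clipped[n + lag] for n in range(len(clipped) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None

# Hypothetical voiced frame: 100 Hz fundamental plus a second harmonic, 60 ms at 8 kHz.
sample_rate = 8000
frame = [
    math.sin(2 * math.pi * 100 * n / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
    for n in range(480)
]
pitch = pitch_by_autocorrelation(frame, sample_rate)
```

Because the clipped signal only takes three values, the products reduce to sign comparisons, which is what made the method attractive for early hardware.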

Performance evaluation of a multiple-access algorithm for PCN in microcell environment (마이크로셀 환경에서 개인휴대통신을 위한 다중접속 알고리즘의 성능 평가)

  • 전영희;이재형;최형진
    • Journal of the Korean Institute of Telematics and Electronics A / v.33A no.7 / pp.55-63 / 1996
  • In this paper, a multiple-access algorithm for PCN is proposed. The proposed algorithm provides integrated service for multiple information sources and operates stably under high load. The given bandwidth is used efficiently in the microcell environment, and system performance can be improved through statistical multiplexing. To handle the speech signal, which usually requires real-time processing, we adopt ALOHA-type random access as the basic protocol structure and assume a reservation-ALOHA form. We analyze the performance of the proposed algorithm in terms of system throughput and packet delay in the microcell environment.
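
As background for the throughput analysis, the textbook throughput formulas for plain ALOHA random access are easy to evaluate. These are the classical results for pure and slotted ALOHA under Poisson offered load G, not the reservation scheme proposed in the paper, whose equations are not given in the abstract.

```python
import math

def pure_aloha_throughput(offered_load):
    """Classical pure ALOHA: S = G * exp(-2G)."""
    return offered_load * math.exp(-2.0 * offered_load)

def slotted_aloha_throughput(offered_load):
    """Classical slotted ALOHA: S = G * exp(-G)."""
    return offered_load * math.exp(-offered_load)

# Peak throughput occurs at G = 0.5 (pure) and G = 1.0 (slotted).
peak_pure = pure_aloha_throughput(0.5)
peak_slotted = slotted_aloha_throughput(1.0)
```

Both curves fall off beyond their peak as collisions dominate, which is the high-load instability that reservation variants are designed to avoid.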

Development of a Cryptographic Dongle for Secure Voice Encryption over GSM Voice Channel

  • Kim, Tae-Yong;Jang, Won-Tae;Lee, Hoon-Jae
    • Journal of information and communication convergence engineering / v.7 no.4 / pp.561-564 / 2009
  • A cryptographic dongle capable of transmitting encrypted voice signals over the CDMA/GSM voice channel was designed and implemented. The dongle uses a PIC microcontroller for signal processing, including analog-to-digital and digital-to-analog conversion, encryption, and communication with the smart phone. The smart phone supplies power to the dongle and receives the encrypted speech from it, which it then transmits to the network. A number of tests were conducted to check the efficiency of the dongle, the firmware programming, the encryption algorithms, the secret key management system, the interface between the smart phone and the dongle, and the noise level.

Sensibility Classification Algorithm of EEGs using Multi-template Method (다중 템플릿 방법을 이용한 뇌파의 감성 분류 알고리즘)

  • Kim Dong-Jun
    • The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.12 / pp.834-838 / 2004
  • This paper proposes an algorithm for EEG pattern classification using the multi-template method, a kind of speaker adaptation method used in speech signal processing. 10-channel EEG signals are collected in various environments. The linear prediction coefficients of the EEGs are extracted as the feature parameters of human sensibility, and the sensibility classification algorithm is built with neural networks. On EEGs recorded in comfortable and uncomfortable seats, the proposed algorithm showed a classification performance of about 75% in the subject-independent test. In tests using EEG signals recorded under varying room temperature and humidity, the proposed algorithm tracked changes in pleasantness well, and the subject-independent tests produced performance similar to the subject-dependent ones.
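
Linear prediction coefficients are commonly computed from a signal's autocorrelation with the Levinson-Durbin recursion. The sketch below shows that standard route; the abstract does not say which LPC method or model order was used for the EEGs, so the order and the decaying test signal are assumptions.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite signal."""
    return [
        sum(signal[n] * signal[n + k] for n in range(len(signal) - k))
        for k in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] with x[n] ~= sum(a[k] * x[n-k])
    and the final prediction error energy."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error

# Hypothetical signal: x[n] = 0.9 * x[n-1], so the predictor should be about [0.9, 0.0].
decay = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(decay, 2), 2)
```

For classification, the coefficient vector of each channel would be concatenated into the feature vector fed to the network.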

Noise Processing for Speech Recognition in the Telephone Line (음성 인식을 위한 전화망에서의 잡음처리)

  • 전원석;신원호;양태영;김원구;윤대희
    • The Journal of the Acoustical Society of Korea / v.17 no.1 / pp.4-8 / 1998
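  • This paper studies methods for improving the performance of a speech recognition system by removing the noise and channel distortion contained in speech data collected over various telephone line channels. To remove the channel noise and distortion in speech that has passed through a telephone line, three speech signal compensation methods were compared and evaluated: CMS (Cepstral Mean Subtraction), SBR (Signal Bias Removal), and SM (Stochastic Matching). To evaluate the noise removal methods, speaker-independent isolated word recognition was performed using phoneme-based semi-continuous HMMs. In the recognition experiments, CMS gave the best performance when the mel cepstrum was used, followed by SM and then SBR. Recognition performance improved further when cepstral coefficients with weighting functions (RPS, BPL), which make the feature vector robust to ambient noise, were used together with the noise removal methods.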
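
Of the three compensation methods, CMS is the simplest to show. A fixed channel filter adds a roughly constant offset to every cepstral frame, so subtracting the per-utterance mean of each coefficient removes it. The sketch below uses made-up two-coefficient frames; SBR and SM are not shown.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    mean = [sum(frame[c] for frame in frames) / n_frames for c in range(n_coeffs)]
    return [[frame[c] - mean[c] for c in range(n_coeffs)] for frame in frames]

# Hypothetical cepstra: the same utterance with and without a constant channel offset.
clean = [[1.0, 2.0], [3.0, -2.0], [-4.0, 0.0]]
channel_offset = [0.5, -1.0]
observed = [[v + o for v, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_observed = cepstral_mean_subtraction(observed)
```

After normalization the two versions coincide, which is why CMS helps when training and test data pass through different telephone channels.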

A Study on the Efficient Speech Recognition System using Database Grouping (어휘 그룹화를 이용한 음성인식시스템의 성능향상에 관한 연구)

  • 우상욱;권승호;한수양;이동규;이두수
    • Proceedings of the IEEK Conference / 2003.07e / pp.2455-2458 / 2003
  • In this paper, a classification method based on energy labeling is proposed. The energy parameters of the input signal, extracted for each phoneme, are labeled, and label groups are determined according to the detected energies of the input signal. DTW is then performed only within the selected label group, which makes DTW processing faster than in the previous algorithm. Because this method depends on accurate parameter detection in the steps that detect speech duration and energy parameters, variable windows whose length is decided by the pitch period are used. Existing algorithms do not find the exact frame energy because the window size is fixed at 256 frames. For this reason, a new energy extraction method is proposed: the pitch period is detected first, and the window size is then set between 200 and 300 frames. The proposed method makes it possible to cancel the influence of the window.
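
The speed-up described above comes from running DTW against only the templates that share the query's energy label. The sketch below shows that idea with a simplification: one label per utterance from its mean energy, where the paper labels energy per phoneme. The threshold, the template names, and the one-dimensional feature sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def energy_label(sequence, threshold=1.0):
    """Coarse label from mean energy (assumed two-level labeling)."""
    energy = sum(v * v for v in sequence) / len(sequence)
    return "high" if energy >= threshold else "low"

def recognize(query, templates):
    """Run DTW only against templates whose energy label matches the query's."""
    label = energy_label(query)
    group = {name: seq for name, seq in templates.items() if energy_label(seq) == label}
    candidates = group or templates  # fall back to all templates if the group is empty
    return min(candidates, key=lambda name: dtw_distance(query, candidates[name]))

templates = {
    "yes": [0.1, 0.2, 0.1],
    "no": [2.0, 3.0, 2.0],
    "go": [2.0, 1.0, 2.5, 3.0],
}
query = [2.1, 2.9, 2.9, 2.0]
result = recognize(query, templates)
```

Here the "low" template is never compared, so the number of DTW runs drops from three to two; with a large vocabulary split into many groups the saving is proportionally larger.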
