• Title/Summary/Keyword: voice parameter


Extraction and Analysis of Voice Feature Parameter of Chungbuk News Announcers (충북방송 뉴스 진행자의 음성적 특징 추출 및 분석)

  • Kim, Bong-Hyun;Lee, Se-Hwan;Ka, Min-Kyoung;Cho, Dong-Uk;J.Bae, Young-Lae
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.363-364 / 2009
  • As broadcasting has advanced technologically and structurally, viewer expectations have risen, and the cultural industry has changed rapidly, the broadcasting sector continues to grow enormously in modern society. Amid these changes, the level and shifting expectations of viewers remain a continuing focus of interest, and it is the role of the broadcast presenter to grasp them and lead smooth program delivery. Accordingly, in this paper we collected the voices of the news anchors at the three broadcasting companies in the Chungbuk region, applied a variety of voice-analysis parameters, and, based on the resulting values, carried out experiments to extract characteristic information about the anchors' voices. In particular, to analyze the influence that can be conveyed through the voice, we applied voice-analysis parameters such as pitch, jitter, shimmer, stability, and the spectrogram, and compared and analyzed the results.
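
As an illustration of the kind of voice-analysis parameters listed in this abstract (pitch, jitter, shimmer), here is a minimal sketch using the Praat-based parselmouth library; the file name and analysis settings are assumptions, not the authors' actual pipeline.

```python
# Minimal sketch: extracting mean pitch, jitter, and shimmer with parselmouth (Praat).
# "announcer.wav" and the pitch floor/ceiling settings are illustrative assumptions.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("announcer.wav")          # hypothetical news-anchor recording

# Mean fundamental frequency (pitch) over the whole recording
pitch = snd.to_pitch()
mean_f0 = call(pitch, "Get mean", 0, 0, "Hertz")

# Glottal pulse marks, then cycle-to-cycle perturbation measures
point_process = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter_local = call(point_process, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_local = call([snd, point_process], "Get shimmer (local)",
                     0, 0, 0.0001, 0.02, 1.3, 1.6)

print(f"mean F0: {mean_f0:.1f} Hz, jitter: {jitter_local:.4f}, shimmer: {shimmer_local:.4f}")
```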

Change Analysis of Heart Related Voice Analysis Parameter Based on Auricular Acupuncture (이침요법(耳針療法)을 기반으로 한 심장 관련 음성 분석 요소의 변화 분석)

  • Kim, Bong-Hyun;Lim, Soon-Yong;Lim, Sung-Su;Yoo, Hwang-Jun;Yeon, Yong-Heum;Min, Ji-Seon;Han, Sang-Hyo;Ka, Min-Kyoung;Cho, Dong-Uk
    • Proceedings of the Korea Information Processing Society Conference / 2011.11a / pp.1043-1046 / 2011
  • Alternative medicine reflects the prevention and management of health. Among alternative therapies, auricular acupuncture is widely used because it has few side effects. After a simple training course it can be applied through self-diagnosis as a form of first aid, so it is easily used in daily life. In this paper, therefore, we stimulated the auricular acupoint corresponding to the heart and measured the resulting changes in heart-related voice parameters. To this end, we collected voice samples before and after stimulating the heart acupoint and applied jitter and the second formant frequency bandwidth among the voice-analysis parameters, analyzing the correlation between the heart and the voice through the rate of change of vocal-fold vibration per unit time of phonation and the change in the resonance cavities.
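
To make the two measures concrete, the following is a small hedged sketch (again with parselmouth; the file names, measurement time, and settings are assumptions) that computes jitter and the second-formant bandwidth for a before/after pair of recordings.

```python
# Sketch: jitter and second-formant bandwidth for a pair of recordings.
# "before.wav" / "after.wav" are hypothetical pre-/post-stimulation samples.
import parselmouth
from parselmouth.praat import call

def jitter_and_b2(path, time_s=0.5):
    snd = parselmouth.Sound(path)
    pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    jitter = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    formants = snd.to_formant_burg()                 # Burg-method formant tracking
    b2 = formants.get_bandwidth_at_time(2, time_s)   # bandwidth of F2 at time_s seconds
    return jitter, b2

before = jitter_and_b2("before.wav")
after = jitter_and_b2("after.wav")
print("jitter, B2 before:", before, "after:", after)
```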

Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences / v.15 no.2 / pp.53-59 / 2023
  • Detection of children with autism spectrum disorder (ASD) based on speech has relied on predefined feature sets because of their ease of use and the capabilities of speech analysis. However, clinical impressions may not be adequately captured due to the broad range and the large number of features included. This paper demonstrates that knowledge-driven speech features (KDSFs) specifically tailored to the speech traits of ASD are more effective and efficient for distinguishing the speech of children with ASD from that of children with typical development (TD) than a predefined feature set, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS). The KDSFs encompass various speech characteristics related to frequency, voice quality, speech rate, and spectral features that have been identified as corresponding to distinctive attributes of ASD speech. The speech dataset used for the experiments consists of 63 children with ASD and 9 children with TD. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to the TD children's utterances. A support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained using the predefined set. This result underscores the importance of incorporating domain knowledge in the development of speech technologies for individuals with disorders.
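
As a rough sketch of the baseline approach described in this abstract (eGeMAPS functionals fed to an SVM), the snippet below uses the opensmile and scikit-learn packages; the file lists, labels, and hyperparameters are illustrative assumptions, not the authors' setup or the KDSF feature set.

```python
# Sketch of an eGeMAPS + SVM baseline (not the paper's KDSFs or data).
# wav_paths / labels are hypothetical; 1 = ASD, 0 = typical development.
import numpy as np
import opensmile
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

wav_paths = ["utt_001.wav", "utt_002.wav"]   # placeholder utterance files
labels = np.array([1, 0])                    # placeholder labels

X = np.vstack([smile.process_file(p).to_numpy() for p in wav_paths])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, labels)
print(clf.predict(X))
```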

Acoustic characteristics of speech-language pathologists related to their subjective vocal fatigue (언어재활사의 주관적 음성피로도와 관련된 음향적 특성)

  • Jeon, Hyewon;Kim, Jiyoun;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.14 no.3 / pp.87-101 / 2022
  • In addition to administering a questionnaire (J-survey) asking about subjective vocal fatigue, voice samples were collected before and after speech-language pathology sessions from 50 female speech-language pathologists in their 20s and 30s in the Daejeon and Chungnam areas. We identified significant differences in Korean Vocal Fatigue Index scores between the fatigue and non-fatigue groups, with the most prominent differences in sections one and two. Regarding acoustic-phonetic characteristics, both groups showed a pattern in which low-frequency band energy was relatively low and high-frequency band energy was increased after the treatment sessions. This trend was well reflected in the low-to-high ratio of vowels, the LTAS slope, energy in the third formant, and energy in the 4,000-8,000 Hz range. A difference between the groups was observed only in the vowel energy of the low-frequency band (0-4,000 Hz) before treatment, with the non-fatigue group having a higher value than the fatigue group. This characteristic could be interpreted as a result of voice abuse and higher muscle tonus caused by long-term voice work. Among the perturbation parameters, shimmer local was lowered in the non-fatigue group after treatment, and the noise-to-harmonics ratio (NHR) was lowered in both groups following treatment. The decrease in NHR and the fall of shimmer local could be attributed to vocal cord hypertension, but it could be concluded that the effective voice use of the speech-language pathologists also contributed to this effect, especially in the non-fatigue group. In the non-fatigue group, the rahmonics-to-noise ratio increased significantly after treatment, indicating that the harmonic structure was more stable after treatment.
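
The low-to-high energy ratio reported above can be illustrated with a small numpy/scipy sketch that compares energy below 4,000 Hz with energy in the 4,000-8,000 Hz band; the band edges follow the abstract, while the file name and everything else are assumptions.

```python
# Sketch: low-band (0-4 kHz) vs. high-band (4-8 kHz) energy ratio of a vowel recording.
# "vowel.wav" is a hypothetical sustained-vowel sample; assumes a mono recording.
import numpy as np
from scipy.io import wavfile

sr, x = wavfile.read("vowel.wav")
x = x.astype(np.float64)

spec = np.abs(np.fft.rfft(x)) ** 2                 # power spectrum
freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)

low = spec[(freqs >= 0) & (freqs < 4000)].sum()
high = spec[(freqs >= 4000) & (freqs < 8000)].sum()

low_to_high_db = 10 * np.log10(low / high)
print(f"low-to-high energy ratio: {low_to_high_db:.1f} dB")
```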

Design and Implementation of VoIP Equipment including Telephone Function for Home Gateway Connection (전화기 기능을 포함한 홈 게이트웨이 접속용 VOIP 장비 설계 및 구현)

  • Lee Yong-Soo;Jung Kwang-Wook;Chung Joong-Soo
    • The Journal of the Korea Contents Association / v.4 no.4 / pp.123-131 / 2004
  • The Internet has contributed decisively to the information technology revolution. Internet services such as voice and data are delivered to homes and small offices via a home gateway, and communication equipment for home gateways is being developed rapidly with a diverse range of products. This paper presents the design and implementation of VoIP equipment that includes a telephone function, runs in an embedded environment, and connects to both the home gateway and a PC through its two Ethernet LAN ports. For the development environment, the STLC1502 single-chip solution from ST Microelectronics, the VxWorks RTOS, and the C language were used. The developed system was verified through voice tests with a gatekeeper over the Internet. As performance parameters, we considered the call-processing capacity, measured from the call setup and clearing time, and the data-processing capacity for file transfer. Since call setup and clearing takes about 95 ms, the call-processing capacity is about 10 calls per second. The data-processing capacity is 5.7 Mbps for file transfer in a server-client environment. These results are satisfactory in terms of call-processing time and data-transfer time over the Internet.

Performance Improvement of WATM Using Concatenated FEC Codes with Pilot Symbols in Indoor Wireless Channels (실내 무선 통신로에서 파일럿 심볼을 삽입한 Concatenated FEC 부호에 의한 WATM의 성능 개선)

  • 박기식;강영흥;김종원;정해원;양해권;조성준
    • The Journal of Korean Institute of Communications and Information Sciences / v.24 no.9A / pp.1276-1284 / 1999
  • We have evaluated the BERs and CLPs of Wireless ATM (WATM) cells employing a concatenated FEC code with pilot symbols for fading compensation, through simulation over indoor wireless channels modeled as Rayleigh and Rician fading channels, respectively. The results are compared with those obtained by employing a convolutional code under the same conditions. In the Rayleigh fading channel, taking the maximum tolerable BER of the voice service ($10^{-3}$) as a criterion, it is shown that a performance improvement of about 4 dB in terms of $E_b/N_o$ is obtained by employing the concatenated FEC code with pilot symbols rather than the convolutional code with pilot symbols. When the K parameter of the Rician fading channel, which is the ratio of the direct-signal power to the scattered-signal power, is 6 and 10, performance improvements of about 4 dB and 2 dB, respectively, are obtained in terms of $E_b/N_o$ by employing the concatenated FEC code with pilot symbols, again with respect to the maximum tolerable BER of the voice service. Also, in the Rician fading channel with K=6 and K=10, taking CLP = $10^{-3}$ as a criterion, performance improvements of about 3.5 dB and 1.5 dB, respectively, are observed in terms of $E_b/N_o$ by employing the concatenated FEC code with pilot symbols.
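
The K parameter used here (ratio of direct to scattered signal power in the Rician channel) can be illustrated with a small numpy sketch that generates unit-power Rician fading coefficients; this only illustrates the channel model, not the paper's FEC simulation.

```python
# Sketch: unit-power Rician fading coefficients for a given K factor, where
# K = (direct/LOS power) / (scattered power). K -> 0 reduces to Rayleigh fading.
import numpy as np

def rician_fading(n, K, seed=None):
    rng = np.random.default_rng(seed)
    s = np.sqrt(K / (K + 1.0))                  # amplitude of the direct (LOS) component
    sigma = np.sqrt(1.0 / (2.0 * (K + 1.0)))    # std-dev of each scattered quadrature component
    return (s + sigma * rng.standard_normal(n)) + 1j * sigma * rng.standard_normal(n)

h = rician_fading(100_000, K=6)
print("average channel power:", np.mean(np.abs(h) ** 2))   # ~1.0 by construction
```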

Realization of an IEEE 802.11g VoWLAN Terminal with Support of Adaptable Power Save and QoS During a Call (통화 중 적응적 Power Save와 QoS 지원이 가능한 IEEE 802.11g VoWLAN 단말기 구현)

  • Kwon, Sung-Su;Lee, Jong-Chul
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.10A / pp.1003-1013 / 2006
  • A serious problem with 802.11g VoWLAN (Voice over Wireless LAN) terminals is that their talk time is less than 30% of that of 802.11b terminals. It is almost impossible to reach the talk-time level of the 802.11b MAC transmission method, because IEEE 802.11g uses OFDM modulation, a multi-carrier method whose 54 Mbps transmission rate is faster than that of conventional modulation. In this paper, a new concept, a holdover time used as a power-saving method during a call on an 802.11g terminal, is suggested for the first time. Increasing the number of engaged terminals as a result of the holdover time causes a QoS problem, because the number of back-offs, and hence the contention window, increases. To solve this QoS problem, a new approach is suggested in which the sequence numbers of G.711 packets on the 802.11 downlink are analyzed in the MAC of the terminal and the holdover time is adjusted according to the loss rate. In addition, the current consumption of 802.11b/g and the performance of MAC parameters under heavy traffic caused by an increasing number of terminals are analyzed, and real measurement data obtained with VQT and AiroPeek are examined.
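
To illustrate the idea of adapting the holdover time to the downlink loss rate, here is a small hedged sketch in Python; the sequence-number handling and the mapping from loss rate to holdover time are invented for illustration and are not the paper's algorithm or thresholds.

```python
# Illustrative sketch only: estimate downlink G.711/RTP loss from sequence numbers
# and shrink the power-save holdover time as loss grows. Thresholds are made up,
# and sequence-number wraparound is ignored for simplicity.
def loss_rate(seq_numbers):
    """Fraction of packets missing between the first and last observed sequence number."""
    expected = seq_numbers[-1] - seq_numbers[0] + 1
    return 1.0 - len(set(seq_numbers)) / expected

def holdover_ms(loss, base_ms=60, min_ms=10):
    """Longer holdover saves more power; back off toward min_ms when loss rises."""
    if loss < 0.01:
        return base_ms
    if loss < 0.05:
        return base_ms // 2
    return min_ms

seqs = [100, 101, 103, 104, 105, 107]      # two packets lost in this window
print(loss_rate(seqs), holdover_ms(loss_rate(seqs)))
```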

A study on the lip shape recognition algorithm using 3-D Model (3차원 모델을 이용한 입모양 인식 알고리즘에 관한 연구)

  • 배철수
    • Journal of the Korea Institute of Information and Communication Engineering / v.3 no.1 / pp.59-68 / 1999
  • Recently, research and development on communication systems has moved toward using both voice data and face images of the speaker, to provide a higher recognition rate than voice data alone. We therefore present a lipreading method for speech image sequences that uses a 3-D facial shape model. The method uses feature information of the face image such as the degree of lip opening, the movement of the jaw, and the protrusion height of the lips. First, the 3-D face model is fitted to the speaking face image sequence. Then, to obtain feature information, we compute the amount of variation of the fitted 3-D shape model across the image sequence and use this variation as the recognition parameters. The intensity-gradient values obtained from the variation of the 3-D feature points are used to segment the recognition units from the sequential images. A discrete HMM algorithm is then used in the recognition process, based on multiple observation sequences that fully reflect the variation of the 3-D feature points. In a recognition experiment with 8 Korean vowels and 2 Korean consonants, we obtained a recognition rate of about 80% for the plosives and vowels. Since a recognition experiment on 10 Korean vowels with this recognition parameter also yielded a high recognition rate, we conclude that the proposed feature vector is a usable visual distinguishing factor.
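
As a reminder of how the discrete-HMM recognition step works, the following is a compact numpy Viterbi decoder for one observation sequence; the model sizes and parameters are random placeholders, not the lip-shape models trained in the paper.

```python
# Sketch: Viterbi decoding for a discrete (multinomial-emission) HMM in log space.
# A (NxN transitions), B (NxM emission probs), pi (N initial probs) are placeholders.
import numpy as np

def viterbi(obs, pi, A, B):
    n_states, T = A.shape[0], len(obs)
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.zeros((T, n_states))            # best log-probability ending in each state
    psi = np.zeros((T, n_states), dtype=int)   # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A     # scores[i, j]: best path ending i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

rng = np.random.default_rng(0)
A = rng.dirichlet(np.ones(3), size=3)       # 3 hidden states
B = rng.dirichlet(np.ones(4), size=3)       # 4 discrete observation symbols
pi = np.full(3, 1 / 3)
print(viterbi([0, 2, 1, 3, 1], pi, A, B))
```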

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition (음성 인식을 위한 개선된 평균 예측 LMS 필터를 이용한 DNN 기반의 강인한 음성 특징 추출 및 신호 잡음 제거 기법)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.1-6 / 2021
  • In the field of speech recognition, the use of DNNs is increasing, but the amount of computation for parallel training needs to be larger than for the conventional GMM, and overfitting occurs when the amount of data is small. To solve this problem, we propose an efficient method for robust voice feature extraction and speech-signal noise removal even when the amount of data is small. The feature extraction efficiently captures speech energy by applying the frame-energy difference of the speech together with the zero-crossing ratio and level-crossing ratio, which are affected by the speech signal. In addition, to remove noise, the speech signal is cleaned with an average-prediction improved LMS filter that loses little speech information while maintaining the intrinsic characteristics of the speech during speech detection. The improved LMS filter processes noise in the input speech signal by adjusting an activation-parameter threshold for the input signal. Comparing the proposed method with the conventional frame-energy method, we confirmed that the error rate at the speech start point is 7% and the error rate at the end point is improved by 11%.
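
For reference, a basic (non-improved) LMS adaptive noise canceller looks roughly like the sketch below; the step size, filter length, and reference-noise assumption are illustrative and do not reproduce the paper's average-prediction variant.

```python
# Sketch: classic LMS adaptive noise canceller (not the paper's improved
# average-prediction variant). d = noisy speech, x = reference noise correlated
# with the noise component of d.
import numpy as np

def lms_denoise(d, x, n_taps=16, mu=0.01):
    w = np.zeros(n_taps)                       # adaptive filter weights
    y = np.zeros_like(d)                       # estimated noise component
    e = np.zeros_like(d)                       # error = cleaned speech estimate
    for n in range(n_taps, len(d)):
        x_vec = x[n - n_taps:n][::-1]          # most recent n_taps reference samples
        y[n] = w @ x_vec                       # predict the noise in this sample
        e[n] = d[n] - y[n]                     # subtract it from the noisy signal
        w += 2 * mu * e[n] * x_vec             # LMS weight update
    return e

rng = np.random.default_rng(1)
noise = rng.standard_normal(8000)
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 8000))   # toy "speech" signal
noisy = clean + 0.5 * noise
print(np.std(noisy - clean), np.std(lms_denoise(noisy, noise) - clean))
```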

Speech/Music Signal Classification Based on Spectrum Flux and MFCC For Audio Coder (오디오 부호화기를 위한 스펙트럼 변화 및 MFCC 기반 음성/음악 신호 분류)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.239-246 / 2023
  • In this paper, we propose an open-loop algorithm that classifies speech and music signals for an audio coder using spectral-flux parameters and Mel-Frequency Cepstral Coefficients (MFCC). To increase responsiveness, MFCC was used as a short-term feature parameter, and spectral flux was used as a long-term feature parameter to improve accuracy. The overall speech/music classification decision is made by combining the short-term and long-term classification methods. A Gaussian Mixture Model (GMM) was used for pattern recognition, and the optimal GMM parameters were estimated with the Expectation-Maximization (EM) algorithm. The proposed combined long-term and short-term speech/music classification method showed an average classification error rate of 1.5% on various audio sources, improving the classification error rate by 0.9% compared to the short-term-only method and by 0.6% compared to the long-term-only method. Compared to the Unified Speech and Audio Coding (USAC) audio classification method, the proposed method improved the classification error rate by 9.1% for percussive music signals with attacks and by 5.8% for speech signals.
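
The two feature types and the GMM classifier can be sketched as follows with librosa and scikit-learn; the file names, frame settings, and number of mixture components are assumptions, and this is not the USAC-integrated decision logic described in the paper.

```python
# Sketch: MFCC (short-term) and spectral-flux (long-term) features with one GMM per class.
# Training file names and GMM sizes are placeholders.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def features(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # shape (13, n_frames)
    S = np.abs(librosa.stft(y))                                  # magnitude spectrogram
    flux = np.sqrt(np.sum(np.diff(S, axis=1) ** 2, axis=0))      # spectral flux, (n_frames-1,)
    return np.vstack([mfcc[:, 1:], flux[None, :]]).T             # (n_frames-1) x 14 features

speech_gmm = GaussianMixture(n_components=8).fit(features("speech_train.wav"))
music_gmm = GaussianMixture(n_components=8).fit(features("music_train.wav"))

test = features("test.wav")
is_speech = speech_gmm.score(test) > music_gmm.score(test)       # average log-likelihood
print("speech" if is_speech else "music")
```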