Search | Korea Science

Performance of GMM and ANN as a Classifier for Pathological Voice

Wang, Jianglin;Jo, Cheol-Woo
- Speech Sciences / v.14 no.1 / pp.151-162 / 2007
This study focuses on the classification of pathological voice using a GMM (Gaussian Mixture Model) and compares the results with previous work that used an ANN (Artificial Neural Network). Speech data from normal speakers and patients were collected, diagnosed, and classified into two categories. Six characteristic parameters (Jitter, Shimmer, NHR, SPI, APQ and RAP) were chosen. Classification methods based on the artificial neural network and the Gaussian mixture model were then employed to separate the data into normal and pathological speech. The GMM method attained an average correct classification rate of 98.4% on training data and 95.2% on test data. Different numbers of mixtures (3 to 15) were used to find an optimal condition for classification. We also compared the average classification rates of the GMM, ANN and HMM. The proper number of Gaussian mixtures remains to be investigated in future work.
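Three of the six parameters named above (Jitter, Shimmer and RAP) are cycle-to-cycle perturbation measures. The abstract does not say how they were computed, so the following is only a minimal sketch using common textbook definitions; the function names and the sample values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, as a percentage of the mean value.

    Applied to pitch periods this is local jitter; applied to peak amplitudes, local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def rap(periods):
    """Relative average perturbation: deviation of each period from its 3-point average, in percent."""
    if len(periods) < 3:
        raise ValueError("need at least three cycles")
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * mean(devs) / mean(periods)


# Hypothetical pitch periods (ms) and peak amplitudes of five consecutive glottal cycles.
periods_ms = [10.0, 10.2, 9.9, 10.1, 10.0]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]

jitter_percent = local_perturbation(periods_ms)
shimmer_percent = local_perturbation(amplitudes)
rap_percent = rap(periods_ms)
```

Values of this kind, computed per recording, would form the feature vector passed to a classifier such as the GMM or ANN.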

Real-Time Arbitrary Face Swapping System For Video Influencers Utilizing Arbitrary Generated Face Image Selection

Jihyeon Lee;Seunghoo Lee;Hongju Nam;Suk-Ho Lee
- International Journal of Internet, Broadcasting and Communication / v.15 no.2 / pp.31-38 / 2023
This paper introduces a real-time face swapping system that enables video influencers to swap their faces with arbitrary generated face images of their choice. The system is implemented as a Django-based server that uses a REST request to communicate with the generative model, specifically the pretrained stable diffusion model. The generated image is displayed on the front page so that the influencer can decide whether to use it by clicking the accept button. If they choose to use it, both their face and the generated face are sent to the landmark extraction module, and the extracted landmarks are used to swap the faces. To minimize fluctuation of the landmarks over time, which can cause instability or jitter in the output, a temporal filtering step is added. Furthermore, to increase processing speed, the system works on a reduced set of the extracted landmarks.
https://doi.org/10.7236/IJIBC.2023.15.2.31
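The abstract does not specify which temporal filter is applied to the landmarks. As an illustration only, the sketch below assumes a simple exponential moving average; the function name `smooth_landmarks`, the parameter `alpha`, and the sample coordinates are hypothetical.

```python
def smooth_landmarks(frames, alpha=0.5):
    """Exponentially smooth a sequence of per-frame landmark lists [(x, y), ...].

    alpha close to 1 follows the raw landmarks; a smaller alpha suppresses more jitter
    at the cost of more lag.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = None
    for landmarks in frames:
        if prev is None:
            # The first frame has no history, so it passes through unchanged.
            current = [(float(x), float(y)) for x, y in landmarks]
        else:
            current = [
                (alpha * x + (1 - alpha) * px, alpha * y + (1 - alpha) * py)
                for (x, y), (px, py) in zip(landmarks, prev)
            ]
        smoothed.append(current)
        prev = current
    return smoothed


# One hypothetical landmark tracked over four frames, with detector noise.
frames = [[(100.0, 50.0)], [(104.0, 50.0)], [(98.0, 52.0)], [(102.0, 50.0)]]
stable = smooth_landmarks(frames, alpha=0.5)
```

In this example the raw x coordinate swings between 98 and 104, while the filtered one stays between 100 and 102.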

Nonlinear Speech Production Modeling using Nonlinear Autoregressive Exogenous based on Support Vector Machine (서포트 벡터 머신 기반 비선형 외인성 자귀회귀를 이용한 비선형 조음 모델링)

Jang, Seung-Jin;Kim, Hyo-Min;Park, Young-Choel;Choi, Hong-Shik;Yoon, Young Ro
- Proceedings of the Korea Information Processing Society Conference / 2007.11a / pp.113-116 / 2007
In this paper, a Nonlinear Autoregressive Exogenous (NARX) model based on Least Squares Support Vector Regression (LS-SVR) is proposed and tested for producing natural sounds. This nonlinear synthesizer perfectly reproduces voiced sounds and preserves natural characteristics such as jitter and shimmer, which LPC does not retain. However, the results for some phonations differ considerably from the original sounds. This is presumably because a single-band model cannot control and decompose the high-frequency components. A multi-band model with a wavelet filterbank is therefore adopted in place of the single-band model, and it yields improved stability. Finally, nonlinear speech modeling using NARX based on LS-SVR can successfully reconstruct synthesized sounds that are nearly identical to the original voiced sounds.
https://doi.org/10.3745/PKIPS.y2007m11a.113

Evaluations of Three Phase Shift Models in Describing Phase Shift Impulse Train Response of a Simple Planar Oscillator (간단한 2차원 오실레이터의 임펄스열 응답에 관한 3가지 위상편이 모델의 평가)

Jeon, Man-Young
- The Journal of the Korea Institute of Electronic Communication Sciences / v.9 no.8 / pp.861-866 / 2014
This study evaluates the modeling accuracy of the three existing phase shift models on which the time-domain oscillator phase noise theories are based. To do so, it investigates how accurately each model describes the phase shift impulse train response of a simple planar oscillator. The evaluation reveals that the Kaertner model most accurately reflects the oscillator's phase shift impulse train responses for five different impulse train inputs, whereas the PP model performs worst.
https://doi.org/10.13067/JKIECS.2014.9.8.861

Dynamic Timed Multimedia Synchronization Model for Efficient Quality of Service (효율적인 서비스 품질을 위한 동적 시간형 멀티미디어 동기화 모델)

이근왕;오해석
- Journal of the Korean Institute of Telematics and Electronics C / v.36C no.10 / pp.75-80 / 1999
Developing multimedia application software requires a synchronization model for distributed continuous or discrete media that guarantees high quality of service. In this paper we specify an object controller, called the dynamic key medium, that changes when the user generates an event. The key medium is thus a medium whose event occurrences and periods cannot be predicted. When an event occurs, not only audio but also text or an image can be chosen as the key medium and play that role. The object controller transfers the information needed for the next transition. The proposed model offers high quality of service by permitting the maximum allowed jitter and skew in playout time, and its effectiveness is verified by simulation.

Characteristics of Korean Stop Consonants by Using Electroglottography and Its Clinical Application (Electroglottography를 사용한 한국어 폐쇄자음의 특성 및 임상적 적용)

Chae, Y.J.;Kim, H.G.;Hong, K.H.
- Speech Sciences / v.4 no.2 / pp.157-177 / 1998
Electroglottography (EGG) was used to investigate the function of the vocal folds during vibration. In this study, four native Korean speakers and 10 vocal polyp patients were selected. To investigate the dynamic change of EGG waveforms for the three-way distinction of Korean stops, a DSP Sona-Graph Model 5500, a Rhino-Laryngeal Stroboscope, a CSL Model 4300B and a Laryngograph were used. An EGG Model 4338 was used to examine the voices of the vocal polyp patients during high, low, and comfortable pitch production. The purpose of this study is to investigate the characteristics of Korean stop consonants in relation to pitch and to observe laryngeal movement during vocal fold vibration and speech production. The basic data accumulated during this research can be applied in clinical treatment. The results are as follows. Among the Korean stop consonants, the aspirated stop is highest in GOT and PC1. The angle of the vowel contour is smaller for the lenis stop than for the heavily aspirated and glottalized stops. The fundamental frequency is lowest for the lenis stop. In vocal polyp patients, the low pitch range is smaller than in normal speakers, and pitch breaks and vocal fry were observed. The jitter and OQ values are higher in vocal polyp patients than in normal speakers.

Adaptive OLSR Protocol Based on Average Node Distance in Airdropped Distributed Mobility Model (분산 낙하 이동 모델에서의 평균 노드 거리 기반 적응적 OLSR 프로토콜)

Lee, Taekmin;Lee, Jinhae;Wang, Jihyeun;Yoo, Joonhyuk;Yoo, Seong-eun
- IEMEK Journal of Embedded Systems and Applications / v.13 no.2 / pp.83-91 / 2018
With the development of information technology (IT), embedded systems and network technology have been combined and are used in various environments, from everyday life to military settings. In this paper, we propose a new airdropped distributed mobility model (ADMM) that models the dispersed falling of a direct-shot cluster bomb, and we compare and analyze several representative MANET routing protocols under ADMM in the ns-3 simulator. The analysis shows that the OLSR routing protocol is promising in the ADMM environment in terms of packet delivery ratio (PDR), end-to-end delay, and jitter. In addition, we propose a new adaptation scheme for OLSR, AND-OLSR (Average Node Distance based adaptive-OLSR), to improve on the original OLSR in the ADMM environment. The new protocol calculates the average node distance and adapts the period of the control messages based on the rate at which the average node distance increases. Through a simulation study, we show that the proposed AND-OLSR outperforms the original OLSR in PDR and control message overhead.
https://doi.org/10.14372/IEMEK.2018.13.2.83
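The abstract states only that AND-OLSR computes the average node distance and adapts the control message period to its increasing rate; the actual rule is not given. The sketch below is therefore an assumption throughout: it takes the average node distance to be the mean pairwise Euclidean distance, and it shortens the interval as the distance grows faster. The function names, the gain, and the clamping bounds are hypothetical; the 2-second base is the default OLSR HELLO interval.

```python
from itertools import combinations
from math import dist


def average_node_distance(positions):
    """Mean Euclidean distance over all node pairs (assumed definition)."""
    pairs = list(combinations(positions, 2))
    if not pairs:
        raise ValueError("need at least two nodes")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def adapted_interval(prev_avg, curr_avg, dt, base=2.0, gain=1.0, lo=0.5, hi=2.0):
    """Hypothetical adaptation rule: a faster-growing average distance gives a shorter interval.

    base is the default OLSR HELLO interval in seconds; gain, lo and hi are illustrative.
    """
    rate = max(0.0, (curr_avg - prev_avg) / dt)  # increasing rate of the average distance
    interval = base / (1.0 + gain * rate)
    return min(hi, max(lo, interval))


# Two hypothetical snapshots of three nodes, taken one second apart while they disperse.
before = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
after = [(0.0, 0.0), (6.0, 8.0), (12.0, 16.0)]
interval = adapted_interval(
    average_node_distance(before), average_node_distance(after), dt=1.0
)
```

Whether the published scheme shortens or lengthens the period as nodes disperse is not stated in the abstract, so the direction chosen here should be checked against the paper.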

A Study on the Design and Implementation Method of Computational Object Supporting CM Stream Interface in the Distributed Environment (분산 환경에서 CM 스트림 인터페이스를 지원하는 계산 객체의 설계 및 구현 방안 연구)

Song, Byeong-Gwon;Jin, Myeong-Suk;Kim, Geon-Ung
- The Transactions of the Korea Information Processing Society / v.7 no.6 / pp.1785-1794 / 2000
This paper presents a computational object model that supports CM (Continuous Media) stream interfaces with QoS (Quality of Service), as required in distributed applications, and an implementation method for the proposed stream interface. A stream interface consists of a data channel and a control channel. In this paper, the communication channel provided by CORBA is used as the control channel, and various transport protocols can be used as the data channel of the stream interface. Specifications of the application QoS are also included in the stream interface specification. In the implementation, FIFO queues and timers are used to support the transmission rate, delay and jitter control mechanisms of the stream interface.
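To illustrate how a FIFO queue and a periodic timer can absorb jitter, the following sketch simulates a playout buffer. It is not the paper's implementation: the function `play_out`, its parameters, and the arrival times are hypothetical, and the timer is simulated by stepping through tick times rather than using a real clock.

```python
from collections import deque


def play_out(arrivals, period, buffer_delay, ticks):
    """Simulate a playout buffer driven by a periodic timer.

    arrivals: (sequence number, arrival time in ms) pairs.
    Returns a list of (tick time, sequence number or None); None marks an underrun.
    """
    if not arrivals:
        return []
    pending = deque(sorted(arrivals, key=lambda p: p[1]))  # frames still in the network
    fifo = deque()                                         # frames waiting for playout
    played = []
    first_tick = pending[0][1] + buffer_delay              # initial delay absorbs jitter
    for n in range(ticks):
        now = first_tick + n * period
        # Move every frame that has arrived by this tick into the FIFO queue.
        while pending and pending[0][1] <= now:
            fifo.append(pending.popleft()[0])
        # The timer releases exactly one frame per period, which fixes the output rate.
        played.append((now, fifo.popleft() if fifo else None))
    return played


# Frames sent every 20 ms; frame 1 arrives 15 ms late because of network jitter.
arrivals = [(0, 0), (1, 35), (2, 41), (3, 60)]
buffered = play_out(arrivals, period=20, buffer_delay=20, ticks=4)
unbuffered = play_out(arrivals, period=20, buffer_delay=0, ticks=4)
```

With a 20 ms initial delay every frame is released on schedule; without it, the second tick finds the queue empty.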

Torque and Force Measurement of a Prototype HAU Reaction Wheel and the Effect of Disturbance on the Attitude Stability of Spacecraft

Oh, Hwa-Suk;Kwon, Jae-Wook;Lee, Hyunwoo;Nam, Myung-Ryong;Park, Dong-Jo
- Journal of Mechanical Science and Technology / v.15 no.6 / pp.743-751 / 2001
A prototype reaction wheel, named the Hankuk Aviation University (HAU) reaction wheel, has been developed for the KAISTSAT satellite series. Torque and force disturbances are inherent in reaction wheels, so the force and torque characteristics should be examined for every newly developed reaction wheel. The torque and force disturbance noises in the prototype HAU reaction wheel are measured with a torque-measuring table developed for the present study. A new measuring procedure based on a simple principle is applied for the measurements. The frequency characteristics of the torque and force noises are analyzed by examining the power spectral density. The effect of the torque noise on attitude stability is also examined through numerical simulations with a single-axis attitude model. The noise-induced attitude error and jitter are found to be well below the specified error limits for the KAISTSAT satellite series.

A Study on Characteristics of Children's Voice Preference from Different Pitch (음도 차이에 따른 아동의 선호 음성 특성 연구)

Ham, Eun-Seon;Lim, Kyung-Suk;Yi, So-Hee;Kim, Ha-Kyung
- Speech Sciences / v.15 no.3 / pp.175-181 / 2008
The aim of this study was to survey children's voice preference among three voice pitches (high, mid and low) and to understand the acoustic characteristics of the most preferred voice. Dr. Speech (ver. 4.0, Tiger Electronics) was used to record the distinctive pitches, and we analyzed the children's choices. We also measured subglottal air pressure in an aerodynamic analysis using the Phonatory Aerodynamic System (Model 6600, KAY). The children preferred the low-pitch voice, with no difference by sex. We found that they preferred voices with higher HNR and lower jitter and shimmer.

