• Title/Summary/Keyword: Voice, Sound


Adaptive Post Processing of Nonlinear Amplified Sound Signal

  • Lee, Jae-Kyu;Choi, Jong-Suk;Seok, Cheong-Gyu;Kim, Mun-Sang
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings
    • /
    • 2005.06a
    • /
    • pp.872-876
    • /
    • 2005
  • We propose real-time post-processing of a nonlinearly amplified signal to improve voice recognition in remote talk. In previous research, we found that nonlinear amplification offers unique advantages for both voice activity detection and sound localization in remote talk. However, the original signal becomes distorted by the nonlinear amplification, and as a result the rest of the pipeline, such as speech recognition, performs less satisfactorily. To remedy this problem, we implement a linearization algorithm that recovers the voice signal's linear characteristics after localization has been done.

  • PDF
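The abstract does not give the amplifier's curve or the linearization algorithm. As a rough sketch of the amplify-then-invert idea, the following assumes a μ-law-style compressive amplifier; the `MU` constant and the test signal are illustrative assumptions, not the authors' method:

```python
import numpy as np

MU = 255.0  # assumed compression strength; the paper's amplifier curve is not stated

def nonlinear_amplify(x, mu=MU):
    """Hypothetical mu-law-style compressive amplification: boosts low-level
    remote-talk components, which helps VAD and localization but distorts
    the waveform."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def linearize(y, mu=MU):
    """Exact inverse of the compressive map, restoring the signal's linear
    characteristics for the downstream speech recognizer."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

# Toy 440 Hz tone at 8 kHz: compress, then recover.
x = 0.3 * np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
recovered = linearize(nonlinear_amplify(x))
print(np.max(np.abs(recovered - x)))  # small reconstruction error
```

Because the inverse map is exact, localization can run on the compressed signal while recognition receives the recovered linear waveform, mirroring the paper's ordering (linearize after localization).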

A Method of Arrangement of Voice and Sound : For User Interface of Domestic Appliance (음성과 소리의 할당 방법 : 가전제품 UI 를 중심으로)

  • Hong, Ji-Young;Chae, Haeng-Suk;Lee, Seung-Yong;Park, Young-Hyun;Kim, Jun-Hee;Ryu, Hyung-Su;Kim, Jong-Wan;Han, Kwang-Hee
    • The HCI Society of Korea: Conference Proceedings
    • /
    • 2007.02b
    • /
    • pp.478-483
    • /
    • 2007
  • This study describes an optimal method for allocating voice signals and auditory signals in the user interfaces of domestic appliances. A voice user interface (VUI) for home appliances, which users encounter daily, refers to a human-machine interface mediated by voice. Rather than applying a VUI alone, combining it with sound signals can improve the user interface. In this study, we conducted focus group interviews (FGI), experiments, and depth interviews with homemaker users, and based on the resulting needs survey and experimental results on the arrangement of voice and sound signals in the appliances' voice generation and expression interface, we propose an optimal allocation method.

  • PDF

Voice Expression using a Cochlear Filter Model

  • Jarng, Soon-Suck
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.1E
    • /
    • pp.20-28
    • /
    • 1996
  • Speech sounds were applied to a cochlear filter simulated by an electrical transmission line. The amplitude of the basilar membrane displacement was calculated along the length of the cochlea as a temporal response, and the envelope of the amplitude along that length was arranged for each discrete time interval. The resulting time response of the speech sound was then displayed as a color image. Five vowels, /a/, /e/, /i/, /o/, /u/, were applied and their results compared. The whole procedure for visualizing speech sound using the cochlear filter is described in detail: the filter model's response to a voice is visualized by passing the voice through the cochlear filter model.

  • PDF
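A full electrical transmission-line cochlea is beyond a short sketch, but the visualization pipeline the paper describes (place-by-place basilar-membrane response, envelope per discrete time step, rendered as an image) can be approximated with a bank of second-order resonators along a log-spaced place-frequency map. The number of places, the Q factor, and the toy input below are all assumptions:

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000
t = np.arange(0, 0.1, 1 / fs)
# Toy vowel-like signal: two harmonics standing in for real speech.
speech = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 800 * t)

# Place-frequency map: 32 positions along the cochlea with log-spaced
# characteristic frequencies (a crude stand-in for the transmission line).
cfs = np.logspace(np.log10(100), np.log10(3000), 32)

def resonator_response(x, fc, fs, q=8.0):
    """Second-order IIR resonator approximating basilar-membrane tuning
    at one cochlear place."""
    w = 2 * np.pi * fc / fs
    r = np.exp(-w / (2 * q))
    b = [1 - r]
    a = [1, -2 * r * np.cos(w), r ** 2]
    return lfilter(b, a, x)

# Rows = cochlear position, columns = time; each row's envelope corresponds
# to the basilar-membrane displacement amplitude that the paper renders as
# a colour image for the vowels /a, e, i, o, u/.
bm = np.array([np.abs(resonator_response(speech, fc, fs)) for fc in cfs])
print(bm.shape)
```

Plotting `bm` with an image function (rows as cochlear place, columns as time) gives the kind of colour map the paper describes; the 200 Hz component lights up the low-frequency places and the 800 Hz component the higher ones.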

Voice Outcome after Partial Laryngectomy (후두부분절제술 후 음성 결과)

  • Sun, Dong-Il
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.19 no.1
    • /
    • pp.16-20
    • /
    • 2008
  • Excising part or all of a larynx as a cancer operation results in changes that transgress anatomic, physiologic, psychologic, and social principles. A patient's quality of life after any given cancer surgery is usually regarded as a second-priority consideration after oncologic safety. With laryngeal surgery, excision of malignant disease typically results in changes that significantly influence an individual for the rest of his or her life. Nonetheless, with appropriate rehabilitation the surgical side effects can be minimized to allow an excellent quality of life. Successful conservation surgery for laryngeal cancer requires careful, interdependent selection of patient, lesion, and procedure. The technical goal is to minimize trauma to uninvolved tissue and to wisely utilize local tissues or a free flap for reconstruction, while ensuring an oncologically sound procedure. Rehabilitation should aim to produce a glottal sound source if possible; however, voice therapy that promotes false vocal fold vibration and an arytenoid-to-epiglottis source of vibration can also produce very satisfactory phonatory results.

  • PDF

Automatic Vowel Sequence Reproduction for a Talking Robot Based on PARCOR Coefficient Template Matching

  • Vo, Nhu Thanh;Sawada, Hideyuki
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.5 no.3
    • /
    • pp.215-221
    • /
    • 2016
  • This paper describes an automatic vowel sequence reproduction system for a talking robot built to reproduce the human voice based on the working behavior of the human articulatory system. A sound analysis system is developed to record a sentence spoken by a human (mainly vowel sequences in Japanese) and then analyze it to produce the correct command packet so the talking robot can repeat it. An algorithm based on a short-time energy method is developed to separate and count sound phonemes. Template matching using partial correlation (PARCOR) coefficients is applied to find the voice in the talking robot's database most similar to the spoken voice. By combining the sound separation and counting results with the detection of vowels in human speech, the talking robot can reproduce a vowel sequence similar to the one spoken by the human. Two tests verifying the robot's working behavior indicate that it can repeat a sequence of vowels spoken by a human with an average success rate above 60%.
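The PARCOR features behind the template matching can be sketched with the standard Levinson-Durbin recursion. The Euclidean distance, model order, and toy "vowel" frames below are assumptions, since the abstract does not state the paper's exact matching rule:

```python
import numpy as np

def parcor_coefficients(frame, order=10):
    """PARCOR (reflection) coefficients via the Levinson-Durbin recursion
    on the frame's autocorrelation sequence."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    ks = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e                      # i-th reflection coefficient
        new_a = a.copy()
        new_a[i] = k
        for j in range(1, i):             # update the predictor polynomial
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        e *= 1.0 - k * k                  # shrink the prediction error
        ks[i - 1] = k
    return ks

def closest_vowel(frame, templates):
    """Pick the template vowel whose PARCOR vector is nearest; Euclidean
    distance is an assumed metric, not necessarily the paper's."""
    k = parcor_coefficients(frame)
    return min(templates, key=lambda v: np.linalg.norm(templates[v] - k))

# Toy "vowel" frames with different spectral envelopes, standing in for
# the robot's recorded vowel database.
rng = np.random.default_rng(1)
t = np.arange(1024)
frame_a = np.sin(2 * np.pi * 0.05 * t) + 0.1 * rng.standard_normal(1024)
frame_i = np.sin(2 * np.pi * 0.15 * t) + 0.1 * rng.standard_normal(1024)
templates = {"a": parcor_coefficients(frame_a), "i": parcor_coefficients(frame_i)}
```

In the paper's pipeline, the short-time energy step would first segment the utterance into phoneme frames; each frame would then be classified against the template set as above before being sent to the robot as a command packet.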

Effects of Auditory Warning Types on Response Time and Accuracy in Ship Bridges (선교내에서 청각경보음의 유형이 반응속도와 정확성에 미치는 영향)

  • Ha, Wook-Hyun;Park, Sung-Ha;Kim, Hong-Tae
    • Journal of the Ergonomics Society of Korea
    • /
    • v.29 no.4
    • /
    • pp.673-680
    • /
    • 2010
  • The effects of different auditory warnings on response time and accuracy were studied in a laboratory ship-bridge work environment. Subjective preference for the type of auditory warning was also a primary concern. Twenty-five subjects were asked to select the appropriate button for the warning sound presented, with three types of auditory warning (abstract sound, auditory icon, and voice alarm) and five warning situations (fire, steering failure, collision, engine failure, and low power). Results showed that response time and accuracy were significantly affected by the type of auditory warning. The voice alarm resulted in higher accuracy and subjective preference than the auditory icon and abstract sound. Regarding response time, auditory icons and voice alarms were equivalent and superior to abstract sounds. Actual or potential applications of this research include guidelines for the design of integrated ship bridge systems.

Acoustic Analysis of the Aging Voice: Baby Voice (음성 연령에 대한 음향학적 분석;동음을 중심으로)

  • Kim, Ji-Chae;Han, Ji-Yeon;Jeong, Ok-Ran
    • Proceedings of the KSPS conference
    • /
    • 2006.11a
    • /
    • pp.127-130
    • /
    • 2006
  • The purpose of this study is to examine the differences in acoustic features between Younger Voices and Aged Voices that actually come from the same age group. Twelve female subjects in their thirties participated, and recordings were made of a sustained vowel /a/, connected speech, and reading. Their voices were divided into Younger Voices and Aged Voices, that is, voices that sound like a younger person versus voices that sound their age or older. Praat 4.4.22 was used to record the voices and analyze acoustic features such as F0, SFF, jitter, shimmer, HNR, and pitch range. Six female listeners guessed the subjects' ages and judged whether each voice sounded younger than, or like, the subject's actual age. An independent t-test was used to find significant differences between the two groups' acoustic features; the result shows a significant difference in F0 and SFF. This and previous studies suggest that the group that sounds younger or baby-like has acoustic features similar to those of actually young people.

  • PDF
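The group comparison reported above is a standard independent two-sample t-test. A minimal sketch with hypothetical F0 values, since the study's actual measurements are not given in the abstract:

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical F0 (Hz) measurements for the two perceptual groups;
# illustrative values only, not the study's data.
younger_f0 = np.array([228.1, 231.5, 224.9, 235.2, 229.8, 233.0])
aged_f0 = np.array([201.3, 198.7, 205.6, 196.2, 203.9, 199.5])

# Independent two-sample t-test on the acoustic feature.
t_stat, p_value = ttest_ind(younger_f0, aged_f0)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The same call would be repeated for each feature (SFF, jitter, shimmer, HNR, pitch range); in the study, only F0 and SFF reached significance.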

Convergence research on the speaker's voice perceived by listener, and suggestions for future research application

  • Hahm, SangWoo
    • International journal of advanced smart convergence
    • /
    • v.11 no.1
    • /
    • pp.55-63
    • /
    • 2022
  • Although research on the leader's or speaker's voice has been conducted continuously, existing research takes a single point of view: sound analysis of voice characteristics has been studied from an engineering perspective, and leadership trait theory from a business perspective. Convergence studies on leader voice and member cognition are now being attempted. Convergence research on voice has a positive effect on the refinement of voice analysis, the diversification of voice use, and the establishment of voice utilization strategies. This study explains the current flow of research on convergence between the speaker's voice and the listener's perception, and suggests a direction for the future development of voice convergence research. Furthermore, in connection with AI in the fourth industrial age, new approaches to voice research are sought. First, advances in AI focus on strategically generating the voices needed for individual situations. Second, voices corrected in real time will support leaders and speakers in using the desired voice type. Third, AI voices based on big data will affect the cognition, attitudes, and behavior of individual listeners, who are members, customers, and students, in more diverse situations. The purpose and significance of this study are to suggest ways to research the leader's voice as perceived by members, and to suggest a method that can be applied in various situations.

Clinical utility of auditory perceptual assessments in the discrimination of a diplophonic voice (이중음성 판별에 있어 청지각적 평가의 임상적 유용성)

  • Bae, Inho;Kwon, Soonbok
    • Phonetics and Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.75-81
    • /
    • 2018
  • Diplophonia is generally defined as the perception of more than one fundamental frequency component in a voice. Its perceptual aspect has traditionally been used to evaluate diplophonia because perceptions can be evaluated easily, but there are limitations in intra- and inter-rater reliability, the examination situation, and the variation of voice samples. The purpose of this study is therefore to confirm the reliability and accuracy of auditory-perceptual evaluation by comparing it with non-invasive indirect assessment methods (sound waveform and EGG analysis), and to establish its usefulness for diplophonia. A total of 28 diplophonic voices and 39 aperiodic voices were assessed. Three raters assessed diplophonia by performing an auditory-perceptual evaluation and by identifying quasi-periodic perturbations in the acoustic waveform and EGG. For the three discrimination methods, intra- and inter-rater reliability, sensitivity, specificity, accuracy, and positive and negative likelihood ratios were examined, and the McNemar test was performed to compare discriminant agreement. The accuracy of the auditory-perceptual evaluation (86.57%) was not significantly different from that of sound waveform analysis (88.06%), but was significantly different from that of EGG (83.33%). The reading time for the auditory-perceptual evaluation (6.02 s) differed significantly from that for sound waveform analysis (30.15 s) and EGG analysis (16.41 s). In discriminating diplophonia, auditory-perceptual evaluation has sufficient reliability and accuracy compared with sound waveform and EGG analysis, and since immediate feedback is possible, it is more convenient. It can therefore continue to be used as a tool to discriminate diplophonia in clinical practice.
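The reported metrics all follow from a 2×2 confusion table. A minimal sketch, with hypothetical counts for the 28 diplophonic and 39 aperiodic voices chosen only so that accuracy reproduces the reported 86.57% (the study's actual confusion table is not given in the abstract):

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy, and positive/negative likelihood
    ratios from a 2x2 confusion table (diplophonic = positive class)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + fp + tn)
    lr_pos = sens / (1 - spec)   # how much a positive call raises the odds
    lr_neg = (1 - sens) / spec   # how much a negative call lowers them
    return sens, spec, acc, lr_pos, lr_neg

# Hypothetical counts: 24 of 28 diplophonic voices correctly identified,
# 34 of 39 aperiodic voices correctly rejected -> accuracy 58/67 = 86.57%.
sens, spec, acc, lr_pos, lr_neg = diagnostic_metrics(tp=24, fn=4, fp=5, tn=34)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}, accuracy={acc:.4f}")
```

A likelihood ratio above 1 for positive calls and below 1 for negative calls is what makes the method clinically informative; the McNemar comparison between methods would then operate on paired correct/incorrect judgments.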

Implementation of Real-time Sound-location Tracking Method using TDoA for Smart Lecture System (스마트 강의 시스템을 위한 시간차 검출 방식의 실시간 음원 추적 기법 구현)

  • Kang, Minsoo;Oh, Woojin
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.4
    • /
    • pp.708-717
    • /
    • 2017
  • Sound-location tracking is widely used in areas such as intelligent CCTV, video conferencing, and voice command. In this paper we introduce a real-time sound-location tracking method for a smart lecture system using TDoA (Time Difference of Arrival) with an orthogonal microphone array on the ceiling. After discussing several models of TDoA detection, a cross-correlation method using a linear microphone array is proposed; an orthogonal array with five microphones can detect the sound location in all directions. For real-time detection we adopt a threshold on received energy to eliminate non-voice intervals and a signed cross-correlation to reduce computational complexity. The detected azimuth angles are processed with a median filter to lower the angle deviation. The proposed system is implemented with a high-performance TMS320F379D MCU and MEMS microphone modules, and shows angular accuracies of 0.5° and 6.5° for white noise and lecture voice, respectively.
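The cross-correlation TDoA step and the use of orthogonal pairs for an omnidirectional azimuth can be sketched as follows; the sampling rate, microphone spacing, and far-field/integer-delay simplifications are assumptions, not the paper's parameters:

```python
import numpy as np

FS = 48_000   # sampling rate (assumed; not stated in the abstract)
C = 343.0     # speed of sound in air, m/s
D = 0.5       # assumed spacing between opposing ceiling microphones, m

def tdoa_samples(x_ref, x_other):
    """Lag (in samples) of x_other relative to x_ref at the peak of the
    cross-correlation - the TDoA estimate used for direction finding."""
    corr = np.correlate(x_other, x_ref, mode="full")
    return np.argmax(corr) - (len(x_ref) - 1)

def azimuth_deg(x_n, x_s, x_e, x_w):
    """Omnidirectional azimuth from two orthogonal microphone pairs: each
    pair's TDoA is proportional to the projection of the source direction
    onto that pair's axis, so atan2 of the two delays recovers the angle."""
    tau_ns = tdoa_samples(x_n, x_s) / FS   # ~ (D / C) * cos(azimuth)
    tau_ew = tdoa_samples(x_e, x_w) / FS   # ~ (D / C) * sin(azimuth)
    return np.degrees(np.arctan2(tau_ew, tau_ns)) % 360.0
```

In the paper's system this estimate would run only on frames passing the energy threshold, and a median filter over successive azimuth estimates would suppress outliers before reporting the speaker's direction.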