• Title/Summary/Keyword: Speech sound

An Experimental Phonetic Study on the Acoustic Characteristics of the Korean Nasal Sound (한국어 비음의 음향적 특성에 관한 실험음성학적 연구)

  • Seong Cheol-Jae
    • MALSORI
    • /
    • no.31_32
    • /
    • pp.9-22
    • /
    • 1996
  • This study describes the acoustic characteristics of Korean nasals using the notions of pole and zero. For [m], the 1st and 4th formants largely retain their original shapes, whereas the 2nd and 3rd formants were observed to form a variable cluster together. For alveolar [n], the 3rd and 4th formants form a variable cluster with their antiformant (zero), while the 1st and 2nd formants keep a static shape. Velar [ŋ] has four formants below 2900 Hz, and its 3rd and 4th formants constitute a variable cluster, as with [n]. With respect to energy distribution, for [n] and [ŋ] the energy diminishes continuously from F1 up to F3 but increases at F4; for [m], the energy falls in the F1-F2 region and rises from F3 upward.
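
As a rough illustration of the pole half of this kind of analysis, the sketch below fits an all-pole (LPC) model to a speech frame and reads formant frequencies off its complex roots. This is a minimal, generic sketch, not the authors' procedure: a pure all-pole model cannot represent the nasal antiformants (zeros) discussed above, which would need a pole-zero (ARMA) fit, and the sampling rate, model order, and synthetic test frame are assumptions.

```python
# Minimal all-pole (LPC) formant sketch. Nasal zeros (antiformants)
# are NOT captured here; they would require a pole-zero (ARMA) model.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order=12):
    """LPC coefficients of A(z) = 1 - sum_k a_k z^-k (autocorrelation method)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def formants(frame, sr, order=12):
    """Frequencies (Hz) of the LPC poles, lowest first."""
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 0]      # keep one of each conjugate pair
    return np.sort(np.angle(roots) * sr / (2 * np.pi))

sr = 16000
t = np.arange(int(0.03 * sr)) / sr         # one 30 ms analysis frame
frame = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2500 * t)
frame += 1e-3 * np.random.default_rng(0).standard_normal(t.size)  # conditioning
frame *= np.hamming(t.size)
print(formants(frame, sr))                 # peaks near 300 Hz and 2500 Hz
```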

Neuroanatomical analysis for onomatopoeia: fMRI study

  • Han, Jong-Hye;Choi, Won-Il;Chang, Yong-Min;Jeong, Ok-Ran;Nam, Ki-Chun
    • Annual Conference on Human and Language Technology
    • /
    • 2004.10d
    • /
    • pp.315-318
    • /
    • 2004
  • The purpose of this study is to examine the neuroanatomical areas related to onomatopoeia (sound-imitating words). Using block-designed fMRI, whole-brain images (N=11) were acquired during lexical decisions. We examined how lexical information initiates brain activation during visual word recognition. Onomatopoeic word recognition activated the bilateral occipital lobes and the superior mid-temporal gyrus.

Dysarthric speaker identification with different degrees of dysarthria severity using deep belief networks

  • Farhadipour, Aref;Veisi, Hadi;Asgari, Mohammad;Keyvanrad, Mohammad Ali
    • ETRI Journal
    • /
    • v.40 no.5
    • /
    • pp.643-652
    • /
    • 2018
  • Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch, and therefore the uniqueness of the sound produced by the speaker; dysarthric speaker recognition is hence a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for identifying a speaker suffering from dysarthria. The effectiveness of the proposed method is demonstrated and compared with well-known Mel-frequency cepstral coefficient features. For classification, a multi-layer perceptron neural network with two structures is proposed. Our evaluations using the Universal Access speech database produced promising results and outperformed other baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions is explored. The highest accuracy achieved with the proposed system is 97.3%.
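
The DBN feature extractor itself is not reproduced here, but the surrounding pipeline is conventional. Below is a minimal sketch of that pipeline with the DBN features swapped for plain MFCCs and an MLP classifier from scikit-learn; the file paths, label lists, and layer sizes are illustrative assumptions, not the paper's configuration.

```python
# Hedged speaker-ID sketch: MFCC features + MLP classifier.
# (The paper extracts features with a deep belief network instead.)
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def utterance_features(path, sr=16000, n_mfcc=13):
    """Mean MFCC vector as a crude fixed-size utterance embedding."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def train_speaker_id(wav_paths, speaker_ids):
    """Train and score an MLP speaker classifier on utterance features."""
    X = np.stack([utterance_features(p) for p in wav_paths])
    y = np.array(speaker_ids)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, test_size=0.2)
    clf = MLPClassifier(hidden_layer_sizes=(256, 128), max_iter=500)
    clf.fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)

# Usage (hypothetical UA-Speech file and speaker lists):
# clf, acc = train_speaker_id(ua_wav_paths, ua_speaker_ids)
```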

High Frequency Enhancement of Sound Using Wavelet Transform

  • Yoon Won-Jung;Lee Kang-Kyu;Park Kyu-Sik
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.233-236
    • /
    • 2004
  • This paper proposes a new method for enhancing nonexistent high-frequency spectral content in low-sample-rate audio signals. For example, due to protocol constraints, the audio bandwidth of MP3 is restricted to 16 kHz. Although band-restricted MP3 audio saves storage space and network bandwidth, it suffers a major loss in high-frequency fidelity, such as localization, ambient information, and the bright character of the audio. This paper provides a new mathematical analysis for adaptively estimating the high-frequency content based on the nature of the low-sample-rate input. The proposed method applies generally to any kind of audio, such as speech and music, that is restricted in sampling rate and bandwidth.
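
The paper's adaptive estimator is not spelled out in the abstract; the sketch below only illustrates the underlying wavelet mechanism: upsample the band-limited signal, decompose it with a discrete wavelet transform, synthesize the now-empty finest detail band from the finest band that still has content, and reconstruct. The scaled-copy estimator, wavelet choice, and gain are assumptions standing in for the paper's adaptive estimate.

```python
# Hedged wavelet bandwidth-extension sketch: the missing top-octave
# detail band is guessed as a scaled, tiled copy of the band below it.
import numpy as np
import pywt
from scipy.signal import resample

def extend_high_band(x, wavelet="db4", gain=0.5):
    """Upsample x by 2 and fill in the new (empty) finest detail band."""
    x_up = resample(x, 2 * len(x))                 # naive 2x upsampling
    cA, cD2, cD1 = pywt.wavedec(x_up, wavelet, level=2)
    est = np.resize(cD2, len(cD1)) * gain          # crude detail estimate
    return pywt.waverec([cA, cD2, est], wavelet)

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)                   # band-limited input
y = extend_high_band(x)                            # 2x length, regenerated top band
```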

A Source Separation Algorithm for Stereo Panning Sources (스테레오 패닝 음원을 위한 음원 분리 알고리즘)

  • Baek, Yong-Hyun;Park, Young-Cheol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.4 no.2
    • /
    • pp.77-82
    • /
    • 2011
  • In this paper, we investigate source separation algorithms for stereo audio mixed using the amplitude panning method. These algorithms can be used in various applications such as up-mixing, speech enhancement, and high-quality sound source separation. The methods estimate the panning angles of the individual signals by applying principal component analysis to time-frequency tiles of the input signal, and independently extract each signal through directional filtering. The performance of the methods was evaluated through computer simulations.
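
A minimal sketch of this style of separation is given below. It replaces the paper's PCA-based panning estimate with a simpler per-tile magnitude-ratio panning index, then applies a Gaussian soft mask around a target pan position and inverts the STFT; the mask width and target value are assumptions.

```python
# Hedged panning-separation sketch: per-tile panning index from channel
# magnitudes (the paper estimates it with PCA), Gaussian soft mask,
# then inverse STFT of the masked channels.
import numpy as np
from scipy.signal import stft, istft

def separate_pan(left, right, sr, target=0.5, width=0.1, nfft=1024):
    _, _, L = stft(left, sr, nperseg=nfft)
    _, _, R = stft(right, sr, nperseg=nfft)
    pan = np.abs(R) / (np.abs(L) + np.abs(R) + 1e-12)  # 0 = hard left, 1 = hard right
    mask = np.exp(-((pan - target) ** 2) / (2 * width ** 2))
    _, out_l = istft(mask * L, sr, nperseg=nfft)
    _, out_r = istft(mask * R, sr, nperseg=nfft)
    return out_l, out_r

# e.g. pull out the source panned to the centre of a stereo mix:
# voc_l, voc_r = separate_pan(left_ch, right_ch, sr=44100, target=0.5)
```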

Virtual displays and virtual environments

  • Gilkey, R.H.;Isabelle, S.K.;Simpson, B.B.
    • Journal of the Ergonomics Society of Korea
    • /
    • v.16 no.2
    • /
    • pp.101-122
    • /
    • 1997
  • Our recent work on virtual environments and virtual displays is reviewed, including our efforts to establish the Virtual Environment Research, Interactive Technology, And Simulation (VERITAS) facility and our research on spatial hearing. VERITAS is a state-of-the-art multisensory facility built around CAVE™ technology. High-quality 3D audio is included and haptic interfaces are planned. The facility will support technical and non-technical users working in a wide variety of application areas. Our own research emphasizes the importance of auditory stimulation in virtual environments and complex display systems. Experiments on auditory-aided visual target acquisition, sensory conflict, sound localization in noise, and localization of speech stimuli are discussed.

A Novel Computer Human Interface to Remotely Pick up Moving Human's Voice Clearly by Integrating Real-time Face Tracking and Microphone Array

  • Hiroshi Mizoguchi;Takaomi Shigehara;Yoshiyasu Goto;Hidai, Ken-ichi;Taketoshi Mishima
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 1998.10a
    • /
    • pp.75-80
    • /
    • 1998
  • This paper proposes a novel computer-human interface, named Virtual Wireless Microphone (VWM), which utilizes computer vision and signal processing, integrating real-time face tracking with sound signal processing. VWM is intended as a speech input method for human-computer interaction, especially for an autonomous intelligent agent that interacts with humans, like a digital secretary. Utilizing VWM, the agent can clearly hear its human master's voice remotely, as if a wireless microphone were placed just in front of the master.
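
The abstract does not give the array processing details; a standard way to realize such a vision-steered "virtual microphone" is a delay-and-sum beamformer aimed at the tracked face position, sketched below under assumed geometry and naming.

```python
# Hedged delay-and-sum beamformer sketch: relative delays are computed
# from the tracked face position, applied as FFT phase shifts to align
# the channels, and the aligned channels are averaged.
import numpy as np

C = 343.0  # speed of sound in air, m/s

def delay_and_sum(channels, mic_pos, face_pos, sr):
    """channels: (n_mics, n_samples); mic_pos (n_mics, 3), face_pos (3,) in metres."""
    dists = np.linalg.norm(mic_pos - face_pos, axis=1)
    delays = (dists - dists.min()) / C            # relative arrival delays, s
    n = channels.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    out = np.zeros(n)
    for x, tau in zip(channels, delays):
        X = np.fft.rfft(x) * np.exp(2j * np.pi * freqs * tau)  # time-advance by tau
        out += np.fft.irfft(X, n)
    return out / len(channels)
```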

Real-time DSP implementation of IMT-2000 speech coding algorithm (IMT-2000 음성 부호화 알고리즘의 실시간 DSP 구현)

  • Seo, Jeong Uk;Gwon, Hong Seok;Park, Man Ho;Bae, Geon Seong
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.38 no.3
    • /
    • pp.68-68
    • /
    • 2001
  • In this paper, the AMR speech coding algorithm, adopted by 3GPP and ETSI as the standard speech coding scheme for IMT-2000, is analyzed and, after an optimization process using a C compiler and assembly language, implemented in real time on the TMS320C6201, a fixed-point DSP chip. The implemented codec occupies about 31.06 kWords of program memory, about 9.75 kWords of data RAM, and about 19.89 kWords of data ROM, and takes about 4.38 ms to process one frame (20 ms), confirming that it runs comfortably in real time using only 21.94% of the total available clock cycles of the TMS320C6201. In addition, the results of the DSP board implementation were verified to match the output of the ANSI C reference source released by ETSI, and experiments coupling the implemented AMR coder with a sound I/O module confirmed real-time operation without any distortion or delay in sound quality. Finally, two-way real-time communication through the AMR speech coding algorithm was verified in full-duplex mode using host I/O and a LAN cable.
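
The reported processor load follows directly from the frame timing; a one-line check using the figures quoted above:

```python
# Real-time load = processing time per frame / frame duration.
frame_ms, proc_ms = 20.0, 4.38                  # figures from the abstract
print(f"DSP load: {proc_ms / frame_ms:.2%}")    # 21.90%, ~ the reported 21.94%
```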

The Implementation of a Language-Study Headphone Robust to Noisy Environments (소음 환경에서 강인한 어학용 헤드폰 구현)

  • Son, Jae-Hyeak;Shin, Jae-Ho
    • Korea Institute of Information and Communication Facilities Engineering: Conference Proceedings
    • /
    • 2005.08a
    • /
    • pp.397-405
    • /
    • 2005
  • This paper presents a headphone system that adopts two algorithms, one to increase sound clearness and one to separate the signal from a noisy environment. In adaptive signal processing, the LMS algorithm, a kind of steepest-descent method, can be implemented with simple calculations, so we use it to eliminate unwanted noise components in the proposed system. Furthermore, we generate early echoes using delays and mix them into the signal; this process increases the clearness of the signal. In this paper, we show that the proposed system can be implemented in real time, and that it satisfies a subjective assessment test based on the ITU-T MOS (Mean Opinion Score).
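
A minimal LMS noise canceller in the spirit of this description is sketched below: a reference noise input is filtered to match the noise in the primary signal and subtracted, with the weights updated by the steepest-descent rule. The tap count, step size, and synthetic signals are assumptions.

```python
# Minimal LMS adaptive noise canceller: weights w adapt so the filtered
# reference noise matches (and cancels) the noise in the primary input.
import numpy as np

def lms_cancel(primary, reference, n_taps=32, mu=0.01):
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for i in range(n_taps - 1, len(primary)):
        x = reference[i - n_taps + 1:i + 1][::-1]  # newest reference sample first
        e = primary[i] - w @ x                     # error = cleaned output sample
        w += mu * e * x                            # steepest-descent (LMS) update
        out[i] = e
    return out

rng = np.random.default_rng(0)
noise = rng.standard_normal(8000)
speech = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
cleaned = lms_cancel(speech + 0.5 * noise, noise)  # noise largely removed
```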

An Empirical Analysis of Auditory Interfaces in Human-computer Interaction

  • Nam, Yoonjae
    • International Journal of Contents
    • /
    • v.9 no.3
    • /
    • pp.29-34
    • /
    • 2013
  • This study compared the usability of auditory interfaces in personal computing environments, where usability is a comprehensive concept that includes safety, utility, effectiveness, and efficiency: verbal messages (speech sounds), earcons (musical sounds), and auditory icons (natural sounds). It hypothesized that verbal messages would offer higher usability than earcons and auditory icons, since verbal messages are easy to interpret and understand through a semiotic process. Usability was measured by a set of seven items: ability to inform what the program is doing, relevance to visual interfaces, degree of stimulation, degree of understandability, perceived time pressure, clearness of sound outputs, and degree of satisfaction. The experimental results showed that verbal messages provided the highest level of usability. In contrast, auditory icons showed the lowest level, as they require users to establish new coding schemes and thus demand more mental effort.