• Title/Summary/Keyword: Speech transmission

A Study on Voice Communication Quality Criteria Under Mobile-VoIP Environments

  • Choi, Jae-Hun;Seol, Soon-Uk;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.28 no.2E / pp.35-42 / 2009
  • In this paper, we present criteria for the objective measurement of speech quality so that mobile-VoIP services can be provided efficiently over the wireless mobile internet. The mobile-VoIP service is based on mobility and is more error-prone than conventional VoIP over wired networks. It is about to be launched, but there have been no adequate quality indexes or Quality of Service (QoS) standards for evaluating mobile-VoIP speech quality. In addition, many factors influence speech quality in a packet network. Among them, packet loss contributes directly to the overall voice communication quality. For this reason, we model an IP-based packet network with the Gilbert-Elliott channel model. We then assess voice quality with the objective speech quality measures ITU-T P.862 (PESQ) and ITU-T P.862.1 (MOS-LQO) under various packet loss rates in the transmission channel. Our simulation results identify specific criteria and QoS levels for mobile-VoIP services across a range of packet loss conditions.
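The two ingredients named above can be illustrated with a minimal sketch. It assumes a two-state Gilbert-Elliott chain with a separate loss probability in each state. The parameter names (`p_gb`, `p_bg`, `loss_good`, `loss_bad`) and the example values are hypothetical and are not taken from the paper. The `mos_lqo` function applies the ITU-T P.862.1 mapping from raw PESQ scores to MOS-LQO.

```python
import numpy as np

def gilbert_elliott_losses(n, p_gb, p_bg, loss_good, loss_bad, seed=0):
    """Simulate n packets through a two-state Gilbert-Elliott channel.

    p_gb: per-packet probability of moving Good -> Bad
    p_bg: per-packet probability of moving Bad -> Good
    loss_good / loss_bad: packet loss probability while in each state
    Returns a boolean array, True where the packet is lost.
    """
    rng = np.random.default_rng(seed)
    lost = np.empty(n, dtype=bool)
    bad = False  # start in the Good state
    for i in range(n):
        # Draw a loss according to the current state, then update the state
        lost[i] = rng.random() < (loss_bad if bad else loss_good)
        if bad:
            bad = rng.random() >= p_bg
        else:
            bad = rng.random() < p_gb
    return lost

def expected_loss_rate(p_gb, p_bg, loss_good, loss_bad):
    """Long-run loss rate from the chain's stationary distribution."""
    pi_bad = p_gb / (p_gb + p_bg)
    return (1 - pi_bad) * loss_good + pi_bad * loss_bad

def mos_lqo(pesq_raw):
    """ITU-T P.862.1 mapping from a raw P.862 PESQ score to MOS-LQO."""
    return 0.999 + 4.0 / (1.0 + np.exp(-1.4945 * pesq_raw + 4.6607))

lost = gilbert_elliott_losses(200_000, p_gb=0.05, p_bg=0.3, loss_good=0.01, loss_bad=0.5)
print(f"simulated {lost.mean():.3f}, expected {expected_loss_rate(0.05, 0.3, 0.01, 0.5):.3f}")
print(f"MOS-LQO at PESQ 4.5: {mos_lqo(4.5):.3f}")
```

Losses cluster while the chain stays in the Bad state. As a result, a given average loss rate produces longer loss bursts than independent (Bernoulli) losses at the same rate would.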

Enhancing Speech Recognition with Whisper-tiny Model: A Scalable Keyword Spotting Approach (Whisper-tiny 모델을 활용한 음성 분류 개선: 확장 가능한 키워드 스팟팅 접근법)

  • Shivani Sanjay Kolekar;Hyeonseok Jin;Kyungbaek Kim
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.774-776 / 2024
  • The effective implementation of automatic speech recognition (ASR) systems requires keyword spotting models that are both responsive and resource-efficient. Initial local detection of user interactions is crucial because it allows audio data to be sent selectively to cloud services. This reduces operational costs and mitigates the privacy risks associated with continuous data streaming. In this paper, we address these needs by fine-tuning the Whisper-Tiny model to recognize keywords from the Google Speech Commands dataset, which includes 65,000 audio clips of keyword commands. We adapt the model's encoder and append a lightweight classification head so that the model operates within the limited resources of local devices. The proposed model achieves a test accuracy of 92.94%. This architecture demonstrates its efficiency as an on-device model under stringent resource constraints, improving accessibility in everyday speech recognition applications.
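The "lightweight classification head" appended to the encoder can be pictured with a minimal NumPy sketch. It assumes the head mean-pools the encoder frames and applies a single linear layer with a softmax. The pooling choice, the class name `KeywordHead`, and the 35-class output are assumptions, not details from the paper. The 1500 × 384 encoder output shape is assumed here for Whisper-tiny on a 30 s input.

```python
import numpy as np

class KeywordHead:
    """Hypothetical lightweight head: mean-pool encoder frames, then linear + softmax."""

    def __init__(self, d_model: int, num_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random init; in practice these weights would be learned during fine-tuning
        self.W = rng.normal(0.0, 0.02, size=(d_model, num_classes))
        self.b = np.zeros(num_classes)

    def num_params(self) -> int:
        return self.W.size + self.b.size

    def __call__(self, enc: np.ndarray) -> np.ndarray:
        """enc: (frames, d_model) encoder output -> (num_classes,) probabilities."""
        pooled = enc.mean(axis=0)          # average over time frames
        logits = pooled @ self.W + self.b
        logits = logits - logits.max()     # subtract max for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()

# Random stand-in for encoder output: 1500 frames x 384 channels (assumed shape)
enc = np.random.default_rng(1).normal(size=(1500, 384))
head = KeywordHead(d_model=384, num_classes=35)
probs = head(enc)
print(int(probs.argmax()), head.num_params())
```

A head like this adds only `d_model * num_classes + num_classes` parameters on top of the encoder. That is the reason such heads suit resource-constrained devices.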

Interior surface treatment guidelines for classrooms according to the acoustical performance criteria (학교 교실의 음환경 기준에 따른 실내마감 방안)

  • Ryu, Da-Jung;Park, Chan-Jae;Haan, Chan-Hoon
    • The Journal of the Acoustical Society of Korea / v.35 no.2 / pp.92-101 / 2016
  • Many studies have shown that the acoustical conditions of a classroom play an important role in students' learning and academic achievement. However, there are very few guidelines or design proposals for creating an appropriate acoustic environment when classrooms are built or renovated. The present study suggests various design proposals that satisfy the acoustic standards for classrooms, based on theoretical calculations and acoustic field experiments. First, we calculated the minimum sound absorption area required to satisfy the acoustic standard for domestic middle and high schools. We then carried out room acoustic measurements to investigate the acoustic performance of an existing classroom while changing the interior finishing materials on the ceiling and rear walls. The results revealed that the reverberation time standard (below 0.8 s) can be met even without the sound-absorbing ceiling that is general practice in Korea. In particular, part of the ceiling can be treated as reflective, with an absorptive-to-reflective area ratio of 2:1. In that case, the acoustic parameters $C_{50}$, $D_{50}$, and RASTI (Rapid Speech Transmission Index) were almost the same as with a fully absorptive ceiling, and sound levels were higher.
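The two calculations mentioned above can be sketched under standard textbook definitions. The minimum absorption area for a target reverberation time follows from Sabine's formula $T = 0.161\,V/A$, which assumes a diffuse sound field. $D_{50}$ and $C_{50}$ compare impulse-response energy before and after 50 ms. The classroom dimensions below are hypothetical; only the 0.8 s target comes from the text. The function names are illustrative.

```python
import numpy as np

def sabine_absorption_area(volume_m3, target_rt_s):
    """Equivalent absorption area A (m^2) for reverberation time T, from Sabine: T = 0.161 V / A."""
    return 0.161 * volume_m3 / target_rt_s

def d50_c50(impulse_response, fs):
    """Definition D50 (ratio) and clarity C50 (dB) from a room impulse response.

    Assumes the impulse response starts at the arrival of the direct sound.
    """
    energy = np.asarray(impulse_response, dtype=float) ** 2
    split = int(round(0.050 * fs))          # 50 ms boundary in samples
    early, total = energy[:split].sum(), energy.sum()
    late = total - early
    return early / total, 10.0 * np.log10(early / late)

# Hypothetical classroom: 9 m x 7.5 m x 3 m, target reverberation time 0.8 s
A = sabine_absorption_area(9 * 7.5 * 3, 0.8)
print(f"required absorption area: {A:.1f} m^2")
```

Sabine's formula gives the total equivalent absorption area. The amount the finishing materials must supply is that total minus what is already provided by other surfaces, furnishings and occupants.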

Speech Reinforcement Based on G.729A Speech Codec Parameter Under Near-End Background Noise Environments (근단 배경 잡음 환경에서 G.729A 음성부호화기 파라미터에 기반한 새로운 음성 강화 기법)

  • Choi, Jae-Hun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.28 no.4 / pp.392-400 / 2009
  • In this paper, we propose an effective speech reinforcement technique based on the ITU-T G.729A CS-ACELP codec for near-end background noise environments. In general, near-end noise significantly reduces the intelligibility of far-end speech for the near-end listener, so a far-end speech reinforcement approach is required to counter this phenomenon. Conventional speech reinforcement algorithms do not operate on codec parameters. In contrast, we reinforce the excitation signal among the G.729A codec parameters received from the far end, under various background noise environments. Specifically, we first estimate the excitation signal of the ambient noise at the near end through the encoder of the G.729A speech codec. We then reinforce the excitation signal of the speech transmitted from the far end. In particular, we propose a novel approach that directly reinforces the excitation signal of the far-end speech based on the G.729A decoder. We evaluate the proposed algorithm under various noise environments with the CCR (Comparison Category Rating) test, the ITU-T P.800 method for subjective determination of transmission quality. It outperforms conventional SNR recovery methods.
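For contrast with the excitation-domain method above, an SNR-recovery-style baseline can be pictured as a simple level gain. The sketch raises the far-end speech until its SNR against the near-end noise reaches a target. It is a hypothetical illustration, not the specific SNR recovery method the authors compared against and not their G.729A technique. The function name, the 12 dB gain ceiling and the frame values are assumptions.

```python
import numpy as np

def snr_recovery_gain(far_speech, near_noise, target_snr_db, max_gain_db=12.0):
    """Hypothetical SNR-recovery style gain for far-end speech.

    Raises the far-end speech level so that its SNR relative to the near-end
    noise reaches target_snr_db. Never attenuates and never exceeds max_gain_db.
    """
    p_speech = np.mean(np.square(far_speech))
    p_noise = np.mean(np.square(near_noise))
    current_snr_db = 10.0 * np.log10(p_speech / p_noise)
    gain_db = np.clip(target_snr_db - current_snr_db, 0.0, max_gain_db)
    return 10.0 ** (gain_db / 20.0)

speech = 0.1 * np.ones(160)   # one 20 ms frame at 8 kHz (assumed values)
noise = 0.1 * np.ones(160)    # equal power, so the current SNR is 0 dB
g = snr_recovery_gain(speech, noise, target_snr_db=6.0)
print(f"gain = {g:.3f}")
```

A waveform gain like this is applied after decoding. The approach described above instead modifies the excitation inside the G.729A decoder.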

A Study on the Method of Assessing Spatial Speech Transmission Quality as an Indicator of Room Acoustics -Concentrated on the Articulation Test under Variable Ambient Noise- (건축 음향의 실내 청취조건 평가방법에 관한 연구-변동외부소음하의 명료도시험에 관하여-)

  • Han, Myung-Ho;Lee, Tae-Gang;Oh, Yang-Ki;Kim, Sun-Woo
    • The Journal of the Acoustical Society of Korea / v.10 no.1 / pp.5-11 / 1991
  • The articulation test is a good predictor of spatial speech transmission quality. As for many other languages, an articulation testing method for Korean was proposed in 1989 and was shown to be a valid indicator in rooms with static background noise. In this paper, the testing method is examined under variable noise conditions. Experiments were performed in 26 classrooms with variable background noise. They show that the proposed Korean articulation testing method remains valid under variable conditions.

Investigating the Effects of Hearing Loss and Hearing Aid Digital Delay on Sound-Induced Flash Illusion

  • Moradi, Vahid;Kheirkhah, Kiana;Farahani, Saeid;Kavianpour, Iman
    • Journal of Audiology & Otology / v.24 no.4 / pp.174-179 / 2020
  • Background and Objectives: The integration of auditory-visual speech information improves speech perception. However, if auditory input is disrupted by hearing loss, auditory and visual inputs cannot be fully integrated. Additionally, the temporal coincidence of auditory and visual input is an important factor in integrating the inputs of these two senses. In hearing aids, the acoustic pathway is delayed because the signal passes through digital signal processing. Therefore, this study aimed to investigate the effects of hearing loss and of the hearing aid's digital delay circuit on the sound-induced flash illusion. Subjects and Methods: A total of 13 adults with normal hearing, 13 with mild to moderate hearing loss, and 13 with moderate to severe hearing loss were enrolled in this study. The sound-induced flash illusion test was then conducted, and the results were analyzed. Results: Hearing aid digital delay and hearing loss had no detrimental effect on the sound-induced flash illusion. Conclusions: The transmission velocity and neural transduction rate of auditory inputs decrease in patients with hearing loss, so auditory and visual sensory inputs cannot be fully combined. However, the transmission rate of auditory input was approximately normal when a hearing aid was fitted. Thus, it can be concluded that the processing delay in the hearing aid circuit is insufficient to disrupt the integration of auditory and visual information.

An Information Transmission for Intelligent Train Operation (인텔리전트 열차운전을 위한 정보 전송)

  • Ahn, Sang-Kwon;Choi, Gui-Man;Kim, Yang-Mo
    • Proceedings of the KIEE Conference / 1997.07a / pp.339-341 / 1997
  • This study presents a method for effective data transmission in the MAGLEV train, which is currently under test. The aim is to support intelligent operation of the signaling system in the future. To exchange large amounts of information, a digital system is ideal, and a microprocessor-based system is essential for this purpose. This study adopts FSK modulation and the HDLC protocol. It also constructs, as a single unit, an information line assembly that serves for information exchange, speech communication, and speed and position detection. The study contributes to the data transmission system of the MAGLEV train and to advanced methods of intelligent operation in future railway systems.
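HDLC, one of the building blocks named above, delimits frames with the flag pattern 01111110. It inserts a 0 after every five consecutive 1s so that the flag can never appear inside a frame. A minimal sketch of that framing step follows. It omits the frame check sequence, the address and control fields, and the FSK modulation layer. The function names are hypothetical.

```python
FLAG = [0, 1, 1, 1, 1, 1, 1, 0]  # HDLC frame delimiter

def bit_stuff(bits):
    """Insert a 0 after every run of five consecutive 1s."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            out.append(0)
            run = 0
    return out

def bit_unstuff(bits):
    """Remove the 0 that follows every run of five consecutive 1s."""
    out, run, skip_next = [], 0, False
    for b in bits:
        if skip_next:          # this is the stuffed 0; drop it
            skip_next = False
            continue
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            skip_next = True
            run = 0
    return out

def build_frame(payload_bits):
    """Wrap the stuffed payload between opening and closing flags."""
    return FLAG + bit_stuff(payload_bits) + FLAG

payload = [1] * 12 + [0, 1]
frame = build_frame(payload)
print("".join(map(str, frame)))
```

The receiver strips the flags and applies `bit_unstuff` to recover the original payload.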

On the Mismatch Phenomena in DPCM Coding of Speech (DPCM 음성 부호화기의 부정합현상에 관한 연구)

  • Yoo, Deuk Su;Cho, Dong Ho;Un, Chong Kwan
    • Journal of the Korean Institute of Telematics and Electronics / v.23 no.5 / pp.597-604 / 1986
  • This paper describes various mismatch phenomena in differential pulse code modulation (DPCM) coding, such as mismatches in the probability density function (pdf), the signal variance, and the correlation. At high transmission rates (i.e., above 32 kbit/s), DPCM performance can be improved by matching the quantizer's pdf shape to that of the input signal. However, the same gain cannot be obtained at lower transmission rates. It is also shown that the gamma quantizer is relatively robust to variations in pdf shape and signal variance. Moreover, as the transmission rate increases, DPCM performs worse for input signals with large variance than for signals with small variance, owing to increased overload noise. According to our simulation results, pdf-shape and variance mismatches cause more degradation than correlation mismatch in a DPCM system.
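The variance mismatch and overload noise described above can be reproduced with a minimal closed-loop DPCM sketch. It uses a fixed first-order predictor and a clipped uniform quantizer. This is a plain uniform quantizer, not the gamma quantizer studied in the paper. The predictor coefficient, step size and number of levels are arbitrary illustrative values.

```python
import numpy as np

def dpcm(x, a=0.9, step=0.05, levels=15):
    """Closed-loop first-order DPCM with a uniform (mid-tread) quantizer.

    The predictor uses the previously *reconstructed* sample, so the encoder
    and decoder stay in sync. Quantizer indices are clipped to [-levels, levels].
    Residuals beyond that range cause overload noise.
    Returns (indices, reconstruction).
    """
    idx = np.zeros(len(x), dtype=int)
    recon = np.zeros(len(x))
    prev = 0.0
    for n, sample in enumerate(x):
        pred = a * prev                      # first-order prediction
        residual = sample - pred
        q = int(np.clip(np.round(residual / step), -levels, levels))
        idx[n] = q
        recon[n] = pred + q * step           # decoder-side reconstruction
        prev = recon[n]
    return idx, recon

n = np.arange(400)
x_small = np.sin(2 * np.pi * 0.01 * n)       # modest variance: no overload
x_large = 20 * x_small                       # same quantizer, large variance: overload
_, r_small = dpcm(x_small)
_, r_large = dpcm(x_large)
print(np.max(np.abs(x_small - r_small)), np.max(np.abs(x_large - r_large)))
```

With the low-variance input, the error stays within half a quantizer step. Scaling the input up makes residuals exceed the quantizer range, so the error becomes dominated by overload.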

A 4 kbps PSI-VSELP Speech Coding Algorithm (4 kbps PSI-VSELP 음성 부호화 알고리듬)

  • Choi, Yong-Soo;Kang, Hong-Goo;Park, Sang-Wook;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.15 no.6 / pp.59-65 / 1996
  • This paper proposes a 4 kbps PSI-VSELP (Pitch Synchronous Innovation-Vector Sum Excited Linear Prediction) speech coder that produces speech equivalent to that of the conventional 4.8 kbps VSELP. Since 'half-rate' is defined differently from country to country, there may be a need to reduce the bit rate of the conventional half-rate coder. Bit-rate reduction degrades speech quality. To minimize that degradation, bit allocation should be based on careful consideration of the effect of each transmission parameter. This paper adopts this analytical approach for bit allocation at 4 kbps. The basis vectors play the most important role in the VSELP coder's performance. To improve its quality at 4 kbps, they are optimized by an iterative closed-loop training process, and the PSI technique is employed in the coder. To demonstrate the performance of the proposed speech coder, we performed experiments under noiseless and error-free conditions. Compared with the conventional 4.8 kbps VSELP, the proposed 4 kbps PSI-VSELP coder scored lower on the objective measure but higher on the subjective measure.
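The "vector sum" in VSELP means that each excitation codevector is built as a signed sum of a small set of basis vectors. This is why optimizing the basis vectors matters so much. The sketch below shows that construction. It also shows, separately, one simple reading of pitch-synchronous innovation: making the innovation periodic at the pitch lag. The dimensions (7 basis vectors, 40-sample subframe), the integer pitch lag and the function names are illustrative assumptions, not the paper's 4 kbps configuration. Gains are omitted.

```python
import numpy as np

def vselp_codevector(basis, code):
    """Form a VSELP-style excitation codevector from M basis vectors.

    basis: (M, L) array of basis vectors; code: integer in [0, 2**M).
    Bit m of code selects the sign of basis vector m (+1 if set, -1 if clear),
    so M stored vectors span 2**M codevectors.
    """
    M = basis.shape[0]
    signs = np.array([1.0 if (code >> m) & 1 else -1.0 for m in range(M)])
    return signs @ basis

def psi_periodize(c, pitch_lag):
    """Hypothetical PSI step: repeat the first pitch_lag samples across the
    subframe when the (integer) pitch lag is shorter than the subframe."""
    if pitch_lag >= len(c):
        return c.copy()
    reps = -(-len(c) // pitch_lag)           # ceiling division
    return np.tile(c[:pitch_lag], reps)[: len(c)]

rng = np.random.default_rng(0)
basis = rng.normal(size=(7, 40))             # M = 7 basis vectors, 40-sample subframe
c = vselp_codevector(basis, 0b1010011)
c_psi = psi_periodize(c, pitch_lag=17)
```

Complementary codes produce negated codevectors. Only M vectors need to be stored and searched, instead of 2**M independent ones.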

Audio/Speech Codec Using Variable Delay MDCT/IMDCT (가변 지연 MDCT/IMDCT를 이용한 오디오/음성 코덱)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.2 / pp.69-76 / 2023
  • A high-quality audio/voice codec based on the MDCT/IMDCT can perfectly reconstruct the current frame through overlap-add with the previous frame. The overlap-add process introduces an algorithmic delay equal to the frame length. In this paper, we propose an MDCT/IMDCT process that reduces this algorithmic delay by using a variable phase shift. We obtain a low-delay audio/speech codec by applying the low-delay MDCT/IMDCT algorithm to the ITU-T standard codec G.729.1. This reduces the algorithmic delay of the MDCT/IMDCT process from 20 ms to 1.25 ms. The quality of the decoded output of the low-delay codec is evaluated with PESQ, an objective quality test method. Despite the reduction in transmission delay, no difference in sound quality from the conventional method was observed.
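The overlap-add reconstruction and the one-frame algorithmic delay described above can be checked with a direct O(N²) MDCT/IMDCT. It uses a sine window, which satisfies the Princen-Bradley condition $w_n^2 + w_{n+N}^2 = 1$. This is the standard fixed-phase MDCT, not the variable-phase-shift variant proposed in the paper. The hop size N is an arbitrary illustrative choice.

```python
import numpy as np

def mdct_matrix(N):
    """(N, 2N) MDCT basis: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))

def mdct_analysis_synthesis(x, N):
    """Frame x with hop N, MDCT each 2N-sample frame, IMDCT, and overlap-add."""
    C = mdct_matrix(N)
    w = sine_window(N)
    L = len(x)
    # Pad so that every original sample is covered by two overlapping frames
    pad_end = N + (-L) % N
    xp = np.concatenate([np.zeros(N), x, np.zeros(pad_end)])
    y = np.zeros_like(xp)
    for start in range(0, len(xp) - 2 * N + 1, N):
        frame = xp[start:start + 2 * N]
        X = C @ (w * frame)                           # N MDCT coefficients
        y[start:start + 2 * N] += (2.0 / N) * w * (C.T @ X)  # windowed IMDCT, overlap-add
    return y[N:N + L]

x = np.random.default_rng(0).normal(size=1000)
y = mdct_analysis_synthesis(x, N=64)
print(np.max(np.abs(x - y)))
```

The time-domain aliasing in each IMDCT output cancels only when the next frame's output is added. An output sample is therefore final only one frame (N samples) later. This is the frame-length delay that the paper's variable-phase approach targets. The printed error should be at floating-point round-off level.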