• Title/Summary/Keyword: Speech detection


Playout Scheduling Method Based on Adaptive Jitter Estimation for Enhancing VoIP Speech Quality (VoIP 음질향상을 위한 적응적 지터추정 기반의 플레이아웃 스케줄링 방법)

  • Ryu, Sang-Hyeon;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.33 no.2 / pp.133-138 / 2014
  • Packet arrival-delay variation, known as jitter, is one of the main factors that degrade voice quality on mobile devices using Voice over Internet Protocol (VoIP). To address this issue, a playout scheduling method based on adaptive jitter estimation is proposed for enhancing VoIP speech quality. The proposed algorithm copes with transmission jitter by expanding or compressing each packet according to the predicted network delay and its variation. Additionally, the active network jitter estimation incorporates rapid detection of delay spikes and reacts to changes in network conditions. Experimental results show that the proposed algorithm delivers high voice quality in unstable network environments.
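The abstract above does not give the estimator itself, so the following is only a minimal sketch of one common way to track delay and jitter with a spike mode: an exponentially weighted moving average plus a threshold test. The class name `JitterEstimator`, the smoothing weight, the spike threshold, and the safety factor are all illustrative assumptions, not the authors' algorithm or parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JitterEstimator:
    """EWMA delay/jitter tracker with a simple spike mode (illustrative only)."""
    alpha: float = 0.875               # smoothing weight (assumed value)
    spike_threshold_ms: float = 50.0   # jump above baseline treated as a spike (assumed)
    safety_factor: float = 4.0         # jitter multiplier for the playout margin (assumed)
    baseline_ms: Optional[float] = None
    jitter_ms: float = 0.0
    delay_est_ms: float = 0.0
    in_spike: bool = False

    def update(self, delay_ms: float) -> float:
        """Feed one packet's network delay and return the suggested playout delay."""
        if self.baseline_ms is None:
            # First packet: initialise the baseline from the observed delay.
            self.baseline_ms = delay_ms
            self.delay_est_ms = delay_ms
        elif delay_ms - self.baseline_ms > self.spike_threshold_ms:
            # Spike: follow the delay immediately, keep baseline and jitter frozen.
            self.in_spike = True
            self.delay_est_ms = delay_ms
        else:
            # Normal mode: smooth both the delay and its absolute deviation.
            self.in_spike = False
            deviation = abs(delay_ms - self.baseline_ms)
            self.baseline_ms = self.alpha * self.baseline_ms + (1 - self.alpha) * delay_ms
            self.jitter_ms = self.alpha * self.jitter_ms + (1 - self.alpha) * deviation
            self.delay_est_ms = self.baseline_ms
        return self.playout_delay()

    def playout_delay(self) -> float:
        """Estimated delay plus a jitter-proportional safety margin."""
        return self.delay_est_ms + self.safety_factor * self.jitter_ms


estimator = JitterEstimator()
for delay in [20, 22, 21, 19, 20, 120, 21]:
    playout = estimator.update(delay)
    print(f"delay={delay:>3} ms  spike={estimator.in_spike}  playout={playout:.1f} ms")
```

In this sketch a permanent shift to a higher delay would keep the estimator in spike mode; a practical scheduler would need a rule for adopting the new level as the baseline.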

Traffic Signal Recognition System Based on Color and Time for Visually Impaired

  • P. Kamakshi
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.48-54 / 2023
  • Crossing the road is very difficult for blind people, who must be vigilant with every step they take. To address this problem, a convolutional neural network (CNN) is a suitable method for analysing the data and automating the model without human intervention. In this work, a traffic signal recognition system is designed using a CNN for the visually impaired. To provide a safe walking environment, a voice message is given according to the light state and timer state at that instant. The developed model consists of two phases. In the first phase, the CNN model is trained to classify different images captured from traffic signals. The labelled Common Objects in Context (COCO) dataset is used, which includes images of different classes such as traffic lights, bicycles, and cars. The traffic light object is detected using this labelled dataset with the help of an object detection model. The CNN model detects the color of the traffic light and the timer displayed on the traffic image. In the second phase, a text message is generated from the detected light color and timer value and sent to the text-to-speech conversion model to provide voice guidance for the blind person. The developed traffic light recognition model recognizes the traffic light color and the countdown timer displayed on the signal for safe crossing. Existing models did not consider the countdown timer displayed on the signal, which is very useful. The proposed model gave accurate results in different scenarios when compared to other models.

A Case of Interpretation for Audiological Evaluation in Preschool Child with Mild-to-Moderately Severe Asymmetric Ski-Slop Sensorineural Hearing Loss (학령 전기 경도 및 중등고도 대칭성 고음급추형 감각신경성 난청의 청각학적 평가 해석 증례)

  • Kim, Na-Yeon;So, Won-Seop;Ha, Ji-Wan;Heo, Seung-Deok
    • Journal of rehabilitation welfare engineering & assistive technology / v.11 no.1 / pp.9-14 / 2017
  • Children produce and acquire the phonological system from birth to 8 years of age. A child with hearing loss has great difficulty hearing sound. Problems of auditory perception can cause limited speech acquisition, delayed language development, and communication disorders. They also affect learning and social and emotional development. Early detection and diagnosis of hearing loss are important for intervention. However, hearing loss may be difficult to detect if its degree is slight and/or it appears only at some frequencies. In such cases, it is often difficult to provide aural intervention. The goal of this study is to discuss the interpretation of audiological evaluation in a case of mild-to-moderately severe asymmetric ski-slope sensorineural hearing loss, analyze communication problems, and consider audiological and speech-language pathological rehabilitation.

The Analysis of Difference in Awareness and Needs of Social Communication of Guardians Caring for Adolescent with Development Disorders Adolscents (발달장애 청소년 양육자의 사회적 의사소통 인식과 요구도 차이 분석)

  • Park, Hyun;Lee, Myung-Soon
    • The Journal of the Korea Contents Association / v.18 no.8 / pp.561-572 / 2018
  • This study identified the awareness and demands of parents regarding the communication level needed for social participation of youth with developmental disabilities, and investigated their relation to the client's age, time of detection of the condition, disability grade, type of disability, and speech therapy period. The Mann-Whitney U test and the Kruskal-Wallis test were performed as non-parametric tests, and the Scheffé test was performed as a post-hoc test. The results showed significant differences in the awareness of and demand for the communication level of youth with developmental disabilities according to the age of the children, time of discovery, disability grade, type of disability, and speech therapy period. In conclusion, speech therapy for youth with developmental disabilities should be provided as communication training aimed at social participation. Follow-up research emphasizing social support and institutional backing will be required.

Quantifying and Analyzing Vocal Emotion of COVID-19 News Speech Across Broadcasters in South Korea and the United States Based on CNN (한국과 미국 방송사의 코로나19 뉴스에 대해 CNN 기반 정량적 음성 감정 양상 비교 분석)

  • Nam, Youngja;Chae, SunGeu
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.2 / pp.306-312 / 2022
  • During the unprecedented COVID-19 outbreak, the public's information needs created an environment in which people overwhelmingly consumed information on the disease. Given that news media affect the public's emotional well-being, the pandemic highlights the importance of paying particular attention to how news stories frame their coverage. In this study, the emotions in COVID-19 news speech from mainstream broadcasters in South Korea and the United States (US) were analyzed using convolutional neural networks. Results showed that neutrality was detected across broadcasters. However, emotions such as sadness and anger were also detected. This was evident for the Korean broadcasters, whereas those emotions were not detected for the US broadcasters. This is the first quantitative vocal emotion analysis of COVID-19 news speech. Overall, our findings provide new insight into news emotion analysis and have broad implications for better understanding of the COVID-19 pandemic.

Development of Voice Activity Detection Algorithm for Elderly Voice based on the Higher Order Differential Energy Operator (고차 미분에너지 기반 노인 음성에서의 음성 구간 검출 알고리즘 연구)

  • Lee, JiYeoun
    • Journal of Digital Convergence / v.14 no.11 / pp.249-255 / 2016
  • Because elderly voices contain a lot of noise caused by physiological changes in respiration, phonation, and resonance, the performance of convergence health-care equipment driven by elderly voices, such as speech recognition, synthesis, and analysis programs, deteriorates. Research on operating health-care instruments with elderly voices is therefore necessary. In this study, a voice activity detection method using a symmetric higher-order differential energy operator (SHODEO) was developed and compared with the auto-correlation function (ACF) and the average magnitude difference function (AMDF). It was confirmed to perform better than the other methods in voice interval detection. The voice activity detection method will be applied to a voice interface for the elderly to improve the accessibility of smart devices.
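The abstract does not define SHODEO, so the sketch below uses a stand-in: the basic Teager-Kaiser energy operator, a simple differential energy operator, with a fixed threshold per frame. The function names, frame length, and threshold are illustrative assumptions and do not reproduce the method evaluated in the paper.

```python
import math


def teager_energy(x):
    """Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def detect_voice_frames(signal, frame_len=80, threshold=0.01):
    """Flag a frame as active when its mean absolute Teager energy exceeds threshold.

    frame_len and threshold are assumed values for this toy example.
    """
    energy = teager_energy(signal)
    flags = []
    for start in range(0, len(energy) - frame_len + 1, frame_len):
        frame = energy[start:start + frame_len]
        flags.append(sum(abs(e) for e in frame) / frame_len > threshold)
    return flags


# Toy input: 160 samples of silence followed by 160 samples of a sine tone.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 0.05 * n) for n in range(160)]
flags = detect_voice_frames(silence + tone)
print(flags)
```

For a pure tone the operator's output depends on both amplitude and frequency, which is why energy operators of this kind are often considered for separating voiced segments from low-level background noise.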

A Study on the Automatic Speech Control System Using DMS model on Real-Time Windows Environment (실시간 윈도우 환경에서 DMS모델을 이용한 자동 음성 제어 시스템에 관한 연구)

  • 이정기;남동선;양진우;김순협
    • The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.51-56 / 2000
  • In this paper, we study an automatic speech control system for a real-time Windows environment using voice recognition. The reference pattern is the variable DMS model, which is proposed to speed up execution, and the one-stage DP algorithm using this model serves as the recognition algorithm. The recognition vocabulary consists of control command words that are frequently used in the Windows environment. An automatic speech period detection algorithm for on-line voice processing in the Windows environment is implemented. The proposed variable DMS model applies a variable number of sections according to the duration of the input signal. Because unnecessary recognition target words are sometimes generated, the model is reconstructed on-line to handle them efficiently. The Perceptual Linear Predictive analysis method is applied to generate feature vectors from the extracted voice features. According to the experimental results, recognition speed is improved in the proposed model because of its small computational load. The multi-speaker-independent and multi-speaker-dependent recognition rates are 99.08% and 99.39%, respectively. In a noisy environment, the recognition rate is 96.25%.


Comparative study of data augmentation methods for fake audio detection (음성위조 탐지에 있어서 데이터 증강 기법의 성능에 관한 비교 연구)

  • KwanYeol Park;Il-Youp Kwak
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.101-114 / 2023
  • Data augmentation is effective against model overfitting because it lets the training dataset be viewed from various perspectives. In addition to image augmentation techniques such as rotation, cropping, horizontal flip, and vertical flip, occlusion-based data augmentation methods such as Cutmix and Cutout have been proposed. For models based on speech data, an occlusion-based data augmentation technique can be used after converting the 1D speech signal into a 2D spectrogram. In particular, SpecAugment is an occlusion-based augmentation technique for speech spectrograms. In this study, we compare data augmentation techniques that can be used for fake audio detection. Using data from the ASVspoof2017 and ASVspoof2019 competitions, which were held to detect fake audio, an LCNN model was trained on datasets augmented with the occlusion-based methods Cutout, Cutmix, and SpecAugment. All three augmentation techniques generally improved the performance of the model. Cutmix showed the best performance on ASVspoof2017, Mixup on ASVspoof2019 LA, and SpecAugment on ASVspoof2019 PA. In addition, increasing the number of masks for SpecAugment helps to improve performance. In conclusion, the appropriate augmentation technique differs depending on the situation and the data.
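As a rough illustration of occlusion-based augmentation on a spectrogram, the sketch below zeroes random frequency bands and time spans, which is the masking part of a SpecAugment-style transform. The function name `mask_spectrogram`, the mask counts, and the maximum widths are assumptions made for this example; the settings used in the study are not given in the abstract.

```python
import random


def mask_spectrogram(spec, num_freq_masks=1, num_time_masks=1,
                     max_freq_width=2, max_time_width=3, rng=None):
    """Return a copy of spec (rows = frequency bins, columns = time frames)
    with random frequency bands and time spans set to zero.

    The maximum widths must not exceed the spectrogram dimensions.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency masking: zero a band of consecutive frequency bins.
    for _ in range(num_freq_masks):
        width = rng.randint(0, max_freq_width)
        start = rng.randint(0, n_freq - width)
        for f in range(start, start + width):
            out[f] = [0.0] * n_time

    # Time masking: zero a span of consecutive time frames in every bin.
    for _ in range(num_time_masks):
        width = rng.randint(0, max_time_width)
        start = rng.randint(0, n_time - width)
        for row in out:
            for t in range(start, start + width):
                row[t] = 0.0
    return out


# Toy spectrogram: 6 frequency bins by 10 time frames, all ones.
spec = [[1.0] * 10 for _ in range(6)]
masked = mask_spectrogram(spec, num_freq_masks=2, num_time_masks=2,
                          rng=random.Random(0))
for row in masked:
    print(" ".join("#" if v else "." for v in row))
```

Raising `num_freq_masks` or `num_time_masks` occludes more of the input, which corresponds to the "number of masks" setting the abstract reports as helpful.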

Context-sensitive Word Error Detection and Correction for Automatic Scoring System of English Writing (영작문 자동 채점 시스템을 위한 문맥 고려 단어 오류 검사기)

  • Choi, Yong Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.4 no.1 / pp.45-56 / 2015
  • In this paper, we present a method that can detect context-sensitive word errors and generate correction candidates. Spelling error detection is one of the most widespread research topics; however, the approach proposed in this paper is adapted for an automated English scoring system. A common strategy in context-sensitive word error detection is to use a pre-defined confusion set to generate correction candidates. We automatically generate a confusion set in order to reflect the characteristics of sentences written by second-language learners. We define a type of word error that cannot be detected by a conventional grammar checker because of part-of-speech ambiguity, and propose how to detect such errors and generate correction candidates for them. An experiment is performed on English writing composed by junior high school students whose mother tongue is Korean. The F1 score of the proposed method is 70.48%, which shows that our method is promising compared to the current state of the art.

Walking Aid System for Visually Impaired People by Exploiting Touch-based Interface (촉각 인터페이스를 이용한 시각장애인 보행보조 시스템)

  • Lee, Ji-eun;Oh, Yoosoo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.522-525 / 2015
  • In this paper, we propose a walking aid system that guides visually impaired people along a route and helps them recognize uncertain obstacles through tactile stimulation. The proposed system is composed of a touch-based obstacle detection module, an obstacle height detection module, and a route guidance algorithm. The touch-based obstacle detection module detects obstacles located to the left, right, and front of a visually impaired person and signals them by stimulating the person's thumb with the rotational force of a servomotor. The obstacle height detection module integrates data detected by a linear arrangement of ultrasonic sensors to classify the height of an obstacle into three levels (i.e., high, medium, low). The proposed route guidance algorithm guides the visually impaired person along an optimized path by updating the current position based on the signal of the GPS receiver built into the smartphone. In addition, the route guidance algorithm delivers information by speech to the visually impaired person through Bluetooth communication in the developed route guidance app. The proposed system can create a path that avoids obstacles by recognizing how they are placed while exploring the uncertain path.
