• Title/Summary/Keyword: Voice language

A Design of ADPCM CODEC Core for Digital Voice and Image Processing SOC (디지털 음성 및 영상 처리용 SOC를 위한 ADPCM CODEC 코어의 설계)

  • 정중완;홍석일;한희일;조경순
    • Proceedings of the IEEK Conference / 2001.06b / pp.333-336 / 2001
  • This paper describes the design and implementation results of 40, 32, 24, and 16 kbps ADPCM encoder and decoder circuits based on the CCITT G.726 standard. We verified the ADPCM algorithm in C and designed the RTL circuit in Verilog HDL. The circuit was simulated with Verilog-XL, synthesized with Design Compiler, and verified on a Xilinx FPGA. Since the synthesized circuit requires only a small number of gates, it is expected to serve as a core module in digital voice and image processing SoCs.
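
Because the abstract only names the algorithm family, the sketch below illustrates the kind of adaptive-quantizer encoding loop that is typically prototyped in software before RTL design. It is a 4-bit, IMA-ADPCM-style simplification written in Python, not the CCITT G.726 quantizer itself, and the step-size table is abbreviated.

```python
# Simplified 4-bit ADPCM encoder (IMA-ADPCM-style adaptive quantizer).
# Illustrative sketch only; this is not the CCITT G.726 algorithm.

STEP_SIZES = [7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31]  # abbreviated table
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]  # how each code moves the step-size index

def adpcm_encode(samples):
    """Encode 16-bit PCM samples as 4-bit codes (1 sign bit + 3 magnitude bits)."""
    predicted, index, codes = 0, 0, []
    for s in samples:
        step = STEP_SIZES[index]
        diff = s - predicted
        code = 0
        if diff < 0:
            code, diff = 8, -diff              # sign bit
        if diff >= step:
            code |= 4
            diff -= step
        if diff >= step >> 1:
            code |= 2
            diff -= step >> 1
        if diff >= step >> 2:
            code |= 1
        # Reconstruct what the decoder will predict so both sides stay in sync.
        delta = (step >> 3) + (code & 1) * (step >> 2) \
                + ((code >> 1) & 1) * (step >> 1) + ((code >> 2) & 1) * step
        predicted += -delta if code & 8 else delta
        predicted = max(-32768, min(32767, predicted))
        # Adapt the quantizer step size for the next sample.
        index = max(0, min(len(STEP_SIZES) - 1, index + INDEX_ADJUST[code & 7]))
        codes.append(code)
    return codes

print(adpcm_encode([0, 40, 120, 200, 150, 60, -30, -150]))
```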

The Utility of Perturbation, Non-linear dynamic, and Cepstrum measures of dysphonia according to Signal Typing (음성 신호 분류에 따른 장애 음성의 변동률 분석, 비선형 동적 분석, 캡스트럼 분석의 유용성)

  • Choi, Seong Hee;Choi, Chul-Hee
    • Phonetics and Speech Sciences / v.6 no.3 / pp.63-72 / 2014
  • This study assessed the utility of the acoustic analyses most commonly used in routine clinical voice assessment, including perturbation, nonlinear dynamic, and spectral/cepstral analysis, according to the signal type of dysphonic voices, and investigated the clinical applicability of these methods. A total of 70 dysphonic voice samples were classified by signal type using narrowband spectrograms. The traditional perturbation parameters %jitter, %shimmer, and signal-to-noise ratio (SNR) were calculated with TF32, and the nonlinear dynamic parameter correlation dimension (D2) and the spectral/cepstral measures, including mean CPP, CPP_sd, CPPf0, CPPf0_sd, L/H ratio, and L/H ratio_sd, were calculated with ADSV (Analysis of Dysphonia in Speech and Voice™). Auditory-perceptual analysis was performed by two blinded speech-language pathologists using the GRBAS scale. The results showed that the nearly periodic Type 1 signals were all functional dysphonia, whereas Type 4 signals comprised neurogenic and organic voice disorders. Only Type 1 voice signals were reliable for perturbation analysis in this study. Significant signal-type-related differences were found in all acoustic and auditory-perceptual measures. SNR, CPP, and L/H ratio values for Type 4 signals were significantly lower than those of the other signal types, and significantly higher %jitter and %shimmer were observed in Type 4 signals (p<.001). Additionally, as signal type increased, D2 values increased significantly, reflecting more complex and nonlinear patterns; however, D2 could not be obtained for voice signals with a high noise component associated with breathiness. In particular, CPP was more sensitive to the voice qualities 'G', 'R', and 'B' than any other acoustic measure. Thus, spectral and cepstral analyses can be applied to more severely dysphonic voices, such as Type 4 signals, and CPP may be a more accurate and predictive acoustic marker for measuring voice quality and severity in dysphonia.
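
For readers unfamiliar with the perturbation measures above, local %jitter and %shimmer are commonly defined as cycle-to-cycle relative differences. The sketch below assumes the pitch-period lengths and cycle peak amplitudes have already been extracted (the example values are invented); it illustrates the definitions, not TF32's implementation.

```python
import numpy as np

def percent_jitter(periods):
    """Local jitter (%): mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def percent_shimmer(amplitudes):
    """Local shimmer (%): mean absolute difference between consecutive
    cycle peak amplitudes, divided by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Invented cycle measurements (period in seconds, peak amplitude in arbitrary units).
periods = [0.0050, 0.0051, 0.0049, 0.0052, 0.0050]
amplitudes = [0.82, 0.80, 0.85, 0.79, 0.83]
print(percent_jitter(periods), percent_shimmer(amplitudes))
```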

Sign Language Image Recognition System Using Artificial Neural Network

  • Kim, Hyung-Hoon;Cho, Jeong-Ran
    • Journal of the Korea Society of Computer and Information / v.24 no.2 / pp.193-200 / 2019
  • Hearing-impaired people live in a voice-based culture, but because communication with hearing people through sign language is difficult, many experience discomfort in daily and social life and face various disadvantages against their wishes. Therefore, in this paper, we study a sign language translation system for communication between hearing people and hearing-impaired people who use sign language, and we implement a prototype system. Previous studies on such sign language translation systems fall into two types: those using video image systems and those using shape input devices. However, existing systems have the problems that they do not recognize the varied sign language expressions of individual users and that they require special devices. In this paper, we use an artificial neural network, a machine learning method, to recognize the varied sign language expressions of sign language users. By using ordinary smartphones and common video equipment for sign language image recognition, we aim to improve the usability of the sign language translation system.
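
As a rough illustration of the kind of artificial-neural-network classifier the abstract describes (the actual architecture, input size, and sign vocabulary are not specified there), the following Keras sketch maps a single video frame to a sign label; the frame size and class count are placeholders.

```python
# Hypothetical CNN classifier for single sign-language video frames.
# Every constant here is a placeholder, not the paper's configuration.
from tensorflow.keras import layers, models

NUM_SIGNS = 30              # placeholder number of sign classes
INPUT_SHAPE = (64, 64, 3)   # placeholder RGB frame size

model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=INPUT_SHAPE),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_SIGNS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_frames, train_labels, epochs=10, validation_split=0.2)
```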

Acoustic characteristics of Motherese

  • Shim, Hee-Jeong;Lee, GeonJae;Hwang, JinKyung;Ko, Do-Heung
    • Phonetics and Speech Sciences / v.6 no.4 / pp.189-194 / 2014
  • Objective: This study aims to investigate the speech rate, pause length, habitual pitch, and voice intensity of motherese. Subjects and Methods: The participants comprised 20 mothers (mean age 33 years). Speech data were collected and analyzed using the Real-time Pitch software (KayPENTAX®). Results: The average speech rate was 5.33 syllables per second without the infant present and 4.26 syllables per second with the infant present. The average pause length was 1.09 s without the infant present and 1.56 s with the infant present. The average habitual pitch was 199.79 Hz without the infant present and 227.15 Hz with the infant present. The average voice intensity was 61.09 dB without the infant present and 64.49 dB with the infant present. Conclusion: This study provides clinical information for efficiently managing speech therapy for infants and children, including acoustic and phonological information that can be recommended to primary caregivers.

The relationship between cross language phonetic influences and L2 proficiency in terms of VOT

  • Kim, Mi-Ryoung
    • Phonetics and Speech Sciences / v.3 no.3 / pp.3-10 / 2011
  • This study examined the production of aspirated stop consonants in Korean and English words to address how cross-language influences differ, particularly in terms of proficiency in L2 English. Voice onset times (VOTs) were measured for two American monolinguals and seven Korean speakers. The results showed that VOT patterns for both L1 and L2 stops differed according to proficiency in L2 English. In L2 English, highly proficient speakers produced VOTs similar to those of native speakers of English, whereas less proficient speakers produced VOTs significantly longer than those of the proficient speakers. Most of the proficient speakers produced similar VOTs in L1 Korean and L2 English. Unlike previous findings, Korean VOTs were even shorter than their English counterparts, and this VOT shortening of Korean aspirated stops was found for most of the proficient speakers. The findings suggest that cross-language phonetic influences, as well as the ongoing VOT shortening of Korean aspirated stops, may be correlated with L2 proficiency. Since this is a pilot study with a small number of subjects in each proficiency group, further quantitative study is necessary to generalize the findings.
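
For reference, VOT is simply the interval between the stop release burst and the onset of voicing. The sketch below assumes hand-annotated burst and voicing times (the values are invented) and compares group means; it is not the study's measurement procedure.

```python
import numpy as np

def vot_ms(burst_time_s, voicing_onset_s):
    """Voice onset time in milliseconds: voicing onset minus burst release."""
    return (voicing_onset_s - burst_time_s) * 1000.0

# Invented hand-annotated times (in seconds) for tokens from two groups.
high_proficiency = [vot_ms(b, v) for b, v in [(0.112, 0.178), (0.095, 0.160)]]
low_proficiency = [vot_ms(b, v) for b, v in [(0.120, 0.215), (0.101, 0.198)]]
print(np.mean(high_proficiency), np.mean(low_proficiency))
```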

Generative Interactive Psychotherapy Expert (GIPE) Bot

  • Ayesheh Ahrari Khalaf;Aisha Hassan Abdalla Hashim;Akeem Olowolayemo;Rashidah Funke Olanrewaju
    • International Journal of Computer Science & Network Security / v.23 no.4 / pp.15-24 / 2023
  • One of the objectives and aspirations of scientists and engineers ever since the development of computers has been to interact naturally with machines. Hence, features of artificial intelligence (AI) such as natural language processing and natural language generation were developed. Interactive conversational systems are thought to be the fastest-expanding field of AI. Numerous businesses have created Virtual Personal Assistants (VPAs) using these technologies, including Apple's Siri, Amazon's Alexa, and Google Assistant, among others. Although many chatbots have been introduced over the years to diagnose or treat psychological disorders, a user-friendly chatbot is still not available. A smart generative cognitive behavioral therapy system with spoken dialogue support was therefore developed using a Persona Perception (P2) bot model with Generative Pre-trained Transformer-2 (GPT-2). The model was implemented using technologies common in VPAs, such as voice recognition, Natural Language Understanding (NLU), and text-to-speech. The system is well suited to voice-based applications because it can hold therapeutic conversations with users through both text and voice interaction.
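
As a sketch of the generative core of such a system (not the authors' actual P2/GIPE implementation), the Hugging Face transformers library can generate a conversational reply with a stock GPT-2 model; the speech recognition and text-to-speech layers described in the abstract are omitted, and the prompt is invented.

```python
# Minimal GPT-2 reply generation with Hugging Face transformers.
# Illustrative stand-in only; this is not the authors' P2/GIPE model.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "User: I have been feeling anxious lately.\nBot:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```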

A Study on a Non-Voice Section Detection Model among Speech Signals using CNN Algorithm (CNN(Convolutional Neural Network) 알고리즘을 활용한 음성신호 중 비음성 구간 탐지 모델 연구)

  • Lee, Hoo-Young
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.33-39 / 2021
  • Speech recognition technology, combined with deep learning, is developing at a rapid pace. In particular, voice recognition services are connected to various devices such as artificial intelligence speakers, in-vehicle voice recognition, and smartphones, so the technology is used in many settings rather than only in specific areas of industry. Research to meet the high expectations for the technology is also being actively conducted. In the field of natural language processing (NLP), there is a particular need for research on removing ambient noise or unnecessary voice signals, which strongly affect the speech recognition rate. Many domestic and foreign companies are already applying the latest AI technology to such research, and work using convolutional neural network (CNN) algorithms is especially active. The purpose of this study is to detect non-voice sections within a user's speech using a convolutional neural network. Voice files (wav) from five speakers were collected to generate training data, and a CNN-based classification model was built to discriminate between speech and non-speech sections. An experiment detecting non-speech sections with the resulting model achieved an accuracy of 94%.
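
A minimal sketch of the kind of pipeline the abstract describes is shown below: fixed-length audio chunks are converted to log-mel-spectrogram patches and classified as speech or non-speech by a small CNN. The sampling rate, chunk length, feature size, and network shape are illustrative assumptions, not the study's actual configuration.

```python
# Sketch: label fixed-length audio chunks as speech vs. non-speech with a
# small CNN over log-mel-spectrogram patches. All sizes are assumptions.
import numpy as np
import librosa
from tensorflow.keras import layers, models

SR = 16000          # assumed sampling rate
CHUNK_SEC = 1.0     # classify one-second chunks

def load_chunks(wav_path):
    """Split a wav file into fixed-length chunks (last partial chunk dropped)."""
    y, _ = librosa.load(wav_path, sr=SR)
    n = int(SR * CHUNK_SEC)
    return [y[i:i + n] for i in range(0, len(y) - n + 1, n)]

def to_patch(chunk):
    """Log-mel-spectrogram patch for one chunk (about 40 mel bands x 32 frames)."""
    mel = librosa.feature.melspectrogram(y=chunk, sr=SR, n_mels=40)
    return librosa.power_to_db(mel)[..., np.newaxis]

# Binary CNN: 1 = speech, 0 = non-speech.
model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(40, 32, 1)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(np.stack(patches), labels, epochs=10)  # patches/labels built per speaker
```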

A Study on Development of Multi-Channel Voice Guidance System based on Active 2.45GHz RFID (능동형 2.45GHz RFID 기반의 다채널 음성 안내 시스템 개발에 관한 연구)

  • Jho, Yong-Chul;Li, Zhong-shi;Lee, Doo-Yong;Kim, Jin-Young;Han, Woon-Soo;Lee, Chang-Ho
    • Journal of the Korea Safety Management & Science / v.10 no.2 / pp.169-174 / 2008
  • In this study we develop the core technology of a multi-channel voice guidance system based on active 2.45 GHz RFID that can be used in the field of advertising. With it, voice information can be provided to foreigners in their own language at large-scale international events, offering an alternative to the current expensive voice guidance systems in exhibition halls and more value-added services to users. As a result, we present the configuration of an integrated software platform that includes a media server, media client, and receiver. It can serve as basic infrastructure equipment for sightseeing in an RFID/USN environment. Additionally, by analyzing the tag information about advertisement exposure collected afterwards, the proposed system can also serve as a new marketing tool.

Formant frequency changes of female voice /a/, /i/, /u/ in real ear (실이에서 여자 음성 /ㅏ/, /ㅣ/, /ㅜ/의 포먼트 주파수 변화)

  • Heo, Seungdeok;Kang, Huira
    • Phonetics and Speech Sciences / v.9 no.1 / pp.49-53 / 2017
  • Formant frequencies depend on the position of the tongue, the shape of the lips, and the larynx. In the auditory system, the external ear canal is an open-ended resonator, which can modify voice characteristics. This study investigates the effect of the real ear on formant frequencies. Fifteen subjects ranging from 22 to 30 years of age participated in the study. Three corner vowels were employed: the low central vowel /a/, the high front vowel /i/, and the high back vowel /u/. The voice of a well-educated undergraduate who majored in speech-language pathology was recorded with a high-performance condenser microphone placed at the upper pinna and in the ear canal. Paired t-tests showed significant differences in the formant frequencies F1, F2, F3, and F4 between the free field and the real ear. For /a/, all formant frequencies decreased significantly in the real ear. For /i/, F2 increased while F3 and F4 decreased. For /u/, F1 and F2 increased, but F3 and F4 decreased. These voice modifications in the real ear appear to contribute to interpreting voice quality and understanding speech, timbre, and individual characteristics, which are influenced by the shape of the outer ear and external ear canal in such a way that formant frequencies become centralized in the vowel space.
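
For context, formant frequencies such as F1-F4 are often estimated from the pole angles of a linear predictive coding (LPC) model of the vowel. The sketch below uses librosa and numpy with rule-of-thumb settings; it illustrates the general technique, not the measurement procedure used in the paper.

```python
# LPC-based formant estimation sketch (illustrative defaults, not the paper's method).
import numpy as np
import librosa

def estimate_formants(wav_path, max_formants=4):
    y, sr = librosa.load(wav_path, sr=None)
    y = librosa.effects.preemphasis(y)              # flatten spectral tilt
    order = int(2 + sr / 1000)                      # common rule-of-thumb LPC order
    a = librosa.lpc(y, order=order)                 # LPC polynomial coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]               # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)      # pole angle -> frequency in Hz
    bws = -(sr / np.pi) * np.log(np.abs(roots))     # approximate 3 dB bandwidths
    # Keep sharp resonances above ~90 Hz and report the lowest few as F1..F4.
    candidates = sorted(f for f, b in zip(freqs, bws) if f > 90 and b < 400)
    return candidates[:max_formants]

# print(estimate_formants("vowel_a.wav"))  # hypothetical recording of a sustained vowel
```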

Evaluation of the readability of self-reported voice disorder questionnaires (자기보고식 음성장애 설문지 문항의 가독성 평가)

  • HyeRim Kwak;Seok-Chae Rhee;Seung Jin Lee;HyangHee Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.41-48 / 2024
  • The significance of self-reported voice assessments concerning patients' chief complaints and quality of life has increased, so readability assessments of questionnaire items are essential. In this study, readability analyses were performed on the 11 Korean versions of self-reported voice disorder questionnaires (KVHI, KAVI, KVQOL, K-SVHI, K-VAPP, K-VPPC, TVSQ, K-VDCQ, K-VFI, K-VTDS, and K-VoiSS) based on text grade and complexity, vocabulary frequency and grade, and lexical diversity. Additionally, a comparative readability assessment was conducted on the original versions of these questionnaires to discern how they differ from their Korean counterparts and from the questionnaires for children. It was determined that the voice disorder questionnaires can be used without difficulty by populations with lower literacy levels. Evaluators should consider subjects' reading levels when conducting assessments, and future developments and revisions should take reading difficulty into account.
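
As an illustration of one of the measures listed above, lexical diversity is often approximated by the type-token ratio (TTR). The sketch below computes a simple TTR for a single questionnaire item; whitespace tokenization and the English example sentence are crude stand-ins for the morphological analysis a Korean readability tool would actually perform.

```python
def type_token_ratio(text):
    """Lexical diversity as unique tokens / total tokens (simple TTR).
    Whitespace tokenization is a crude stand-in for the morphological
    analysis a Korean readability tool would use."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

# Invented questionnaire item (English, for illustration only).
item = "My voice makes it hard for people to hear me in a noisy room"
print(round(type_token_ratio(item), 2))
```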