• Title/Abstract/Keyword: speech understanding

188 search results

Progress, challenges, and future perspectives in genetic researches of stuttering

  • Kang, Changsoo
    • Journal of Genetic Medicine
    • /
    • Vol. 18, No. 2
    • /
    • pp.75-82
    • /
    • 2021
  • Speech and language functions are highly cognitive and human-specific features. The underlying causes of normal speech and language function are believed to reside in the human brain. Developmental persistent stuttering, a speech and language disorder, has been regarded as the most challenging disorder in which to determine genetic causes because of the high percentage of spontaneous recovery among stutterers. This mysterious characteristic hinders speech pathologists from discriminating recovered stutterers from completely normal individuals. Over the last several decades, several genetic approaches have been used to identify the genetic causes of stuttering, and remarkable progress has been made in genome-wide linkage analysis followed by gene sequencing. So far, four genes, namely GNPTAB, GNPTG, NAGPA, and AP4E1, are known to cause stuttering. Furthermore, the generation of mouse models of stuttering and morphometry analysis have created new ways for researchers to identify brain regions that participate in human speech function and to understand the neuropathology of stuttering. In this review, we aimed to investigate previous progress, challenges, and future perspectives in understanding the genetics and neuropathology underlying persistent developmental stuttering.

Recurrent Neural Network with Backpropagation Through Time Learning Algorithm for Arabic Phoneme Recognition

  • Ismail, Saliza;Ahmad, Abdul Manan
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • ICCAS 2004 (International Conference on Control, Automation and Systems)
    • /
    • pp.1033-1036
    • /
    • 2004
  • Speech recognition and understanding have been studied for many years. In this paper, we propose a new recurrent neural network architecture for speech recognition, in which each output unit is connected to itself and is also fully connected to the other output units and to all hidden units [1]. We also describe the corresponding learning algorithm, Backpropagation Through Time (BPTT), which is well suited to this architecture. The aim of the study was to discriminate among the letters of the Arabic alphabet, from "alif" to "ya", and thereby to improve people's knowledge and understanding of Arabic letters and words by using a Recurrent Neural Network (RNN) trained with the BPTT learning algorithm. Four speakers (a mixture of male and female) were recorded in a quiet environment for training. Neural networks are well known for their ability to classify nonlinear problems, and much research has applied them to speech recognition [2], including for Arabic. The Arabic language poses a number of challenges for speech recognition [3]. Even though positive results have been obtained in ongoing studies, research on minimizing the error rate continues to attract attention. This research uses a recurrent neural network, one neural network technique, to discriminate the alphabet letters from "alif" to "ya". A minimal illustrative sketch follows this entry.

  • PDF
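
A minimal sketch of the architecture described above, assuming PyTorch: the output units feed back to themselves and to one another while also receiving all hidden units, and unrolling the time loop under autograd yields Backpropagation Through Time when `loss.backward()` is called. The 13 MFCC inputs, 64 hidden units, and 28 output classes (one per Arabic letter, "alif" to "ya") are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class RecurrentOutputRNN(nn.Module):
    """Output units are self-connected, fully interconnected, and fed by all hidden units."""
    def __init__(self, n_features=13, n_hidden=64, n_classes=28):
        super().__init__()
        self.hidden = nn.RNNCell(n_features, n_hidden)               # input -> hidden recurrence
        self.hid2out = nn.Linear(n_hidden, n_classes)                # all hidden units -> each output
        self.out2out = nn.Linear(n_classes, n_classes, bias=False)   # output self/cross feedback
        self.n_hidden, self.n_classes = n_hidden, n_classes

    def forward(self, x):                            # x: (batch, time, features)
        b, t, _ = x.shape
        h = x.new_zeros(b, self.n_hidden)
        y = x.new_zeros(b, self.n_classes)
        ys = []
        for step in range(t):                        # unrolled loop: autograd performs BPTT
            h = self.hidden(x[:, step], h)
            y = torch.tanh(self.hid2out(h) + self.out2out(y))
            ys.append(y)
        return torch.stack(ys, dim=1)                # (batch, time, classes)

model = RecurrentOutputRNN()
mfccs = torch.randn(4, 50, 13)                       # 4 utterances, 50 frames of 13 MFCCs
scores = model(mfccs)                                # per-frame letter scores
```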

Joint streaming model for backchannel prediction and automatic speech recognition

  • Yong-Seok Choi;Jeong-Uk Bang;Seung Hi Kim
    • ETRI Journal
    • /
    • Vol. 46, No. 1
    • /
    • pp.118-126
    • /
    • 2024
  • In human conversations, listeners often utilize brief backchannels such as "uh-huh" or "yeah." Timely backchannels are crucial to understanding and increasing trust among conversational partners. In human-machine conversation systems, users can engage in natural conversations when a conversational agent generates backchannels like a human listener. We propose a method that simultaneously predicts backchannels and recognizes speech in real time. We use a streaming transformer and adopt multitask learning for concurrent backchannel prediction and speech recognition. The experimental results demonstrate the superior performance of our method compared with previous works while maintaining a similar single-task speech recognition performance. Owing to the extremely imbalanced training data distribution, the single-task backchannel prediction model fails to predict any of the backchannel categories, and the proposed multitask approach substantially enhances the backchannel prediction performance. Notably, in the streaming prediction scenario, the performance of backchannel prediction improves by up to 18.7% compared with existing methods.
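
A minimal sketch of the multitask setup, assuming a causally masked (streaming) transformer encoder shared by a CTC head for speech recognition and a frame-level backchannel classifier; the layer sizes, loss weight, class weights, and three backchannel categories are assumptions, not the details of the ETRI model.

```python
import torch
import torch.nn as nn

class JointASRBackchannel(nn.Module):
    def __init__(self, n_feat=80, d_model=256, vocab=500, n_bc=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.proj = nn.Linear(n_feat, d_model)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)  # shared between both tasks
        self.asr_head = nn.Linear(d_model, vocab)                  # CTC over the vocabulary
        self.bc_head = nn.Linear(d_model, n_bc)                    # per-frame backchannel class

    def forward(self, feats, causal_mask):
        h = self.encoder(self.proj(feats), mask=causal_mask)       # causal mask keeps it streaming
        return self.asr_head(h).log_softmax(-1), self.bc_head(h)

model = JointASRBackchannel()
feats = torch.randn(2, 120, 80)                    # 2 streams, 120 frames of 80-dim features
mask = nn.Transformer.generate_square_subsequent_mask(120)
asr_logp, bc_logits = model(feats, mask)

# ASR loss (CTC) on dummy transcripts
targets = torch.randint(1, 500, (2, 30))
loss_asr = nn.CTCLoss(blank=0)(asr_logp.transpose(0, 1), targets,
                               torch.full((2,), 120, dtype=torch.long),
                               torch.full((2,), 30, dtype=torch.long))

# frame-level backchannel loss; class weights counter the heavy imbalance
bc_targets = torch.randint(0, 3, (2, 120))
ce = nn.CrossEntropyLoss(weight=torch.tensor([0.1, 1.0, 1.0]))
loss_bc = ce(bc_logits.reshape(-1, 3), bc_targets.reshape(-1))

loss = loss_asr + 0.5 * loss_bc                    # joint multitask objective (weight assumed)
```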

기계번역용 한국어 품사에 관한 연구 (A Study on the Korean Parts-of-Speech for Korean-English Machine Translation)

  • 송재관;박찬곤
    • Journal of the Korea Society of Computer and Information
    • /
    • Vol. 5, No. 4
    • /
    • pp.48-54
    • /
    • 2000
  • This paper classifies the Korean parts of speech for Korean-English machine translation and examines the morphological characteristics of each part of speech. The part-of-speech classification criteria in standard Korean grammar apply three criteria, namely meaning, function, and form, and natural language processing is based on the same criteria. Applying multiple criteria to part-of-speech classification makes it difficult to understand grammatical structure and to classify parts of speech. In addition, Korean-English machine translation requires preprocessing because of part-of-speech mismatches. To solve these problems, this paper classifies the parts of speech by applying a single criterion. As a method, a corpus was tagged according to standard Korean grammar, problems were identified, and the parts of speech were then classified according to the new criterion. The parts of speech classified in this paper play the same syntactic roles in Korean sentences, correspond to the dictionary parts of speech in English, remove the ambiguity of part-of-speech classification, and clearly express the structure of Korean sentences. They also enable target-language generation by pattern matching in Korean-English machine translation, as sketched after this entry.

  • PDF
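
The pattern-matching idea in the abstract above can be illustrated with a toy sketch. The tag names and the single transfer pattern below are hypothetical, not the paper's classification; they only show how a one-criterion mapping from Korean parts of speech to English dictionary parts of speech lets target-language generation proceed by matching tag sequences.

```python
# hypothetical single-criterion tag mapping (illustration only)
KO_TO_EN_POS = {
    "N": "noun", "V": "verb", "AJ": "adjective",
    "AD": "adverb", "P": "preposition",   # Korean particle slot -> English preposition slot
}

def match_transfer_pattern(ko_tags):
    """Return an English generation template for a known Korean tag sequence."""
    en = [KO_TO_EN_POS.get(t) for t in ko_tags]
    if en == ["noun", "preposition", "noun", "preposition", "verb"]:  # SOV with particles
        return "NP1 VERB NP2"            # English SVO template
    return None

print(match_transfer_pattern(["N", "P", "N", "P", "V"]))   # NP1 VERB NP2
```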

효과적인 인간-로봇 상호작용을 위한 딥러닝 기반 로봇 비전 자연어 설명문 생성 및 발화 기술 (Robot Vision to Audio Description Based on Deep Learning for Effective Human-Robot Interaction)

  • 박동건;강경민;배진우;한지형
    • The Journal of Korea Robotics Society
    • /
    • Vol. 14, No. 1
    • /
    • pp.22-30
    • /
    • 2019
  • For effective human-robot interaction, a robot not only needs to understand the current situational context well; it also needs to convey its understanding to the human participant efficiently. The most convenient way for a robot to deliver its understanding is to express it in voice and natural language. Recently, artificial intelligence for video understanding and natural language processing has developed very rapidly, especially through deep learning. This paper therefore proposes a deep-learning-based method for turning robot vision into spoken audio descriptions. The applied deep learning model is a pipeline of two models: one generates a natural language sentence from the robot's vision, and the other generates voice from the generated sentence. We also conduct a real-robot experiment to show the effectiveness of our method in human-robot interaction.
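
A minimal stand-in for such a two-stage pipeline, assuming off-the-shelf components rather than the authors' models: a Hugging Face image-to-text pipeline generates the sentence and the local pyttsx3 engine speaks it. The model name is an assumption, and `transformers`, `Pillow`, and `pyttsx3` must be installed.

```python
from transformers import pipeline
from PIL import Image
import pyttsx3

# stage 1 model: an assumed public captioner, not the one used in the paper
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def describe_and_speak(image_path: str) -> str:
    frame = Image.open(image_path)                      # stand-in for a robot camera frame
    caption = captioner(frame)[0]["generated_text"]     # vision -> natural language sentence
    tts = pyttsx3.init()                                # sentence -> voice
    tts.say(caption)
    tts.runAndWait()
    return caption
```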

조음 로보틱스 (Articulatory robotics)

  • 남호성
    • Phonetics and Speech Sciences
    • /
    • Vol. 13, No. 2
    • /
    • pp.1-7
    • /
    • 2021
  • Speech can be viewed as a spatiotemporal coordination structure of constriction movements produced by the individual articulators (lips, tongue tip, tongue body, velum, glottis). As with other human movements (e.g., grasping), each constriction movement is a linguistically meaningful task, and each task is carried out by a synergy of the basic elements associated with it. This study discusses, from the perspective of robotics, how such speech tasks can be dynamically coupled to their basic elements, the joints. Furthermore, by introducing the basic principles of robotics to the field of speech science, it aims to deepen our understanding of how speech is produced as movement and to provide the theoretical foundation needed to build talking machines that mimic actual human articulation.
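
One standard way to make the task-to-joint coupling concrete is the task-dynamics formulation: each constriction task follows a critically damped second-order law, and the resulting task acceleration is distributed over redundant joints through the Jacobian pseudoinverse. The toy one-task, two-joint geometry and gains below are assumptions; the sketch illustrates the principle, not the paper's model.

```python
import numpy as np

def task_acceleration(z, dz, z_target, k=100.0):
    b = 2.0 * np.sqrt(k)                       # critical damping: reach target, no overshoot
    return -k * (z - z_target) - b * dz

def to_joint_space(J, ddz):
    # redundant joints share the task acceleration via the pseudoinverse (a synergy)
    return np.linalg.pinv(J) @ ddz

# one constriction task driven by two "joints": task velocity = J @ joint velocity
J = np.array([[0.7, 0.3]])
z, dz = np.array([1.0]), np.array([0.0])       # start one unit away from the target
dt = 0.01
for _ in range(200):
    ddz = task_acceleration(z, dz, z_target=np.array([0.0]))
    ddtheta = to_joint_space(J, ddz)           # joint accelerations realizing the task
    dz = dz + ddz * dt
    z = z + dz * dt
print(z)                                       # ~0: constriction target reached
```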

Phonological processes of consonants from orthographic to pronounced words in the Buckeye Corpus

  • Yang, Byunggon
    • Phonetics and Speech Sciences
    • /
    • Vol. 11, No. 4
    • /
    • pp.55-62
    • /
    • 2019
  • This paper investigates the phonological processes of consonants in pronounced words in the Buckeye Corpus and compares the frequency distribution of these processes to provide a clearer understanding of conversational English for linguists and teachers. Both orthographic and pronounced words were extracted from the transcribed label scripts of the Buckeye Corpus. Next, the phonological processes of consonants in the orthographic and pronounced labels were tabulated separately by onsets and codas, and a frequency distribution by consonant process types was examined. The results showed that the majority of the onset clusters were pronounced as the same sounds in the Buckeye Corpus. The participants in the corpus were presumed to speak semiformally. In addition, the onsets have fewer deletions than the codas, which might be related to the information weight of the syllable components. Moreover, there is a significant association and strong positive correlation between the phonological processes of the onsets and codas in men and women. This paper concludes that an analysis of phonological processes in spontaneous speech corpora can contribute to a practical understanding of spoken English. Further studies comparing the current phonological process data with those of other languages would be desirable to establish universal patterns in phonological processes.
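
The tabulation step described above reduces to comparing aligned orthographic and pronounced labels and counting process types. A minimal sketch with invented toy pairs in place of the Buckeye label files:

```python
from collections import Counter

# (orthographic onset or coda, pronounced form) -- toy data, not corpus counts
pairs = [("str", "str"), ("t", "t"), ("nd", "n"), ("t", "d"), ("h", "")]

def classify(ortho, pron):
    if pron == ortho:
        return "same"
    if pron == "":
        return "deletion"
    if len(pron) < len(ortho):
        return "partial deletion"
    if len(pron) > len(ortho):
        return "insertion"
    return "substitution"

print(Counter(classify(o, p) for o, p in pairs))
# Counter({'same': 2, 'partial deletion': 1, 'substitution': 1, 'deletion': 1})
```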

Phonological processes of consonants from orthographic to pronounced words in the Seoul Corpus

  • Yang, Byunggon
    • Phonetics and Speech Sciences
    • /
    • Vol. 12, No. 2
    • /
    • pp.1-7
    • /
    • 2020
  • This paper investigates the phonological processes of consonants in pronounced words in the Seoul Corpus, and compares the frequency distribution of these processes to provide a clearer understanding of conversational Korean to linguists and teachers. To this end, both orthographic and pronounced words were extracted from the transcribed label scripts of the Seoul Corpus. Next, the phonological processes of consonants in the orthographic and pronounced forms were tabulated separately after syllabifying the onsets and codas, and major consonantal processes were examined. First, the results showed that the majority of the orthographic consonants were pronounced unchanged in the pronounced forms. Second, more than three quarters of the onsets were pronounced in the same forms, while approximately half of the codas were pronounced as variants. Third, the differing onset and coda symbols were primarily caused by deletions and insertions. Finally, the five phonological process types accounted for only 12.4% of the total possible processes. Based on these results, this paper concludes that an analysis of phonological processes in spontaneous speech corpora can improve the practical understanding of spoken Korean. Future studies ought to compare the current phonological process data with those of other languages to establish universal patterns in phonological processes.
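
For Korean, the syllabification step that precedes such counts can exploit the fact that precomposed Hangul syllables decompose into onset, nucleus, and coda by Unicode arithmetic (base U+AC00; 19 onsets, 21 nuclei, 28 coda slots including empty). A sketch of that decomposition, with corpus I/O omitted; the comparison logic would then mirror the Buckeye sketch above.

```python
ONSETS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
CODAS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def onset_coda(syllable: str):
    idx = ord(syllable) - 0xAC00              # 0..11171 for precomposed syllables
    assert 0 <= idx < 11172, "not a precomposed Hangul syllable"
    return ONSETS[idx // (21 * 28)], CODAS[idx % 28]

print(onset_coda("밥"))   # ('ㅂ', 'ㅂ')
print(onset_coda("가"))   # ('ㄱ', '') -- empty coda
```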

영어 발화의 재구조와 후-어휘 음운현상의 지도 (Teaching English Restructuring and Post-lexical Phenomena)

  • 이순범
    • Proceedings of the KSPS Conference
    • /
    • KSPS Conference, November 2002
    • /
    • pp.169-172
    • /
    • 2002
  • English is a stress-timed language, with a much more dynamic rhythm and stress pattern and a tendency toward isochrony of stressed syllables. This gives rise to various kinds of English utterance restructuring, irrespective of pauses at syntactic boundaries, and to post-lexical phonological phenomena. In real speech acts in particular, the natural utterances of fluent speakers and broadcast speech show especially varied restructuring and phonological phenomena. This has been an obstacle for students in speaking fluent English and in understanding normal speech. Therefore, this study focuses on the most problematic factors in English speaking and listening difficulty, namely restructuring and post-lexical phonological phenomena caused by stress-timed rhythm, and then points out the importance of teaching English rhythm with these in mind.

  • PDF

Dr. Speech Science의 음성합성프로그램을 이용하여 합성한 정상음성과 병적음성(Pathologic Voice)의 음향학적 분석 (Acoustic Analysis of Normal and Pathologic Voice Synthesized with Voice Synthesis Program of Dr. Speech Science)

  • 최홍식;김성수
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • Vol. 12, No. 2
    • /
    • pp.115-120
    • /
    • 2001
  • In this paper, we synthesized the vowel /ae/ with the voice synthesis program of Dr. Speech Science, and we also synthesized pathologic versions of the vowel /ae/ by varying parameters such as high frequency gain (HFG), low frequency gain (LFG), pitch flutter (PF), which represents the jitter value, and flutter of amplitude (FA), which represents the shimmer value, with grades ranked as mild, moderate, and severe, respectively. We then analysed all the pathologic voices with the analysis program of Dr. Speech Science. We expect these synthesized pathologic voices to be useful for understanding parameters such as noise, jitter, and shimmer, and for providing feedback to patients with voice disorders. A conceptual sketch follows this entry.

  • PDF
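
What the PF (jitter) and FA (shimmer) parameters perturb can be illustrated with a generic source-filter sketch: a sawtooth-like pulse train whose period and amplitude vary cycle by cycle, filtered through formant resonators. This is conceptual, not Dr. Speech Science's algorithm; the /ae/ formant frequencies and bandwidths are rough textbook values.

```python
import numpy as np
from scipy.signal import lfilter

def perturbed_source(f0=120.0, jitter=0.02, shimmer=0.05, dur=0.5, sr=16000):
    rng = np.random.default_rng(0)
    cycles, t = [], 0.0
    while t < dur:
        period = (1.0 / f0) * (1 + jitter * rng.standard_normal())   # cycle-level pitch flutter
        amp = 1.0 + shimmer * rng.standard_normal()                  # cycle-level amplitude flutter
        n = max(int(period * sr), 1)
        cycles.append(amp * np.linspace(1.0, -1.0, n))               # one glottal-like cycle
        t += period
    return np.concatenate(cycles)

def formant_filter(x, sr=16000, formants=(660, 1720, 2410), bws=(80, 100, 120)):
    for f, bw in zip(formants, bws):                 # cascade of two-pole resonators
        r = np.exp(-np.pi * bw / sr)
        theta = 2.0 * np.pi * f / sr
        x = lfilter([1.0], [1.0, -2.0 * r * np.cos(theta), r * r], x)
    return x

pathologic_ae = formant_filter(perturbed_source())   # higher jitter/shimmer -> rougher voice
```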