• Title/Summary/Keyword: Text-to-speech


Implementation of Artificial Intelligence Speech Recognition Text Repository for Elementary Career Counseling (초등 진로 상담을 위한 인공지능 음성 인식 텍스트 레포지토리 구현)

  • Yu, Minjeong;Ma, Youngji;Koo, Dukhoi
    • 한국정보교육학회:학술대회논문집
    • /
    • 2021.08a
    • /
    • pp.327-333
    • /
    • 2021
  • Artificial intelligence (AI) technology is developing rapidly in the era of the Fourth Industrial Revolution. The government is trying to improve AI education and cultivate human resources. However, there are very few cases where AI technology is actually used in public education classes. We therefore designed a text repository that uses AI speech recognition to support career counseling for elementary school students. Until now, it has been difficult to conduct the preliminary consultations that students' career counseling requires. In this study, we propose AI speech recognition technology that can address this problem, and we plan various ways to make the program more educational. We expect the AI technology implemented in this study to provide an effective solution for career counseling.


A Corpus-based Lexical Analysis of the Speech Texts: A Collocational Approach

  • Kim, Nahk-Bohk
    • English Language & Literature Teaching
    • /
    • v.15 no.3
    • /
    • pp.151-170
    • /
    • 2009
  • Recently, speech texts have been increasingly used for English education because of their advantages as language teaching and learning materials. The purpose of this paper is to analyze speech texts with a corpus-based lexical approach, to suggest some productive methods that use English speaking or writing as the main resource for the course, and to introduce actual classroom adaptations. First, this study shows that a speech corpus has some unique features, such as different selections of pronouns, nouns, and lexical chunks, in comparison to a general corpus. Next, from a collocational perspective, the study demonstrates that the speech corpus consists of a wide variety of the collocations and lexical chunks that a number of linguists describe (Lewis, 1997; McCarthy, 1990; Willis, 1990). In other words, the speech corpus suggests not only that speech texts have considerable lexical potential that could be exploited to facilitate chunk-learning, but also that learners are not very likely to unlock this potential autonomously. Based on this result, teachers can develop a learners' corpus and use it by chunking the speech text. This new approach of adapting speech samples as important materials for college students' speaking or writing ability should be implemented as shown in samplers. Finally, to foster learners' productive skills more communicatively, a few practical suggestions are made, such as chunking and windowing chunks of speech and presentation, and the pedagogical implications are discussed.


Implementation of TTS Engine for Natural Voice (자연음 TTS(Text-To-Speech) 엔진 구현)

  • Cho Jung-Ho;Kim Tae-Eun;Lim Jae-Hwan
    • Journal of Digital Contents Society
    • /
    • v.4 no.2
    • /
    • pp.233-242
    • /
    • 2003
  • A TTS (Text-To-Speech) system is a computer-based system that should be able to read any text aloud. Producing a natural voice requires general knowledge of the language as well as considerable time and effort. Furthermore, the sound patterns of English are variable and involve both phonemic and morphological analysis, so it is very difficult to keep the patterns consistent. To handle these problems, we present a system based on phonemic analysis of vowels and consonants. By analyzing phonological variations frequently found in spoken English, we derived the phonemic contexts that trigger the multilevel application of the corresponding phonological processes, which consist of phonemic and allophonic rules. As a result, we obtained rule data organized by phoneme and an engine that economizes on system resources. The proposed system can be used not only in communication systems but also in office automation and other applications.


Keyword Retrieval-Based Korean Text Command System Using Morphological Analyzer (형태소 분석기를 이용한 키워드 검색 기반 한국어 텍스트 명령 시스템)

  • Park, Dae-Geun;Lee, Wan-Bok
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.2
    • /
    • pp.159-165
    • /
    • 2019
  • Speech recognition based on deep learning has begun to be applied to commercial products, but it is still difficult to use in VR content, since there is no easy and efficient way to process the recognized text after the speech recognition module. In this paper, we propose a Korean Language Command System that can efficiently recognize and respond to Korean speech commands. The system consists of two components: a morphological analyzer that analyzes sentence morphemes, and a retrieval-based model of the kind usually used to develop chatbot systems. Experimental results show that the proposed system requires only 16% of the commands to achieve the same level of performance as the conventional string comparison method. Furthermore, when working with the Google Cloud Speech module, it achieved a success rate of 60.1%. Overall, the experimental results show that the proposed system is more efficient than the conventional string comparison method.
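
The difference between keyword retrieval and exact string comparison can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer below is a stand-in for a real Korean morphological analyzer, and the command names, keyword sets, stopword list, and threshold are all hypothetical.

```python
def extract_keywords(sentence, stopwords):
    """Stand-in for a morphological analyzer (assumption): split on
    whitespace, lowercase, and drop stopwords."""
    return {tok for tok in sentence.lower().split() if tok not in stopwords}


def match_by_string(utterance, phrasings):
    """Baseline: the utterance must equal a registered phrasing exactly."""
    return phrasings.get(utterance.lower())


def match_by_keywords(utterance, commands, stopwords, threshold=0.5):
    """Return the command whose (non-empty) keyword set is best covered
    by the utterance, or None if coverage stays below the threshold."""
    words = extract_keywords(utterance, stopwords)
    best_name, best_score = None, 0.0
    for name, keywords in commands.items():
        score = len(words & keywords) / len(keywords)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical command tables for illustration only.
STOPWORDS = {"please", "the", "that", "could", "you"}
COMMANDS = {"open_door": {"open", "door"}, "light_on": {"turn", "light", "on"}}
PHRASINGS = {"open the door": "open_door"}

print(match_by_string("Please open the door now", PHRASINGS))               # None
print(match_by_keywords("Please open the door now", COMMANDS, STOPWORDS))   # open_door
```

With exact matching, every phrasing has to be registered separately, whereas one keyword set covers many phrasings. That is one plausible reading of why the retrieval approach needs fewer registered commands.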

A Spectral Smoothing Algorithm for Unit Concatenating Speech Synthesis (코퍼스 기반 음성합성기를 위한 합성단위 경계 스펙트럼 평탄화 알고리즘)

  • Kim Sang-Jin;Jang Kyung Ae;Hahn Minsoo
    • MALSORI
    • /
    • no.56
    • /
    • pp.225-235
    • /
    • 2005
  • Speech unit concatenation with a large database is presently the most popular method for speech synthesis. In this approach, mismatches at the unit boundaries are unavoidable and become one of the reasons for quality degradation. This paper proposes an algorithm to reduce undesired discontinuities between consecutive units. Optimal matching points are calculated in two steps. First, the Kullback-Leibler distance is used for spectral matching; then unit sliding and overlap windowing are used for waveform matching. The proposed algorithm is implemented for a corpus-based unit-concatenating Korean text-to-speech system that has an automatically labeled database. Experimental results show that our algorithm performs better than raw concatenation or the overlap smoothing method.
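
The two-step idea of spectral matching followed by overlap windowing can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's algorithm: it uses a symmetrized Kullback-Leibler (KL) distance over normalized magnitude spectra, searches frame pairs exhaustively, and joins waveforms with a linear crossfade. The unit-sliding step is omitted, and all function names are hypothetical.

```python
import math


def normalize(spectrum):
    """Scale a non-negative magnitude spectrum so its bins sum to 1."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def kl_divergence(p, q, eps=1e-12):
    """D(p || q) for two normalized spectra; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Symmetrized KL distance (an assumption; the abstract does not say
    whether the distance is symmetrized)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def best_join(left_frames, right_frames):
    """Return (i, j, distance) for the frame pair with the smallest distance."""
    best = None
    for i, left in enumerate(left_frames):
        for j, right in enumerate(right_frames):
            distance = symmetric_kl(normalize(left), normalize(right))
            if best is None or distance < best[2]:
                best = (i, j, distance)
    return best


def crossfade(left, right, overlap):
    """Join two waveforms (lists of samples) with a linear overlap window."""
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter length")
    start = len(left) - overlap
    mixed = []
    for k in range(overlap):
        weight = (k + 1) / (overlap + 1)  # fades from left unit to right unit
        mixed.append(left[start + k] * (1 - weight) + right[k] * weight)
    return left[:start] + mixed + right[overlap:]
```

In this sketch, `best_join` would be run over the last few frames of the left unit and the first few frames of the right unit, and `crossfade` would then be applied around the chosen frames.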


'Hanmal' Korean Language Diphone Database for Speech Synthesis

  • Chung, Hyun-Song
    • Speech Sciences
    • /
    • v.12 no.1
    • /
    • pp.55-63
    • /
    • 2005
  • This paper introduces a 'Hanmal' Korean language diphone database for speech synthesis, which has been publicly available on the MBROLA website since 1999 but has never been properly published in a journal. The diphone database is compatible with the MBROLA programme of high-quality multilingual speech synthesis systems. The paper presents the usefulness of the diphone database and describes its phonetic and phonological structure, showing the process of creating a text corpus. A machine-readable Korean SAMPA convention for the control data input to the MBROLA application is also suggested. Diphone concatenation and prosody manipulation are performed using the MBR-PSOLA algorithm. A set of segment duration models can be applied to the diphone synthesis of Korean.


ToBI Based Prosodic Representation of the Kyungnam Dialect of Korean

  • Cho, Yong-Hyung
    • Speech Sciences
    • /
    • v.2
    • /
    • pp.159-172
    • /
    • 1997
  • This paper proposes a prosodic representation system for the Kyungnam dialect of Korean, based on the ToBI system. In this system, diverse intonation patterns are transcribed on four parallel tiers: a tone tier, a break index tier, an orthographic tier, and a miscellaneous tier. The tone tier employs pitch accents, phrase accents, and boundary tones marked with diacritics in order to represent various pitch events. The break index tier uses five break indices, numbered from 0 to 4, in order to represent degrees of connectedness in speech by associating each inter-word position with a break index. Here, each break index represents a boundary of some kind of constituent. This system can contribute not only to a more detailed theory connecting prosody, syntax, and intonation, but also to current text-to-speech synthesis approaches, speech recognition, and other quantitative computational modelling.
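
As a rough illustration of the four-tier layout, the sketch below stores the tiers in one record and checks that break indices stay in the range 0 to 4. The class name, field names, tone label, and the choice to store one break index after each word are assumptions made for this example, not the paper's notation.

```python
from dataclasses import dataclass, field


@dataclass
class ToBITranscription:
    """Hypothetical container for the four parallel tiers."""
    words: list                                  # orthographic tier
    breaks: list                                 # break index tier, one per word
    tones: list = field(default_factory=list)    # tone tier: (word_index, label)
    misc: list = field(default_factory=list)     # miscellaneous tier

    def __post_init__(self):
        if len(self.breaks) != len(self.words):
            raise ValueError("need exactly one break index per word")
        if any(b not in (0, 1, 2, 3, 4) for b in self.breaks):
            raise ValueError("break indices must be integers from 0 to 4")

    def phrases(self, min_break=3):
        """Group words into constituents that end at a break >= min_break."""
        groups, current = [], []
        for word, brk in zip(self.words, self.breaks):
            current.append(word)
            if brk >= min_break:
                groups.append(current)
                current = []
        if current:
            groups.append(current)
        return groups


# Placeholder words and an illustrative tone label, not Kyungnam data.
example = ToBITranscription(
    words=["w1", "w2", "w3", "w4"],
    breaks=[1, 3, 1, 4],
    tones=[(1, "H*")],
)
print(example.phrases())  # [['w1', 'w2'], ['w3', 'w4']]
```

Treating a higher break index as a stronger boundary lets a synthesizer or recognizer recover phrase groupings directly from the break index tier.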


A Study on Text Mining Analysis of Presidential Maritime Concept in KOREA (텍스트마이닝을 이용한 한국 대통령의 해양관에 관한 연구)

  • Kim, Sung-Kuk;Lee, Tae-Hwee
    • Journal of Korea Port Economic Association
    • /
    • v.36 no.3
    • /
    • pp.39-54
    • /
    • 2020
  • In a presidential political system, the words of the president have great influence on the formation of national policy and the decision-making process. Policy priorities are determined according to the president's ideology and core values, and various policies are established and executed according to those priorities. Therefore, this paper analyzes the contents of presidential speeches. Because a president's speech is semantic data in the form of unstructured text, it is analyzed with big data methods based on machine learning and deep learning. In this study, the presidents' speeches at the "National Sea Day" commemoration from 1996 onwards were collected and analyzed using topic modeling. The analysis shows that all the presidents delivered their speeches with a view of the ocean that was consistent with the direction of their administration. It was also confirmed that the ocean-industry-resource topics, which are the intrinsic values of the ocean, were preserved and consistently emphasized by all presidents.

Speech Synthesis for the Korean Large Vocabulary Through the Waveform Analysis in Time Domains and Evaluation of Synthesized Speech Quality (시간영역에서의 파형분석에 의한 무제한 어휘 합성 및 음절 유형별 규칙합성음 음질평가)

  • Kang, Chan-Hee;Chin, Yong-Ohk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.1
    • /
    • pp.71-83
    • /
    • 1994
  • This paper deals with improving the quality and naturalness of synthesized speech in a Korean TTS (Text-to-Speech) system. We extracted parameters (table2) such as the amplitude, duration, and pitch period within a syllable by analyzing speech waveforms (table1) in the time domain, and synthesized syllables using them. Based on the frequencies in a large-vocabulary Korean pronunciation dictionary, we selected and synthesized 229 syllables: 19 of type V, 80 of type CV, 30 of type VC, and 100 of type CVC. For each of the four Korean syllable types in the data format dictionary (table3), we tested 15 syllables with the objective MOS (Mean Opinion Score) evaluation method on four items, namely intelligibility, clearness, loudness, and naturalness, using a randomly selected group of listeners with no prior knowledge of the syllables. The experimental results show that the synthesized syllables are very clear and that prosodic elements such as duration, accent, and pitch period can be controlled (fig9, 10, 11, 12).
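
The syllable inventory and the per-item scoring can be checked with a short sketch. The counts come from the abstract; the listener ratings and the 1-to-5 scale are invented for illustration and are not results from the paper.

```python
from statistics import mean

# Syllable counts per type, as stated in the abstract.
SYLLABLE_COUNTS = {"V": 19, "CV": 80, "VC": 30, "CVC": 100}

# The four evaluation items named in the abstract.
ITEMS = ("intelligibility", "clearness", "loudness", "naturalness")


def mean_opinion_scores(ratings):
    """Average each item over all listener ratings.

    `ratings` is a list of dicts mapping item name to a score
    (assumed here to be on a 1-to-5 scale).
    """
    return {item: mean(r[item] for r in ratings) for item in ITEMS}


# The four types add up to the 229 syllables reported.
print(sum(SYLLABLE_COUNTS.values()))

# Hypothetical ratings from three listeners for one syllable type.
sample = [
    {"intelligibility": 4, "clearness": 5, "loudness": 3, "naturalness": 3},
    {"intelligibility": 5, "clearness": 4, "loudness": 4, "naturalness": 3},
    {"intelligibility": 3, "clearness": 3, "loudness": 5, "naturalness": 3},
]
print(mean_opinion_scores(sample))
```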
