• Title/Summary/Keyword: Prosody


A Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS (대용량 한국어 TTS의 결정트리기반 음성 DB 감축 방안)

  • Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information / v.15 no.7 / pp.91-98 / 2010
  • Large corpus-based concatenative Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because improving the naturalness, personality, speaking style, and emotion of synthetic speech requires a larger speech DB, it is necessary to prune redundant speech segments from a large speech segment DB. In this paper, we propose a new method for constructing a segmental speech DB for a Korean TTS system, based on a clustering algorithm that downsizes the segmental speech DB. For the performance test, synthetic speech was generated using a Korean TTS system consisting of a language processing module, a prosody processing module, a segment selection module, a speech concatenation module, and the segmental speech DB. A mean opinion score (MOS) test was conducted on a set of synthetic speech generated with four different segmental speech DBs, which were constructed by combining the CM1 (or CM2) tree clustering method with the full DB (or the reduced DB). Experimental results show that the proposed method can reduce the size of the speech DB by 23% while achieving a high MOS in the perception test. The proposed method can therefore be applied to build a small-sized TTS system.

A study of reciting the formal poetries of Korea and French in digital era - Shijo(Korean verse) vs Sonnet (French) (콘텐츠를 위한 한ㆍ불 정형시가 낭송법의 비교 고찰)

  • 이산호
    • Sijohaknonchong / v.19 no.1 / pp.85-106 / 2003
  • Recently, the sonnet and the shijo, which represent French and Korean formal poetry respectively, tend to be read with the eyes only, as we have become more accustomed to written literature. But even after almost three millennia of written literature and the increased use of digitized poems, poetry retains its appeal to the ear as well as to the eye. Reading a poem only with the eyes may be a mistake, because a poem is designed to be read aloud and understood by ear; otherwise its aesthetic effect is diminished. It is essential to find the right way to recite a poem in this dramatically changed society, especially now that many shijo are being converted into digital forms to adapt to the new wave in our society. The sonnet and the shijo emphasize the harmony of sounds and rhythms within a certain structure, and each has its own prosody. The emotions of the speaker in a poem are expressed with words, and when those words are pronounced, each phoneme has its own phonemic characteristics. A comparison of The Broken Bell (Baudelaire) and Chopoong ga (Jong Seo Kim) in terms of prosody and phonetics shows that the speaker's emotions are closely related to the phonetic structure of each word. In The Broken Bell, the phonetic value of the rhymes, the repeated phonemes, the concentration of front and back vowels, and the rhythms of one-syllable words shape the overall image of the poem, which describes the productivity of bells as opposed to the sterility of the soul. Chopoong ga likewise shows the determined and strong will of the speaker through frequent glottalized sounds, the distribution and concentration of certain vowels, and the frequent use of plosives. As these examples show, phones, beats, and rhythms are not mere transmitters of meaning but possess expressive values of their own, and they should be the first things considered when reciting a poem.

Speech Evaluation Tasks Related to Subthalamic Nucleus Deep Brain Stimulation in Idiopathic Parkinson's Disease: A Review (특발성 파킨슨병의 시상밑부핵 심부뇌자극술 관련 말 평가 과제에 대한 문헌연구)

  • Kim, Sun Woo;Kim, Hyang Hee
    • 재활복지 / v.18 no.4 / pp.237-255 / 2014
  • Idiopathic Parkinson's disease (IPD) is a neurodegenerative disease caused by the loss of dopamine cells in the substantia nigra, a region of the midbrain. Its major symptoms are muscular rigidity, bradykinesia, resting tremor, and postural instability. An estimated 70-90% of patients with IPD also have hypokinetic dysarthria. Subthalamic nucleus deep brain stimulation (STN-DBS) has been reported to be successful in relieving the core motor symptoms of IPD in the advanced stages of the disease. However, data on the effects of STN-DBS on speech performance are inconsistent. A MEDLINE literature search was conducted to retrieve articles published from 1987 to 2012. The results were narrowed down to studies of speech performance under STN-DBS based on perceptual, acoustic, and/or aerodynamic analyses. Among the 32 publications that dealt with speech performance after STN-DBS, the reported outcomes were improvement (42%), deterioration (29%), mixed results (26%), or no change (3%). The most frequently used method was acoustic analysis using vowel prolongation and the Unified Parkinson's Disease Rating Scale (UPDRS). To verify the effect of STN-DBS, speech evaluation should cover all speech components, such as articulation, resonance, phonation, respiration, and prosody, by using a contextual speech task.

Development and validation of a Korean Affective Voice Database (한국형 감정 음성 데이터베이스 구축을 위한 타당도 연구)

  • Kim, Yeji;Song, Hyesun;Jeon, Yesol;Oh, Yoorim;Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.3 / pp.77-86 / 2022
  • In this study, we report the validation results of the Korean Affective Voice Database (KAV DB), an affective voice database available for scientific and clinical use, comprising a total of 113 validated affective voice stimuli. The KAV DB includes audio recordings of two actors (one male and one female), each uttering 10 semantically neutral sentences with the intention of conveying six different affective states (happiness, anger, fear, sadness, surprise, and neutral). To validate the KAV DB, the database was organized into three separate voice stimulus sets. Participants rated the stimuli on six rating scales corresponding to the six targeted affective states, using a 100 horizontal visual analog scale. The KAV DB showed high internal consistency for the voice stimuli (Cronbach's α=.847). The database had high sensitivity (mean=82.8%) and specificity (mean=83.8%). The KAV DB is expected to be useful for both academic research and clinical purposes in the field of communication disorders. The KAV DB is available for download at https://kav-db.notion.site/KAV-DB-7539a36abe2e414ebf4a50d80436b41a.
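
The three validation statistics named in this abstract (Cronbach's α, sensitivity, and specificity) follow standard formulas. The sketch below is only an illustration of those formulas, not the authors' analysis pipeline: the function names, the layout of `scores` (one row per participant, one column per item), and all numbers are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of ratings.

    scores: list of rows, one row per participant, one column per item.
    (Hypothetical layout; the abstract does not describe the data format.)
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    columns = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Toy ratings (made up): three participants, three perfectly consistent items.
toy_scores = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
alpha = cronbach_alpha(toy_scores)
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=1)
```

With perfectly consistent items, as in the toy table, α reaches its maximum of 1; real rating data would give a lower value.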

Interferometric Monitoring of Gamma-Ray Bright AGNs: 4C +28.07 and Its Synchrotron Self-Absorption Spectrum

  • Myoung-Seok Nam;Sang-Sung Lee;Whee Yeon Cheong
    • Journal of The Korean Astronomical Society / v.56 no.2 / pp.231-252 / 2023
  • We present the analysis results of the simultaneous multifrequency observations of the blazar 4C +28.07. The observations were conducted by the Interferometric Monitoring of Gamma-ray Bright Active Galactic Nuclei (iMOGABA) program, which is a key science program of the Korean Very Long Baseline Interferometry (VLBI) Network (KVN). Observations of the iMOGABA program for 4C +28.07 were conducted from 16 January 2013 (MJD 56308) to 13 March 2020 (MJD 58921). We also used γ-ray data from the Fermi Large Area Telescope (Fermi-LAT) Light Curve Repository, covering the energy range from 100 MeV to 100 GeV. We divided the iMOGABA data and the Fermi-LAT data into five periods from 0 to 4, according to the prosody of the 22 GHz data and the presence or absence of the data. In order to investigate the characteristics of each period, the light curves were plotted and compared. However, a peak that formed a hill was observed earlier than the period of a strong γ-ray flare at 43-86 GHz in period 3 (MJD 57400-58100). Therefore, we assumed that the minimum total CLEANed flux density for each frequency was quiescent flux (Sq) in which the core of 4C +28.07 emitted the minimum, with the variable flux (Svar) obtained by subtracting Sq from the values of the total CLEANed flux density. We then compared the variability of the spectral indices (α) between adjacent frequencies through a spectral analysis. Most notably, α22-43 showed optically thick spectra in the absence of a strong γ-ray flare, and when the flare appeared, α22-43 became optically thinner. In order to find out the characteristics of the magnetic field in the variable region, the magnetic field strength in the synchrotron self-absorption (BSSA) and the equipartition magnetic field strength (Beq) were obtained.
We found that BSSA is largely consistent with Beq within the uncertainty, implying that the SSA region in the source is not significantly deviated from the equipartition condition in the γ-ray quiescent periods.
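
The quiescent/variable flux split and the two-point spectral index described in this abstract can be written out as a short sketch. It assumes the sign convention S ∝ ν^α (so a rising, optically thick spectrum has α > 0), which the abstract does not state explicitly; the function names and the flux values are hypothetical and serve only to show the arithmetic.

```python
import math


def variable_flux(flux_densities):
    """Split a light curve into quiescent and variable parts.

    Sq is taken as the minimum flux density at one frequency, and
    Svar = S - Sq for every epoch, as described in the abstract.
    """
    s_q = min(flux_densities)
    return s_q, [s - s_q for s in flux_densities]


def spectral_index(s_low, s_high, nu_low, nu_high):
    """Two-point spectral index, assuming the convention S proportional to nu**alpha.

    Both flux densities must be positive, so the epoch where Svar = 0
    cannot be used with this formula.
    """
    return math.log(s_high / s_low) / math.log(nu_high / nu_low)


# Made-up flux densities (Jy) at one frequency over three epochs.
s_q, s_var = variable_flux([1.5, 1.0, 2.0])

# Made-up pair of fluxes at 22 and 43 GHz: positive alpha means a rising spectrum.
alpha_22_43 = spectral_index(s_low=1.0, s_high=1.6, nu_low=22.0, nu_high=43.0)
```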

Automatic Recognition of Pitch Accent Using Distributed Time-Delay Recursive Neural Network (분산 시간지연 회귀신경망을 이용한 피치 악센트 자동 인식)

  • Kim Sung-Suk
    • The Journal of the Acoustical Society of Korea / v.25 no.6 / pp.277-281 / 2006
  • This paper presents a method for the automatic recognition of pitch accents over syllables. The proposed method is based on the time-delay recursive neural network (TDRNN), a neural network classifier with two different representations of dynamic context: the delayed input nodes allow the representation of an explicit trajectory F0(t) over time, while the recursive nodes provide long-term context information that reflects the characteristics of pitch accentuation in spoken English. We apply the TDRNN to pitch accent recognition in two forms: in the normal TDRNN, all of the prosodic features (pitch, energy, duration) are used as an entire set in a single TDRNN, while in the distributed TDRNN, the network consists of several TDRNNs, each taking a single prosodic feature as its input. The final output of the distributed TDRNN is the weighted sum of the outputs of the individual TDRNNs. We used the Boston Radio News Corpus (BRNC) for the experiments on speaker-independent pitch accent recognition. The experimental results show that the distributed TDRNN exhibits an average recognition accuracy of 83.64% over both pitch events and non-events.
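
The combination step of the distributed TDRNN, a weighted sum over the outputs of the per-feature networks, can be sketched as follows. This shows only the final fusion, not the TDRNNs themselves; the weights, the per-feature scores, and the 0.5 decision threshold are all hypothetical values chosen for illustration, since the abstract does not give them.

```python
def combine_outputs(feature_scores, weights):
    """Weighted sum of per-feature classifier outputs.

    feature_scores: output of each single-feature network for one syllable,
                    keyed by feature name (here assumed to lie in [0, 1]).
    weights:        one weight per feature (hypothetical values).
    """
    if feature_scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same features")
    return sum(weights[name] * score for name, score in feature_scores.items())


def is_pitch_accent(feature_scores, weights, threshold=0.5):
    """Label a syllable as accented when the combined score reaches the threshold."""
    return combine_outputs(feature_scores, weights) >= threshold


# Made-up outputs of three single-feature networks for one syllable.
scores = {"pitch": 0.9, "energy": 0.4, "duration": 0.2}
weights = {"pitch": 0.5, "energy": 0.3, "duration": 0.2}
combined = combine_outputs(scores, weights)
accented = is_pitch_accent(scores, weights)
```

Keeping the weights separate from the per-feature networks means each network can be trained on its own feature, with only the small weight vector tuned afterwards.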

A Pre-Selection of Candidate Units Using Accentual Characteristic In a Unit Selection Based Japanese TTS System (일본어 악센트 특징을 이용한 합성단위 선택 기반 일본어 TTS의 후보 합성단위의 사전선택 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Kwang-Hyoung;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.159-165 / 2007
  • In this paper, we propose a new method for pre-selecting candidate units that is suitable for a unit selection based Japanese TTS system. General pre-selection methods calculate a context-dependent cost within an IP (Intonation Phrase). Unlike other languages, however, Japanese has an accent represented as relative pitch height, and several words form a single accentual phrase (AP). In addition, the prosody in Japanese changes in accentual phrase units. By reflecting this prosodic change in pre-selection, the quality of synthesized speech can be improved. Furthermore, calculating the context-dependent cost within an accentual phrase makes synthesis faster than calculating it within an intonation phrase. The proposed method defines the AP, analyzes the AP in context, and performs pre-selection using accentual phrase matching, which calculates the CCL (connected context length) of the candidates for each phoneme to be synthesized in each accentual phrase. The baseline system used in the proposed method is VoiceText, a synthesizer from Voiceware. Evaluations were made on perceptual error (intonation error, concatenation mismatch error) and synthesis time. Experimental results showed that the proposed method improved the quality of synthesized speech and shortened the synthesis time.

An Investigation on the Periodical Transition of News related to North Korea using Text Mining (텍스트마이닝을 활용한 북한 관련 뉴스의 기간별 변화과정 고찰)

  • Park, Chul-Soo
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.63-88 / 2019
  • The goal of this paper is to investigate changes in North Korea's domestic and foreign policies through automated text analysis of how North Korea is represented in South Korean mass media. Based on that data, we then analyze the status of text mining research, using a text mining technique to find the topics, methods, and trends of text mining research. We also investigate the characteristics and analysis methods of the text mining techniques, confirmed by analysis of the data. In this study, the R program, which is free software for statistical computing and graphics, was used to apply the text mining technique. Text mining methods make it possible to highlight the most frequently used keywords in a body of text. One can create a word cloud, also referred to as a text cloud or tag cloud. This study proposes a procedure for finding meaningful tendencies based on a combination of word clouds and co-occurrence networks. It aims to explore more objectively the images of North Korea represented in South Korean newspapers by quantitatively reviewing the patterns of language use related to North Korea in newspaper big data from November 1, 2016 to May 23, 2019. We divided the data into three periods in light of recent inter-Korean relations. The period before January 1, 2018 was set as the Before Phase of Peace Building. The period from January 1, 2018 to February 24, 2019 was set as the Peace Building Phase, during which Kim Jong-un's New Year's message and the PyeongChang Olympics created an atmosphere of peace on the Korean peninsula. The third period, after the Hanoi peace summit, was marked by silence in the relationship between North Korea and the United States and was therefore called the Depression Phase of Peace Building. This study analyzes news articles related to North Korea from the Korea Press Foundation database (www.bigkinds.or.kr) through text mining, to investigate the characteristics of the Kim Jong-un regime's South Korea policy and unification discourse.
The main results of this study show that trends in the North Korean national policy agenda can be discovered using clustering and visualization algorithms. In particular, the study examines changes in international circumstances, domestic conflicts, living conditions in North Korea, the South's aid projects for the North, conflicts between the two Koreas, the North Korean nuclear issue, and the North Korean refugee problem through co-occurrence word analysis. It also offers an analysis of the South Korean mentality toward North Korea in terms of semantic prosody. In the Before Phase of Peace Building, the analysis yielded, in order, 'Missiles', 'North Korea Nuclear', 'Diplomacy', 'Unification', and 'South-North Korean'. In the Peace Building Phase, the order was 'Panmunjom', 'Unification', 'North Korea Nuclear', 'Diplomacy', and 'Military'. In the Depression Phase of Peace Building, the order was 'North Korea Nuclear', 'North and South Korea', 'Missile', 'State Department', and 'International'. There are 16 words adopted in all three periods. The order is as follows: 'missile', 'North Korea Nuclear', 'Diplomacy', 'Unification', 'North and South Korea', 'Military', 'Kaesong Industrial Complex', 'Defense', 'Sanctions', 'Denuclearization', 'Peace', 'Exchange and Cooperation', and 'South Korea'. We expect that the results of this study will contribute to analyzing trends in news content about North Korea associated with North Korea's provocations. Future research on North Korean trends will be conducted based on the results of this study. We will continue to develop a model for North Korea risk measurement that can anticipate and respond to North Korea's behavior in advance. We expect that the text mining analysis method and the scientific data analysis technique will be applied to the field of North Korea and unification research.
Through these academic studies, I hope to see many more studies that make important contributions to the nation.
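
The keyword-frequency and co-occurrence counting that underlie a word cloud and a co-occurrence network can be sketched briefly. The study itself used R; the Python below is only an illustrative stand-in, and the function names and the toy token lists are invented for the example rather than taken from the paper's data.

```python
from collections import Counter
from itertools import combinations


def keyword_frequencies(documents):
    """Count how often each keyword occurs across all documents (word-cloud input)."""
    return Counter(token for tokens in documents for token in tokens)


def cooccurrence_counts(documents):
    """Count, for each keyword pair, the number of documents containing both.

    Pairs are stored in alphabetical order so (a, b) and (b, a) share one key.
    These counts can serve as edge weights of a co-occurrence network.
    """
    pairs = Counter()
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy, already-tokenized articles (made up for illustration).
articles = [
    ["missile", "nuclear", "sanctions"],
    ["nuclear", "diplomacy"],
    ["missile", "nuclear"],
]
frequencies = keyword_frequencies(articles)
edges = cooccurrence_counts(articles)
top_keywords = frequencies.most_common(3)
```

Running the same counts separately for each of the three periods would give per-period keyword rankings that can be compared side by side.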