• Title/Summary/Keyword: speech style

Identification of Voice Features for Recently Voice Fishing by Voice Analysis (음성 분석을 통한 최근 보이스피싱의 음성 특징 규명)

  • Lee, Bum Joo;Cho, Dong Uk;Jeong, Yeon Man
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.41 no.10
    • /
    • pp.1276-1283
    • /
    • 2016
  • The scale of financial damage from voice fishing has not decreased despite national and social efforts to reduce it. One reason is the sophisticated, vernacular speech style of the offenders, which makes them difficult to recognize. Furthermore, young men have recently been deceived in large numbers, not only by this sophisticated and vernacular speech style, of the kind used by employees of real public offices, but also by the offenders' use of obtained personal information. This has led directly to financial damage among younger people, who have stronger judgment than older people. We therefore compared and analyzed the voices of voice fishing criminals and of younger people of the same generation to identify voice features. The experiment examined pitch, pitch bandwidth, energy, speech speed, and voice color to find differences in voice characteristics between voice fishing criminals and younger people of the same generation since 2011. The experimental results show a significant difference in energy and speech speed between the two groups.

Investigating an Automatic Method for Summarizing and Presenting a Video Speech Using Acoustic Features (음향학적 자질을 활용한 비디오 스피치 요약의 자동 추출과 표현에 관한 연구)

  • Kim, Hyun-Hee
    • Journal of the Korean Society for Information Management
    • /
    • v.29 no.4
    • /
    • pp.191-208
    • /
    • 2012
  • Two fundamental aspects of speech summary generation are the extraction of key speech content and the style in which the extracted speech synopses are presented. We first investigated whether acoustic features (speaking rate, pitch pattern, and intensity) are equally important and, if not, which one can be effectively modeled to compute the significance of segments for lecture summarization. We found that intensity (that is, the difference between max dB and min dB) is the most efficient factor for speech summarization. We evaluated this intensity-based method against a keyword-based method in two respects: which method produces better speech summaries, and how similar the weight values that the two methods assign to segments are. We then investigated how to present speech summaries to viewers. In sum, for speech summarization, we suggest how to extract key segments from a speech video efficiently using acoustic features and how to present the extracted segments to viewers.
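
The intensity measure described in this abstract (max dB minus min dB within a segment) can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything here is an assumption: the function names `intensity_range_db` and `rank_segments` are hypothetical, each segment is assumed to arrive as a list of per-frame RMS amplitudes, and levels are computed as 20·log10 of the amplitude.

```python
import math

def intensity_range_db(frame_rms):
    """Return max dB minus min dB over the frames of one segment.

    frame_rms: per-frame RMS amplitudes (assumed input format).
    """
    # Clamp to a small floor so silent frames do not produce log10(0).
    levels = [20 * math.log10(max(r, 1e-10)) for r in frame_rms]
    return max(levels) - min(levels)

def rank_segments(segments):
    """Order segment names from largest to smallest intensity range."""
    scores = {name: intensity_range_db(frames) for name, frames in segments.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: segment "b" swings from quiet to loud, "a" is flat.
segments = {"a": [0.1, 0.1, 0.1], "b": [0.01, 0.5, 1.0]}
ranking = rank_segments(segments)
```

In this toy input, segment "b" spans about 40 dB and segment "a" spans 0 dB, so "b" would be ranked as the more significant segment under this scoring.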

Durational aspects of Korean nasal geminates

  • Oh, Eunhae
    • Phonetics and Speech Sciences
    • /
    • v.9 no.4
    • /
    • pp.19-25
    • /
    • 2017
  • The current study focused on the production of geminate nasal consonants across different word boundary types in Korean as a function of speech style to investigate whether temporal properties are preserved across varying speaking rates. Assimilated geminates in Korean, known as true geminates, are produced with distinctively longer consonant duration compared to singletons. Despite a large body of literature for geminates across different languages, geminates in Korean have been relatively less investigated with respect to the durational patterns in relative terms and temporal variabilities. In this study, singletons, word-internal geminates and word-boundary (fake) geminates produced by ten native Seoul Korean speakers were compared in terms of absolute consonant closure duration, preceding vowel duration, the relative ratios (consonant-to-preceding vowel duration) as well as the temporal variabilities in speech production. The results showed that word-internal geminates were produced with longer consonant duration and greater temporal variabilities than singletons and word-boundary geminates in absolute duration, indicating relatively greater flexibility in timing. However, only word-internal geminates were produced with distinctively longer consonant duration with significantly lower variability in relative duration regardless of speech styles. The results provide some insight into the representation of temporal information in the production of Korean geminate consonants.

A Performance Improvement Method using Variable Break in Corpus Based Japanese Text-to-Speech System (가변 Break를 이용한 코퍼스 기반 일본어 음성 합성기의 성능 향상 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.2
    • /
    • pp.155-163
    • /
    • 2009
  • In text-to-speech systems, the conversion of text into prosodic parameters is necessarily composed of three steps: the placement of prosodic boundaries, the determination of segmental durations, and the specification of fundamental frequency contours. Prosodic boundaries, as the most important and basic parameter, affect the estimation of durations and fundamental frequency. Break prediction is an important step in text-to-speech systems, as break indices (BIs) have a great influence on how correctly prosodic phrase boundaries are represented. However, accurate prediction is difficult because BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and a major phrase boundary (MPB) is particularly difficult. This paper therefore presents a method to compensate for prediction errors between an APB and an MPB. First, we define a subtle BI, for which it is difficult to decide clearly between an APB and an MPB, as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using a classification and regression tree, and multiple prosodic targets in relation to pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. In the MOS test, the original speech scored 4.99, while the proposed method scored 4.25 and the conventional method scored 4.01. The experimental results show that the proposed method improves the naturalness of synthesized speech.

Enhancement of English-to-Korean Translation Quality by Korean Style Generation Patterns (한국어 스타일 생성 패턴에 의한 영한 번역 품질 개선)

  • Choi, Sung-Kwon;Hong, Mun-Pyo;Park, Sang-Kyu
    • Annual Conference on Human and Language Technology
    • /
    • 2003.10d
    • /
    • pp.235-240
    • /
    • 2003
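  • This paper aims to improve English-to-Korean translation quality by applying Korean style generation patterns to an English-to-Korean machine translation system. To this end, we move away from the conventional one-dimensional evaluation of translation rate, which measures only how accurately a translation conveys the information in the source text, and propose a two-dimensional evaluation method that assesses accuracy and style together, so that the naturalness of the translation is evaluated as well as its informational accuracy. Comparing results before and after applying the style generation patterns under this two-dimensional evaluation, on sample sentences of 100 characters, we observed that the style generation patterns alone improved the translation rate by 0.5%. The style generation patterns in this paper reflect only the style differences between the two languages; in future work we plan to apply style generation patterns for specific groups such as newspapers, weather forecasts, and technical manuals.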

Measuring Acoustical Parameters of English Words by the Position in the Phrases (영어어구의 위치에 따른 단어의 음향 변수 측정)

  • Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.115-128
    • /
    • 2007
  • The purposes of this paper were to develop an automatic script to collect acoustic parameters such as duration, intensity, pitch, and the first two formant values of English words produced by two native Canadian speakers, either alone or in a two-word phrase at a normal speed, and to compare those values by position in the phrases. A Praat script was proposed to obtain comparable parameters at evenly divided time points of the target word. Results showed that the total duration of a word in a phrase was shorter than that of the word produced alone. This was attributed to the pronunciation style of the native speakers, who generally placed the primary word stress on the first word position. Also, the reduction ratio of the male speaker depended on the word position in the phrase, while that of the female speaker did not. Moreover, the intensity and pitch contours differed by the position of the target word in the phrase, while almost the same formant patterns were observed. Further studies would be desirable to examine these parameters for words in authentic speech materials.

Automatic Speech Style Recognition Through Sentence Sequencing for Speaker Recognition in Bilateral Dialogue Situations (양자 간 대화 상황에서의 화자인식을 위한 문장 시퀀싱 방법을 통한 자동 말투 인식)

  • Kang, Garam;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.2
    • /
    • pp.17-32
    • /
    • 2021
  • Speaker recognition is generally divided into speaker identification and speaker verification. Speaker recognition plays an important role in automatic voice systems, and the importance of speaker recognition technology is becoming more prominent as portable devices, voice technology, and audio content continue to develop. Previous speaker recognition studies have been conducted with the goal of automatically determining who the speaker is based on voice files and of improving accuracy. Speech style is an important sociolinguistic subject: it contains very useful information that reveals the speaker's attitude, conversational intention, and personality, and this can be an important clue for speaker recognition. The sentence-final ending used in a speaker's utterance determines the type of sentence and carries information such as the speaker's intention, psychological attitude, or relationship to the listener. Because the use of sentence-final endings varies in probability with the characteristics of the speaker, the type and distribution of the sentence-final endings of a specific unidentified speaker will be helpful in recognizing the speaker. However, few studies have considered speech style in existing text-based speaker recognition, and if speech style information is added to speech signal-based speaker recognition techniques, the accuracy of speaker recognition can be further improved. Hence, the purpose of this paper is to propose a novel method that uses speech style, expressed as sentence-final endings, to improve the accuracy of Korean speaker recognition. To this end, we propose a method called sentence sequencing, which generates vector values from the type and frequency of the sentence-final endings appearing in the utterances of a specific person. To evaluate the performance of the proposed method, learning and performance evaluation were conducted with an actual drama script. The method proposed in this study can be used as a means to improve the performance of Korean speech recognition services.
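
The idea of turning the type and frequency of sentence-final endings into a vector can be sketched as follows. This is only an illustration under stated assumptions, not the authors' sentence sequencing method, whose details the abstract does not give. The ending inventory `ENDINGS` is hypothetical, the function names are hypothetical, and plain suffix matching stands in for the morphological analysis a real system would likely need.

```python
from collections import Counter

# Hypothetical, tiny inventory of sentence-final endings (an assumption).
ENDINGS = ("니다", "어요", "니까")

def ending_of(sentence, endings=ENDINGS):
    """Return the first matching ending (longest tried first), or None."""
    text = sentence.rstrip(" .?!")
    for ending in sorted(endings, key=len, reverse=True):
        if text.endswith(ending):
            return ending
    return None

def ending_vector(sentences, endings=ENDINGS):
    """Relative frequency of each ending across one speaker's sentences."""
    counts = Counter(ending_of(s, endings) for s in sentences)
    total = sum(counts[e] for e in endings)
    return [counts[e] / total if total else 0.0 for e in endings]

# Toy utterances for one hypothetical speaker.
utterances = ["감사합니다.", "밥 먹었어요", "어디 갑니까?", "알겠습니다"]
vector = ending_vector(utterances)
```

For the toy utterances, two of the four sentences end in "니다" and one each in "어요" and "니까", so the vector is [0.5, 0.25, 0.25]. Vectors of this kind could then be fed to any classifier; that choice is left open here.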

The Autonomization of French and Vietnamese Literature: Comparing Gustave Flaubert (1821-1880) and Vũ Trọng Phụng (1912-1939)

  • Phung, Ngoc Kien
    • SUVANNABHUMI
    • /
    • v.14 no.1
    • /
    • pp.109-131
    • /
    • 2022
  • This paper compares the French writer Gustave Flaubert (1821-1880) and the Vietnamese writer Vũ Trọng Phụng (1912-1939), and explores the transformations of their aesthetic experiences that led to the autonomization of the French literary field in the nineteenth century and the Vietnamese literary field in the early twentieth century. Inspired by the term "archive" coined by Michel Foucault, this article argues that Flaubert, in abandoning bourgeois tastes, contested realism and built his own writing ideology and style, which is called subjective realism. It also argues that Vũ Trọng Phụng gained success through the popular report genre and evolved his own novel-writing style, aptly called the realism of speech. It appears that the transformation in the two authors' writing styles and aesthetic experiences derived from the way they distanced themselves from their contemporaries' common tastes while making use of free indirect speech, all with the aim of granting readers autonomy in reading.

Durational Interaction of Stops and Vowels in English and Korean Child-Directed Speech

  • Choi, Han-Sook
    • Phonetics and Speech Sciences
    • /
    • v.4 no.2
    • /
    • pp.61-70
    • /
    • 2012
  • The current study observes the durational interaction of tautosyllabic consonants and vowels in the word-initial position of English and Korean child-directed speech (CDS). The effect of phonological laryngeal contrasts in stops on the following vowel duration, and the effect of the intrinsic vowel duration on the release duration of preceding stops, in addition to the acoustic realization of the contrastive segments, are explored in different prosodic contexts - phrase-initial/medial, focal accented/non-focused - in a marked speech style of CDS. A trade-off relationship between Voice Onset Time (VOT), as consonant release duration, and voicing phonation time, as vowel duration, reported from adult-to-adult speech, and patterns of durational variability are investigated in CDS of two languages with different linguistic rhythms, under systematically controlled prosodic contexts. Speech data were collected from four native English-speaking mothers and four native Korean-speaking mothers who were talking to their infants at the one-word stage. In addition to the acoustic measurements, the transformed delta measure is employed as a variability index of individual tokens. Results confirm the durational correlation between prevocalic consonants and following vowels. The interaction is revealed in a compensatory pattern, such as longer VOTs followed by shorter vowel durations, in both languages. An asymmetry is found in CV interaction in that the effect of the consonant on vowel duration is greater than the VOT differences induced by the vowel. Prosodic effects are found such that the acoustic difference between the contrastive segments is enhanced under focal accent, supporting the paradigmatic strengthening effect. Positional variation, however, does not show any systematic effects on the variations of the measured acoustic quantities. Overall vowel duration and syllable duration are longer in English tokens but involve less variability across the prosodic variations. The constancy of syllable duration, therefore, is not found to be more strongly sustained in Korean CDS. The stylistic variation is discussed in relation to the listener under linguistic development in CDS.

UA Tree-based Reduction of Speech DB in a Large Corpus-based Korean TTS (대용량 한국어 TTS의 결정트리기반 음성 DB 감축 방안)

  • Lee, Jung-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.7
    • /
    • pp.91-98
    • /
    • 2010
  • Large corpus-based concatenative Text-to-Speech (TTS) systems can generate natural synthetic speech without additional signal processing. Because improvements in the naturalness, personality, speaking style, and emotion of synthetic speech require a larger speech DB, it is necessary to prune redundant speech segments in a large speech segment DB. In this paper, we propose a new method, based on a clustering algorithm, for constructing a downsized segmental speech DB for a Korean TTS system. For the performance test, synthetic speech was generated using a Korean TTS system that consists of a language processing module, prosody processing module, segment selection module, speech concatenation module, and segmental speech DB. An MOS test was then conducted with a set of synthetic speech generated from 4 different segmental speech DBs, which we constructed by combining the CM1 (or CM2) tree clustering method with the full DB (or reduced DB). Experimental results show that the proposed method can reduce the size of the speech DB by 23% while still achieving a high MOS in the perception test. The proposed method can therefore be applied to build a small-sized TTS.