• Title/Summary/Keyword: Speech feature

The Experimental Phonetic Study of Word Accent in Standard Korean (표준한국어 악센트의 실험음성학적 연구 -청취 테스트 및 음향분석-)

  • Seong Cheol-jae
    • MALSORI / no.21_24 / pp.43-89 / 1992
  • This thesis studies the prominence of word accent in standard Korean through an auditory test and an acoustic analysis experiment. Following Hoyoung Lee's discussion (1990), 'accent' is defined as 'the means whereby a focused part of an utterance is made to stand out in order to concentrate the hearer's attention on it.' That is, the term 'accent' may be described as a phonological phenomenon, and the accented syllable can be phonetically prominent as a result of those phonological processes. Prosodic features may have different characteristics in different languages, depending on whether they carry linguistically important functions. The characteristics of word accent in standard Korean will therefore be determined by the content and traits of its prosodic features. From this viewpoint, the present study examined, through a systematic experimental procedure, the prosodic features that may affect the characteristics of word accent in standard Korean. The results were verified with a statistical method, the t-test, in order to identify the relatedness among prosodic features (parameters). The thesis thus aimed to investigate the intrinsic acoustic and physical qualities of word accent in standard Korean. Nonsense words composed of 'mal' and 'ma', which are classed as 'heavy syllable' and 'light syllable' following Hyman (1975), were classified into 28 types by number of syllables (2 syl., 3 syl., 4 syl.), and these words served as the targets of the auditory test and the acoustic experiment. As a result of these experimental procedures, word accent in standard Korean can be said to tend to fall within the first two syllables regardless of the number of syllables. The syllable types HH, HL, and LL in the first two syllables may be prominent on the first syllable, and the type LH on the second syllable. Various prosodic features (parameters), including duration, intensity, and F0 (purely phonetic terms), were also strengthened in those positions. The results of this experiment can be summarized as follows: 1. The most important feature proved to be duration; intensity turned out to be subsidiary to duration. 2. F0 (fundamental frequency) showed a coherent contour across almost all syllable types (99%): a rising contour in 2-syllable types, a rising-falling contour in 3-syllable types, and a rising-falling-rising contour in 4-syllable types. The results of the auditory test, however, differed from the F0 contour forms surveyed. In view of these results, F0 was excluded from the discussion, unlike the other features. 3. The thesis concludes that word accent in standard Korean may be a fixed (somewhat weak) accent, fixed within the first two syllables in almost all words. 4. The various syllable types of 2-, 3-, and 4-syllable words can therefore be reclassified into the four types HH, HL, LH, and LL, following the concept of fixed accent placement (i.e., the first two syllables). Of these four types, HH, HL, and LL were prominent on the first syllable, while LH was prominent on the second syllable.

Continuous Speech Recognition based on Parametric Trajectory Segmental HMM (모수적 궤적 기반의 분절 HMM을 이용한 연속 음성 인식)

  • 윤영선;오영환
    • The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.35-44 / 2000
  • In this paper, we propose a new trajectory model for characterizing segmental features and their interaction, based on the general framework of hidden Markov models. Each segment, a sequence of vectors, is represented by a trajectory of observed sequences. This trajectory is obtained by applying a new design matrix that includes transitional information on contiguous frames, and is characterized as a polynomial regression function. To apply the trajectory to the segmental HMM, the frame features are replaced with the trajectory of a given segment. We also propose the likelihood of a given segment and the estimation of trajectory parameters. The observation probability of a given segment is represented as the relation between the segment likelihood and the estimation error of the trajectories. The estimation error of a trajectory is treated as the weight of the likelihood of a given segment in a state. This weight represents the probability of how well the corresponding trajectory characterizes the segment. The proposed model can be regarded as a generalization of a conventional HMM and a parametric trajectory model. Experimental results are reported on the TIMIT corpus, and performance is shown to improve significantly over that of the conventional HMM.
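
The idea of characterizing a segment by a polynomial regression trajectory can be illustrated with a minimal sketch. It assumes a one-dimensional feature and an ordinary design matrix of time powers on a normalized time axis; it does not reproduce the paper's new design matrix with transitional information, and the function names are hypothetical.

```python
def design_matrix(n_frames, order):
    """Rows are [1, t, t^2, ...] with t normalised to [0, 1] over the segment."""
    if n_frames == 1:
        times = [0.0]
    else:
        times = [i / (n_frames - 1) for i in range(n_frames)]
    return [[t ** r for r in range(order + 1)] for t in times]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_trajectory(frames, order=2):
    """Least-squares polynomial trajectory for a 1-D feature sequence.

    Returns (coefficients, mean squared estimation error). The segment
    needs at least order + 1 frames for the normal equations to be solvable.
    """
    z = design_matrix(len(frames), order)
    k = order + 1
    ztz = [[sum(row[i] * row[j] for row in z) for j in range(k)] for i in range(k)]
    zty = [sum(row[i] * y for row, y in zip(z, frames)) for i in range(k)]
    coeffs = solve_linear(ztz, zty)
    fitted = [sum(c * v for c, v in zip(coeffs, row)) for row in z]
    mse = sum((y - f) ** 2 for y, f in zip(frames, fitted)) / len(frames)
    return coeffs, mse
```

In this sketch the returned estimation error plays the role the abstract assigns to it only loosely: a small error means the trajectory describes the segment well, which is the quantity a segment-level weight could be derived from.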

Animation and Machines: designing expressive robot-human interactions (애니메이션과 기계: 감정 표현 로봇과 인간과의 상호작용 연구)

  • Schlittler, Joao Paulo Amaral
    • Cartoon and Animation Studies / s.49 / pp.677-696 / 2017
  • Cartoons, and consequently animation, are an effective way of visualizing futuristic scenarios. Here we look at how animation is becoming ubiquitous and an integral part of that future today: the cybernetic and mediated society into which we are being transformed. Animation therefore becomes a form of speech between humans and this networked reality, either as an interface or as a representation that gives temporal form to objects. Animation, or specifically animated film, is usually associated with character-based short and feature films, fiction or nonfiction. However, animation is not restricted to traditional cinematic formats and language, just as design and communication have come to be treated as separate fields although, according to Vilém Flusser, they are not. The same premise can be applied to animation in a networked culture: animation has become intrinsic to design processes and products, as in motion graphics, interface design and three-dimensional visualization. Video games, virtual reality, map-based apps and social networks constitute layers of an expanded universe that embodies our network-based culture. They are products of design and media disciplines that increasingly rely on animation as a universal language suited to multicultural interactions carried out in digital environments. In this sense animation becomes a discourse, in the same way that Roland Barthes describes myth as a type of speech. With the objective of exploring the role of animation as a design tool, the proposed research intends to develop transmedia creative visual strategies using animation both as narrative and as a user interface.

Creation and labeling of multiple phonotopic maps using a hierarchical self-organizing classifier (계층적 자기조직화 분류기를 이용한 다수 음성자판의 생성과 레이블링)

  • Chung, Dam;Lee, Kee-Cheol;Byun, Young-Tai
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.3 / pp.600-611 / 1996
  • Recently, neural network-based speech recognition has been studied to exploit the adaptivity and learnability of neural network models. However, conventional neural network models have difficulty with co-articulation processing and with detecting the boundaries of similar phonemes in Korean speech. Also, when a single phonotopic map is used, learning time may increase dramatically and inaccuracies may arise, because a homogeneous learning and recognition method has to be applied to heterogeneous data. Hence, in this paper, a neural net typewriter has been designed using a hierarchical self-organizing classifier (HSOC), and related algorithms are presented. During its learning stage, the HSOC distributes phoneme data over hierarchically structured multiple phonotopic maps, using Kohonen's self-organizing feature maps (SOFM). The paper presents and tests algorithms for deciding the number of maps, the map sizes, the selection and placement of phonemes per map, and an appropriate learning and preprocessing method per map. If maps are divided according to a priori linguistic knowledge, it is difficult to acquire that knowledge and to decide how to apply it (e.g., in processing extended phonemes). By contrast, our HSOC has the advantage that multiple phonotopic maps suited to the given input data are self-organized. The resulting three Korean phonotopic maps are optimally labelled, have their own optimal preprocessing schemes, and conform to conventional linguistic knowledge.

A System of Audio Data Analysis and Masking Personal Information Using Audio Partitioning and Artificial Intelligence API (오디오 데이터 내 개인 신상 정보 검출과 마스킹을 위한 인공지능 API의 활용 및 음성 분할 방법의 연구)

  • Kim, TaeYoung;Hong, Ji Won;Kim, Do Hee;Kim, Hyung-Jong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.5 / pp.895-907 / 2020
  • With the recent growth in the influence of multimedia content beyond text-based content, services that help process the information in such content bring great convenience. Representative features of these services are searching for and masking sensitive data. Solutions that provide search and masking functions for text and images are easy to find. However, although the need for technology that searches and masks parts of audio data is recognized, such solutions are hard to find because the technology is difficult. In this study, we propose a web application that provides search and masking functions for audio data using an audio partitioning method. In pursuing this goal, we evaluated several speech-to-text conversion APIs to choose a suitable one and developed regular expressions for finding sensitive information. Lastly, we evaluated the accuracy of the developed search and masking feature. The contribution of this work lies in the design and implementation of searching and masking sensitive information in audio data, supported by experiments that verify the various functions.
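
The step of applying regular expressions to a speech-to-text transcript and mapping the hits back to audio time can be sketched as follows. The word-level input format `(text, start_sec, end_sec)`, the phone-number pattern, and the function name are all assumptions made for illustration; the abstract does not specify the API output format or the expressions actually used.

```python
import re

# Hypothetical pattern for a mobile number such as 010-1234-5678,
# also allowing spaces or no separator between the digit groups.
PHONE = re.compile(r"\b01[016789][- ]?\d{3,4}[- ]?\d{4}\b")


def find_mask_intervals(words, pattern=PHONE):
    """Return (start_sec, end_sec) intervals whose audio should be masked.

    words: list of (text, start_sec, end_sec) tuples, assumed to come
    from a speech-to-text step that reports word-level timestamps.
    """
    spans = []   # (char_start, char_end, start_sec, end_sec) per word
    parts = []
    pos = 0
    for text, start, end in words:
        spans.append((pos, pos + len(text), start, end))
        parts.append(text)
        pos += len(text) + 1  # +1 for the joining space
    transcript = " ".join(parts)

    intervals = []
    for match in pattern.finditer(transcript):
        # Collect every word that overlaps the matched character range.
        hit = [(s, e) for c0, c1, s, e in spans
               if c0 < match.end() and c1 > match.start()]
        if hit:
            intervals.append((hit[0][0], hit[-1][1]))
    return intervals
```

The returned intervals could then be passed to whatever audio-editing step overwrites those time ranges with silence or a tone.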

A perceptual study on the correlation between the meaning of Korean polysemic ending and its boundary tone (동형다의 종결어미의 의미와 경계성조의 상관성에 대한 지각연구)

  • Youngsook Yune
    • Phonetics and Speech Sciences / v.14 no.4 / pp.1-10 / 2022
  • The Korean polysemic ending '-(eu)lgeol' can have two different meanings, 'guess' and 'regret'. These are expressed by different boundary-tone types: a rising tone for guess and a falling one for regret. The sentence-final boundary-tone type is therefore the most salient prosodic feature. However, besides tone type, the pitch difference between the final and penultimate syllables of '-(eu)lgeol' can also affect semantic discrimination. To investigate this aspect, we conducted a perception test using two sentences that were morphologically and syntactically identical. The two sentences were spoken with different boundary-tone types by a native speaker of Korean. From these two sentences, the experimental stimuli were generated by artificially raising or lowering the pitch of the boundary syllable by 1 Qt while keeping the pitch of the penultimate syllable and the boundary-tone type fixed. Thirty native speakers of Korean participated in three levels of perceptual test, in which they were asked to mark whether the experimental sentences they heard were perceived as guess or regret. The results revealed that, regardless of boundary-tone type, the larger the pitch difference between the final and penultimate syllables in the positive direction, the more likely the sentence is to be perceived as guess, and the smaller the pitch difference in the negative direction, the more likely it is to be perceived as regret.
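
The stimulus manipulation can be sketched numerically under one assumption that the abstract does not state: that "Qt" denotes the quarter-tone, a logarithmic pitch unit with 24 steps per octave. The function names and the example frequencies are hypothetical.

```python
import math


def qt_difference(f_final_hz, f_penult_hz):
    """Pitch difference in quarter-tones (assumed: 24 Qt per octave).

    Positive when the final syllable is higher than the penultimate one.
    """
    return 24.0 * math.log2(f_final_hz / f_penult_hz)


def shift_by_qt(f_hz, n_qt):
    """Frequency obtained by raising (n_qt > 0) or lowering (n_qt < 0) f_hz."""
    return f_hz * 2.0 ** (n_qt / 24.0)


def make_continuum(f_final_hz, steps):
    """Boundary-syllable frequencies shifted by whole Qt steps around the original."""
    return [shift_by_qt(f_final_hz, n) for n in range(-steps, steps + 1)]
```

For example, `make_continuum(180.0, 3)` gives seven target frequencies for the boundary syllable, from 3 Qt below to 3 Qt above an original of 180 Hz, while the penultimate syllable would be left untouched.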

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

  • Bogyung Park;Somin Park;Hyunki Hong
    • KIPS Transactions on Software and Data Engineering / v.12 no.7 / pp.303-314 / 2023
  • Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties (tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model, which uses one-hot vectors for original and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained version of RawNet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, the Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. The Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it was based.

Fiberscopic and Electromyographic Study on Laryngeal Adjustments for Syllable-final Applosives in Korean (한국어의 음절말 내파음의 후두조절 -화이비스코프 및 근전도에 의한 관찰-)

  • Park, Hea-Suk
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.16 no.1 / pp.53-67 / 2005
  • It is known that Korean stop consonants in syllable-initial position are of three types: lax, aspirated and forced (or unaspirated). In syllable-final position, however, these three types merge into a single type with the same place of articulation, although the original three-way distinction is preserved in the Korean orthographic (Hangul) system. The syllable-final stops are thus phonetically realized as voiceless "applosives", which are characterized by the absence of oral release. The aim of the present study is to investigate, using a fiberscope, the laryngeal adjustments for these syllable-final stops in various phonological conditions, and further to investigate electromyographically the laryngeal adjustments for Korean stops in both syllable-initial and syllable-final positions in various phonological conditions. The results can be summarized as follows: 1. For syllable-initial stops, the glottal widths during articulatory closure clearly differ among the three types of Korean stops, and the pattern of thyroarytenoid (VOC) activity appears to characterize the three types. 2. The basic laryngeal feature of the Korean syllable-final applosives is a small degree of glottal opening that begins at or slightly after the oral closure. 3. When a syllable-final stop is followed by the copula "ita", it is pronounced as the initial stop consonant of the following syllable containing the vowel [i], and the underlying three-way distinction preserved in the Korean orthographic (Hangul) system is manifested in the laryngeal adjustment. 4. When the final applosives are followed by initial stops and fricatives, the laryngeal feature of the final applosive appears to be assimilated to that of the following consonant irrespective of differences in place of articulation, as far as glottal abduction/adduction is concerned. For syllable-initial stops, it is clearly demonstrated that thyroarytenoid (VOC) activity is suppressed during production of the stop consonants in question; the degree of suppression is slightest for the forced type, most marked for the aspirated type, and moderate for the lax type.

A Comparative Study on Using SentiWordNet for English Twitter Sentiment Analysis (영어 트위터 감성 분석을 위한 SentiWordNet 활용 기법 비교)

  • Kang, In-Su
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.4 / pp.317-324 / 2013
  • Twitter sentiment analysis classifies a tweet (message) into a positive or negative sentiment class. This study deals with SentiWordNet (SWN)-based Twitter sentiment analysis. SWN is a sentiment dictionary in which each sense of an English word has a positive and a negative sentiment strength. A variety of SWN-based sentiment feature extraction methods exist; they typically first determine the sentiment orientation (SO) of each term in a document and then decide the SO of the document from those terms' SO values. For the SO of a term, for example, some methods calculate the maximum or average of the sentiment scores of its senses, while others compute the average of the difference between positive and negative sentiment scores. For the SO of a document, many researchers employ the maximum or average of the terms' SO values. In addition, the above procedure may be applied to the whole set of parts of speech (adjective, adverb, noun, and verb) or to a subset. This work provides a comparative study of SWN-based sentiment feature extraction schemes, with performance evaluation on a well-known Twitter dataset.
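
One of the scheme combinations described above (term SO as the average of positive minus negative scores over senses, document SO as the average or maximum of term SO values) can be sketched as follows. The toy lexicon and its scores are invented for illustration and are not real SentiWordNet entries; the function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy lexicon: term -> list of (positive, negative) scores,
# one pair per sense. These are NOT actual SentiWordNet values.
TOY_SWN = {
    "good": [(0.75, 0.0), (0.5, 0.0)],
    "bad": [(0.0, 0.625), (0.0, 0.75)],
    "fine": [(0.5, 0.0), (0.0, 0.25)],
}


def term_so(term, lexicon=TOY_SWN):
    """Average of (positive - negative) over the senses of a term, or None if unknown."""
    senses = lexicon.get(term)
    if not senses:
        return None
    return mean(p - n for p, n in senses)


def document_so(tokens, lexicon=TOY_SWN, combine=mean):
    """Combine term SO values with `combine` (e.g. mean or max); 0.0 if no term is known."""
    scores = [s for s in (term_so(t, lexicon) for t in tokens) if s is not None]
    return combine(scores) if scores else 0.0


def classify(tokens, lexicon=TOY_SWN, combine=mean):
    """Label a tokenised tweet; ties at 0.0 are assigned to 'positive' by assumption."""
    return "positive" if document_so(tokens, lexicon, combine) >= 0 else "negative"
```

Passing `combine=max` switches the document-level rule, which is the kind of variation the comparative study evaluates; restricting `tokens` to certain parts of speech would be a separate preprocessing step not shown here.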

Development of a Web-based Presentation Attitude Correction Program Centered on Analyzing Facial Features of Videos through Coordinate Calculation (좌표계산을 통해 동영상의 안면 특징점 분석을 중심으로 한 웹 기반 발표 태도 교정 프로그램 개발)

  • Kwon, Kihyeon;An, Suho;Park, Chan Jung
    • The Journal of the Korea Contents Association / v.22 no.2 / pp.10-21 / 2022
  • Few automated methods exist for improving attitude in formal presentations, such as job interviews and presentations of project results at a company, other than observation by colleagues or professors. Previous studies have reported that a speaker's stable speech and gaze handling affect the effectiveness of delivery in a presentation. Other studies show that proper feedback on one's presentation increases the presenter's ability to present. In this paper, considering these positive aspects of correction, we developed a program that intelligently corrects the poor presentation habits and attitudes of college students through facial analysis of videos, and we analyzed the program's performance. The proposed program was developed as a web-based system that checks the use of redundant words, performs facial recognition, and converts the presentation content to text. To this end, an artificial intelligence model for classification was developed and, after the video object was extracted, facial feature points were recognized based on their coordinates. Then, using 4000 facial data, the performance of the algorithm in this paper was compared with that of facial recognition using a Teachable Machine. The program can be used to help presenters correct their presentation attitude.