• Title/Summary/Keyword: Phonetics

Performance of music section detection in broadcast drama contents using independent component analysis and deep neural networks (ICA와 DNN을 이용한 방송 드라마 콘텐츠에서 음악구간 검출 성능)

  • Heo, Woon-Haeng;Jang, Byeong-Yong;Jo, Hyeon-Ho;Kim, Jung-Hyun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.10 no.3 / pp.19-29 / 2018
  • We propose to use independent component analysis (ICA) and deep neural network (DNN) to detect music sections in broadcast drama contents. Drama contents mainly comprise silence, noise, speech, music, and mixed (speech+music) sections. The silence section is detected by signal activity detection. To detect the music section, we train noise, speech, music, and mixed models with DNN. In computer experiments, we used the MUSAN corpus for training the acoustic model, and conducted an experiment using 3 hours' worth of Korean drama contents. As the mixed section includes music signals, it was regarded as a music section. The segmentation error rate (SER) of music section detection was observed to be 19.0%. In addition, when stereo mixed signals were separated into music signals using ICA, the SER was reduced to 11.8%.
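
As a rough illustration of the scoring step described above, the sketch below folds mixed (speech+music) frames into the music class and then counts frame-level disagreements on the music/non-music decision. It assumes a simple per-frame error measure; the abstract does not define the SER, so the paper's metric may differ. The function names and the toy label sequences are hypothetical.

```python
def collapse_mixed(labels):
    """Treat 'mixed' (speech+music) frames as 'music', as the abstract describes."""
    return ["music" if lab == "mixed" else lab for lab in labels]


def music_error_rate(reference, hypothesis):
    """Fraction of frames whose music/non-music decision differs from the reference.

    Assumption: a plain frame-level error, not necessarily the paper's SER.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")
    if not reference:
        raise ValueError("at least one frame is required")
    ref = [lab == "music" for lab in collapse_mixed(reference)]
    hyp = [lab == "music" for lab in collapse_mixed(hypothesis)]
    errors = sum(r != h for r, h in zip(ref, hyp))
    return errors / len(ref)


# Toy label sequences, invented for illustration only.
reference = ["speech", "mixed", "music", "music", "noise"]
hypothesis = ["speech", "music", "music", "speech", "noise"]
print(music_error_rate(reference, hypothesis))
```

Only the fourth frame disagrees here, because the reference "mixed" frame and the hypothesized "music" frame both count as music.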

A perceptual and acoustical study of /ㅅ/ in children's speech (아동이 산출한 치조마찰음 /ㅅ/에 대한 청지각적·음향학적 연구)

  • Kim, Jiyoun;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.10 no.3 / pp.41-48 / 2018
  • This study examined the acoustic characteristics of Korean alveolar fricatives produced by typically developing children. Children aged 3 and 7 produced 2 types of nonsense syllables containing the alveolar fricative, /sV/ and /VsV/ sequences, where V was one of the three corner vowels (/i/, /a/, and /u/). Stimuli drawn from the speech materials of the production experiment were presented in random order to 12 speech language pathologists (SLPs) for a perception test. The SLPs responded by selecting one of seven alternative sounds. Acoustic measures such as duration of frication noise, normalized intensity, skewness, and center of gravity were examined. The acoustic measures differed significantly across vowels. Comparison of syllable structures indicated statistically significant differences in duration of frication noise and normalized intensity. The acoustic parameters could account for the perceptual data. Relating the acoustic and perception data by means of logistic regression suggests that duration of frication noise and normalized intensity are the primary cues to perceiving Korean fricatives.
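
Two of the measures named above, center of gravity and skewness, are commonly computed as power-weighted spectral moments. The sketch below shows one such computation on a precomputed power spectrum. It is an assumption about the general technique, not the study's actual procedure, whose analysis settings are not given in the abstract. The function name and the toy spectra are hypothetical.

```python
def spectral_moments(freqs_hz, power):
    """Return (center of gravity in Hz, skewness) of a power spectrum.

    Assumption: moments are weighted by power; other weightings exist.
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain positive power")
    # First moment: power-weighted mean frequency.
    cog = sum(f * p for f, p in zip(freqs_hz, power)) / total
    # Second and third central moments around the center of gravity.
    variance = sum((f - cog) ** 2 * p for f, p in zip(freqs_hz, power)) / total
    third = sum((f - cog) ** 3 * p for f, p in zip(freqs_hz, power)) / total
    skewness = third / variance ** 1.5 if variance > 0 else 0.0
    return cog, skewness


# Toy spectra, invented for illustration only.
freqs = [2000.0, 4000.0, 6000.0, 8000.0]
print(spectral_moments(freqs, [1.0, 2.0, 2.0, 1.0]))  # symmetric energy distribution
print(spectral_moments(freqs, [3.0, 1.0, 0.0, 0.0]))  # energy concentrated at low frequencies
```

A spectrum with energy concentrated at low frequencies and a tail toward higher ones yields positive skewness, while a symmetric spectrum yields a skewness of zero.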

Corpus-based evaluation of French text normalization (코퍼스 기반 프랑스어 텍스트 정규화 평가)

  • Kim, Sunhee
    • Phonetics and Speech Sciences / v.10 no.3 / pp.31-39 / 2018
  • This paper aims to present a taxonomy of non-standard words (NSW) for developing a French text normalization system and to propose a method for evaluating this system based on a corpus. The proposed taxonomy of French NSWs consists of 13 categories, including 2 types of letter-based categories and 9 types of number-based categories. In order to evaluate the text normalization system, a representative test set including NSWs from various text domains, such as news, literature, non-fiction, social-networking services (SNSs), and transcriptions, is constructed, and an evaluation equation is proposed reflecting the distribution of the NSW categories of the target domain to which the system is applied. The error rate of the test set is 1.64%, while the error rate of the whole corpus is 2.08%, reflecting the NSW distribution in the corpus. The results show that the literature and SNS domains are assessed as having higher error rates compared to the test set.
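
The abstract does not give the evaluation equation itself. One plausible reading of "reflecting the distribution of the NSW categories of the target domain" is a weighted average of per-category error rates, sketched below under that assumption. The function name, the category names, and all numbers are invented for illustration and are not taken from the paper.

```python
def weighted_error_rate(category_error, category_share):
    """Combine per-category error rates using a target domain's category distribution.

    Assumption: error = sum(share_c * error_c) / sum(share_c); the paper's
    equation may differ.
    """
    missing = set(category_share) - set(category_error)
    if missing:
        raise KeyError(f"no error rate for categories: {sorted(missing)}")
    total_share = sum(category_share.values())
    if total_share <= 0:
        raise ValueError("category shares must sum to a positive value")
    weighted = sum(category_error[c] * share for c, share in category_share.items())
    return weighted / total_share


# Hypothetical categories and values, for illustration only.
test_set_error = {"abbreviation": 0.02, "cardinal": 0.01, "date": 0.04}
domain_share = {"abbreviation": 0.5, "cardinal": 0.3, "date": 0.2}
print(weighted_error_rate(test_set_error, domain_share))
```

Under this reading, the same per-category error rates produce a different overall error for each domain, depending on which NSW categories that domain uses most.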

An Acoustic Study for Improving English Communicative Competence of Elementary School Students. (초등학생들의 영어 의사소통능력 신장을 위한 음향음성학적 분석)

  • Yang, Hyung-Wook
    • Proceedings of the KSPS conference / 2004.05a / pp.261-265 / 2004
  • The purpose of this paper is to improve the English communicative competence of elementary school students through an acoustic study. To this end, the study investigates various postlexical phenomena that apply to the utterances in elementary school English textbooks, and uses spectrograms to analyze how native speakers and elementary school students apply these phenomena when speaking English. The speech materials were seven sentences containing various postlexical phenomena. The analysis leads to the conclusion that knowing and pronouncing the postlexical phenomena of English is necessary for improving English communicative competence.

Artificial Neural Network Prediction of Midsagittal Pharynx Shape from Ultrasound Images for English Speech (영어 발성에서 초음파 영상 정보를 이용한 인공신경망 기반의 인강부의 추정과 평가 방법에 대한 연구)

  • Nam, Ho-Sung
    • Phonetics and Speech Sciences / v.3 no.2 / pp.23-28 / 2011
  • Electromagnetometers (EMA) have been widely used in articulatory studies as their temporal resolution can capture most speech activities and the fleshpoint information allows one to readily quantify and analyze tongue shape. However, the drawback is that the data lacks details of activity in the pharyngeal region. Several studies have attempted to estimate the unknown pharyngeal shape of the tongue, but few studies are based on unimodal data containing both front and back regions of the tongue at the same time. We use Stone's ball bearing method to obtain fleshpoint data as well as tongue shape. We further introduce a novel way of connecting balls and attaching them onto the tongue to ensure accurate tracking. An Artificial Neural Network is applied to build a map between observable flesh-points, unknown tongue shape, and pharyngeal region and is optimized to efficiently address nonlinearity.

Coordinations of Articulators in Korean Place Assimilation

  • Son, Min-Jung
    • Phonetics and Speech Sciences / v.3 no.2 / pp.29-35 / 2011
  • This paper examines several articulatory properties of /k/, known as a trigger of place assimilation as well as the object of post-obstruent tensing (/tk/), in comparison to non-assimilating controls (/kk/ and /kt/). Using EMMA, tongue body articulation in the place assimilation context robustly shows greater spatio-temporal articulation and lower jaw position. Results showed several characteristics. Firstly, constriction duration of the tongue body gesture in C2 of the assimilation context (/tk/) was longer than non-assimilating controls (/kk/ and /kt/). Secondly, constriction maxima also demonstrated greater constriction in the /tk/ sequences than in the control /kk/, but similar values with the control /kt/. In particular, results showed a significant relationship between the two variables - the longer the constriction duration, the greater the constriction degree. Lastly, jaw height was lower for the assimilating context /tk/, intermediate for the control /kk/, and higher for the control /kt/. Results suggest that speakers have lexical knowledge of place assimilation, producing a greater tongue body gesture in the spatio-temporal domains with lower jaw height as an indication of anticipating reduction of C1 in /tk/ sequences.

Speaker Identification Using an Ensemble of Feature Enhancement Methods (특징 강화 방법의 앙상블을 이용한 화자 식별)

  • Yang, IL-Ho;Kim, Min-Seok;So, Byung-Min;Kim, Myung-Jae;Yu, Ha-Jin
    • Phonetics and Speech Sciences / v.3 no.2 / pp.71-78 / 2011
  • In this paper, we propose an approach that constructs classifier ensembles from various channel compensation and feature enhancement methods. CMN and CMVN are used as channel compensation methods. PCA, kernel PCA, greedy kernel PCA, and kernel multimodal discriminant analysis are used as feature enhancement methods. The proposed ensemble system combines 15 classifiers, one for each pairing of three channel compensation options (including 'without compensation') and five feature enhancement options (including 'without enhancement'). Experimental results show that the proposed ensemble system gives the highest average speaker identification rate in various environments (channels, noises, and sessions).
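
The count of 15 classifiers follows from pairing each of the three channel compensation options with each of the five feature enhancement options. The sketch below enumerates those pairings and gives textbook per-utterance versions of CMN and CMVN. It assumes standard definitions of both normalizations and does not reproduce the paper's classifiers or its rule for combining them, which the abstract does not state. All function and variable names are hypothetical.

```python
from itertools import product
from statistics import fmean, pstdev


def cmn(frames):
    """Cepstral mean normalization: subtract each coefficient's mean over the utterance."""
    means = [fmean(col) for col in zip(*frames)]
    return [[x - m for x, m in zip(frame, means)] for frame in frames]


def cmvn(frames):
    """Cepstral mean and variance normalization: zero mean, unit variance per coefficient."""
    means = [fmean(col) for col in zip(*frames)]
    # Fall back to 1.0 when a coefficient is constant, to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in zip(*frames)]
    return [[(x - m) / s for x, m, s in zip(frame, means, stds)] for frame in frames]


CHANNEL_COMPENSATION = ["none", "CMN", "CMVN"]
FEATURE_ENHANCEMENT = [
    "none",
    "PCA",
    "kernel PCA",
    "greedy kernel PCA",
    "kernel multimodal discriminant analysis",
]

# One ensemble member per (compensation, enhancement) pairing: 3 x 5 = 15.
ENSEMBLE_MEMBERS = list(product(CHANNEL_COMPENSATION, FEATURE_ENHANCEMENT))
print(len(ENSEMBLE_MEMBERS))

# Toy two-frame, two-coefficient utterance, invented for illustration only.
print(cmvn([[1.0, 2.0], [3.0, 6.0]]))
```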

Design and Implementation of a Usability Testing Tool for User-oriented Design of Command-and-Control Voice User Interfaces (명령 제어 음성 인터페이스 사용자 중심 설계를 위한 사용성 평가도구의 설계 및 구현)

  • Lee, Myeong-Ji;Hong, Ki-Hyung
    • Phonetics and Speech Sciences / v.3 no.2 / pp.79-87 / 2011
  • Recently, usability has become very important in voice user interface systems. In this paper, we have designed and implemented a wizard-of-oz (WOZ) usability testing tool for command-and-control voice user interfaces. We have proposed the VUIDML (Voice User Interface Design Markup Language) to design the usability test scenario of command-and-control voice interfaces in the early design stages. For highly satisfactory voice user interfaces, we have to select highly preferred voice commands and prompts. In VUIDML, we can specify possible prompt candidates. The WOZ usability testing tool can also be used to collect user-preferred voice commands and feedback from real users.

A study on the Suprasegmental Parameters Exerting an Effect on the Judgment of Goodness or Badness on Korean-spoken English (한국인 영어 발음의 좋음과 나쁨 인지 평가에 영향을 미치는 초분절 매개변수 연구)

  • Kang, Seok-Han;Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.3 no.2 / pp.3-10 / 2011
  • This study investigates the role of suprasegmental features with respect to the intelligibility of Korean-spoken English judged by Korean and English raters as being good or bad. It has been hypothesized that Korean raters would have different evaluations from English native raters and that the effect may vary depending on the types of suprasegmental factors. Four Korean and four English native raters, respectively, took part in the evaluation of 14 Korean subjects' English speaking. The subjects read a given paragraph. The results show that the evaluation for 'intelligibility' is different for the two groups and that the difference comes from their perception of L2 English suprasegmentals.

VERTICAL DIMENSION : A LITERATURE REVIEW (수직고경(VERTICAL DEMINSION)의 회복에 대한 문헌적 고찰)

  • Hwang, Doo-Yeon;Yang, Ja-Ho
    • The Journal of Korean Academy of Prosthodontics / v.35 no.1 / pp.211-220 / 1997
  • This article describes vertical dimension in its histologic and clinical aspects. Determination of the correct vertical dimension of occlusion is one of the most important steps in prosthodontic rehabilitation. It is considered essential for improving facial esthetics and stomatognathic function. Many techniques have been used to measure the vertical dimension in dentulous and edentulous patients: pre-extraction records, physiologic rest position, swallowing, phonetics, esthetics, etc. However, there is no universally accepted or completely accurate method. Although a great deal of effort has been spent trying to find the exact position of the mandible, vertical dimension remains a controversial subject.
