• Title/Summary/Keyword: Voice and Text Analysis

Search results: 68

발성장애 평가 시 /a/ 모음연장발성 및 문장검사의 켑스트럼 분석 비교 (Comparison of Vowel and Text-Based Cepstral Analysis in Dysphonia Evaluation)

  • 김태환;최정임;이상혁;진성민
    • 대한후두음성언어의학회지
    • /
    • Vol. 26, No. 2
    • /
    • pp.117-121
    • /
    • 2015
  • Background: Cepstral analysis, obtained by Fourier transformation of the spectrum, is known to be an effective indicator for analyzing voice disorders. Phonation of the sustained vowel /a/ and continuous speech have both been used to evaluate voice disorders, but the former is limited in capturing hoarseness properly. This study aimed to compare the effectiveness of cepstral analysis between the sustained vowel /a/ and continuous speech. Methods: From March 2012 to December 2014, a total of 72 patients were enrolled, comprising 24 patients each with unilateral vocal cord palsy, vocal nodules, and vocal polyps. All patients rated their voice quality on the Voice Handicap Index (VHI) before and after treatment. Sustained /a/ phonation samples and continuous speech samples (the first sentence of the autumn paragraph) were subjected to cepstral analysis, and pre-treatment and post-treatment values were compared. Results: Pre- and post-treatment CPP-a (cepstral peak prominence for the sustained /a/ vowel) was 13.80 and 13.91 in vocal cord palsy, 16.62 and 17.99 in vocal nodules, and 14.19 and 18.50 in vocal polyps, respectively. Pre- and post-treatment CPP-s (cepstral peak prominence for text-based speech) was 11.11 and 12.09 in vocal cord palsy, 12.11 and 14.09 in vocal nodules, and 12.63 and 14.17 in vocal polyps. All 72 patients reported subjective improvement on the VHI after treatment. CPP-a improved significantly only in the vocal polyp group, whereas CPP-s improved significantly in all three groups (p<0.05). Conclusion: In cepstral analysis, text-based speech represents voice disorders better than sustained vowel phonation, so both sustained /a/ phonation and text-based speech should be analyzed to obtain more accurate results. (A minimal sketch of the CPP computation follows this entry.)

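To make the measure concrete: CPP is the height of the dominant cepstral peak above a regression line fitted to the cepstrum. Below is a minimal Python sketch of that computation, assuming a mono float signal and NumPy only; the window, pitch range, and trend-fit span are illustrative choices, not the settings used in the paper.

```python
import numpy as np

def cepstral_peak_prominence(frame, sr, f0_range=(60.0, 330.0)):
    """Return CPP in dB: cepstral peak height above a linear trend line."""
    frame = frame * np.hanning(len(frame))            # taper the analysis frame
    log_spec = 20.0 * np.log10(np.abs(np.fft.rfft(frame)) + 1e-12)
    cepstrum = np.fft.irfft(log_spec)                 # real cepstrum of the dB spectrum
    quefrency = np.arange(len(cepstrum)) / sr         # quefrency axis in seconds
    lo, hi = 1.0 / f0_range[1], 1.0 / f0_range[0]     # plausible pitch-period band
    band = (quefrency >= lo) & (quefrency <= hi)
    peak = np.argmax(cepstrum[band])
    peak_q, peak_val = quefrency[band][peak], cepstrum[band][peak]
    slope, intercept = np.polyfit(quefrency[band], cepstrum[band], 1)
    return peak_val - (slope * peak_q + intercept)    # prominence above the trend

# e.g. cpp_a = cepstral_peak_prominence(vowel_frame, sr=44100)  # hypothetical frame
```

Higher CPP indicates clearer harmonic structure, which is why the values reported above rise after treatment.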

머신러닝 기법을 이용한 한국어 보이스피싱 텍스트 분류 성능 분석 (Korean Voice Phishing Text Classification Performance Analysis Using Machine Learning Techniques)

  • 무사부부수구밀란두키스;진상윤;장대호;박동주
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2021년도 추계학술발표대회
    • /
    • pp.297-299
    • /
    • 2021
  • Text classification is one of the popular tasks in Natural Language Processing (NLP), used in applications such as sentiment analysis and email filtering. Nowadays, state-of-the-art (SOTA) Machine Learning (ML) and Deep Learning (DL) algorithms are the core engines used to perform these classification tasks with high accuracy, and they show satisfying results. This paper conducts a benchmark performance analysis of multiple SOTA algorithms on the first known labeled Korean voice phishing dataset, called KorCCVi. Experiments performed on a test set of 366 samples reveal which algorithm performs best when both training time and metrics such as accuracy and F1 score are considered. (A baseline sketch in the same spirit follows this entry.)
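
As an illustration of the kind of classical baseline such a benchmark would include, here is a hedged TF-IDF + logistic regression sketch; the file name korccvi.csv and the columns transcript and label are placeholders, since the abstract does not publish the KorCCVi schema, and labels are assumed binary (phishing = 1).

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("korccvi.csv")                       # hypothetical export of the dataset
X_train, X_test, y_train, y_test = train_test_split(
    df["transcript"], df["label"], test_size=0.2, random_state=42)

vec = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)          # fit on the training split only

pred = clf.predict(vec.transform(X_test))
print("accuracy:", accuracy_score(y_test, pred))
print("F1:", f1_score(y_test, pred))                  # assumes binary 0/1 labels
```

DL contenders (CNNs, RNNs, Transformer models) would then be benchmarked against this kind of baseline on the same held-out split.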

The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers

  • Jung, Hoon;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 12
    • /
    • pp.4706-4724
    • /
    • 2020
  • Along with various structured data, such as company size, loan balance, and savings accounts, the voice of customer (VOC), which is text data containing contact history and counseling details, was analyzed in this study. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN provided the best performance in terms of predictive power, followed by the model using TF-IDF and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN exhibited better performance in the analysis of Korean-language text. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the results of previous studies, indicating that individual customers leave when their loyalty and switching cost are low, are also applicable to corporate customers, and suggests that VOC data indicating customers' needs are very effective for predicting their behavior. (An illustrative character-level CNN sketch follows this entry.)
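
A character-level CNN of the kind compared here can be sketched compactly in Keras. The vocabulary size, sequence length, and filter settings below are illustrative assumptions, not the paper's configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 2000   # distinct characters kept (assumption; Korean syllables are numerous)
MAX_LEN = 300       # characters per VOC note, padded/truncated (assumption)

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),           # integer-encoded character ids
    layers.Embedding(VOCAB_SIZE, 64),         # learned character embeddings
    layers.Conv1D(128, 5, activation="relu"), # 5-character convolution filters
    layers.GlobalMaxPooling1D(),              # strongest filter response per feature
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # churn probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Working at the character level sidesteps Korean word segmentation, which is one plausible reason the character-level variant outperformed the word-level one.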

텍스트 속 자신의 표현: 영어 편지글에 나타난 수사 형태와 작문 활동에 관한 탐색 (Written Voice in the Text: Investigating Rhetorical Patterns and Practices for English Letter Writing)

  • 이영화
    • 한국콘텐츠학회논문지
    • /
    • Vol. 20, No. 3
    • /
    • pp.432-439
    • /
    • 2020
  • This study examines the characteristics of Korean university students' written texts in English letters, focusing on self-expression (voice), rhetorical patterns, and writing practices. The data comprised students' English job application letters, and the 'purpose-volition' (목적-의지) model was adopted for the analysis. The results show that students used distinctive strategies to present themselves as writers in a newly framed situation. The ways students expressed themselves in their application letters varied widely, and none of them adopted the Korean letter-writing convention of mentioning the weather. Their rhetorical patterns departed from formulaic templates and showed both diversity and integration. Through their writing practices, students revealed their own internal values as writers, which means that students' written products do not simply reproduce what their instructors taught. These findings support the sociocultural theory that learning is a situated activity within a particular discourse community. English writing instructors should therefore recognize that students' lives and learning experiences shape their textual identity and writing practices.

음성합성시스템을 위한 음색제어규칙 연구 (A Study on Voice Color Control Rules for Speech Synthesis System)

  • 김진영;엄기완
    • 음성과학
    • /
    • Vol. 2
    • /
    • pp.25-44
    • /
    • 1997
  • Listening to the various speech synthesis systems developed and in use in Korea, we find that although their quality has improved, they lack naturalness. Moreover, since the voice color of such a system is tied to a single recorded speech DB, creating a different voice color requires recording another speech DB. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors in voice color control rules for a text-to-speech system, so that synthetic speech can sound natural and varied. To derive such rules from natural speech, glottal source parameters and frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were catalogued as deep, sonorous, thick, soft, harsh, high-tone, shrill, and weak. The LF model was used as the voice source model, and formant frequencies, bandwidths, and amplitudes were used as the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between the parameters and the voice colors. (A brief regression sketch follows this entry.)

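The final step, regressing perceptual voice-color ratings on acoustic parameters, can be illustrated as follows. The feature layout and the random stand-in data are placeholders; the paper's actual LF-model parameters and formant measurements would replace them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# rows: utterances; columns: e.g. LF-model source parameters followed by
# formant frequencies, bandwidths, and amplitudes (illustrative layout)
X = rng.random((40, 8))              # stand-in acoustic measurements
y = rng.random(40)                   # stand-in ratings for one voice color, e.g. "deep"

reg = LinearRegression().fit(X, y)
print("R^2:", reg.score(X, y))       # variance in the rating explained by the parameters
print("weights:", reg.coef_)         # per-parameter contribution, i.e. the "control rule"
```

One such regression per voice color yields the parameter weights a synthesis system could apply to shift its output toward that color.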

TTS를 이용한 매장 음악 방송 서비스 시스템 구현 (Implementation of Music Broadcasting Service System in the Shopping Center Using Text-To-Speech Technology)

  • 장문수;강선미
    • 음성과학
    • /
    • Vol. 14, No. 4
    • /
    • pp.169-178
    • /
    • 2007
  • This paper describes the development of a service system for small shops that supports not only music broadcasting but also editing and generating voice announcements using text-to-speech (TTS) technology. The system was built for the web so that it can be accessed easily whenever and wherever needed. It controls sound through the Silverlight media player on ASP.NET 2.0 without any additional application software, and Ajax controls allow it to serve multiple users under peak load. TTS runs on the server side, so the service works without any software on the user's computer. Owing to its convenience and usefulness, the system lets the provider serve many shops, and additional functions such as statistical analysis will help shop management offer further desirable services. (A server-side TTS sketch follows this entry.)

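The paper's stack is ASP.NET 2.0 with a Silverlight player, but the server-side core (announcement text in, audio file out for the player to stream) can be sketched in a language-neutral way. Here is a Python version using the offline pyttsx3 engine; the announcement text and file name are examples only.

```python
import pyttsx3

engine = pyttsx3.init()                              # local TTS engine
engine.save_to_file("The store will close in ten minutes.", "announce.wav")
engine.runAndWait()                                  # writes announce.wav to disk
```

In the described system, the resulting audio file would be placed where the web player can fetch it alongside the music stream.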

서로 다른 챗봇 유형이 한국 EFL 학습자의 말하기능력 및 학습자인식에 미치는 영향 (Effects of Different Types of Chatbots on EFL Learners' Speaking Competence and Learner Perception)

  • 김나영
    • 비교문화연구
    • /
    • Vol. 48
    • /
    • pp.223-252
    • /
    • 2017
  • The purpose of this study is to identify the effects of two different types of chatbots (voice-based and text-based) on Korean EFL learners' speaking competence and learner perception. The participants were 80 freshmen at a university in Korea, all enrolled in a general English speaking course. They were randomly assigned to two experimental groups and took part in ten chat sessions with one of the two chatbot types over 16 weeks. Pre- and post-tests of speaking were administered to detect changes in the participants' speaking competence, and pre- and post-surveys examined changes in their perception of chatbot-based English learning. The speaking tests showed that the learners' communicative competence improved significantly, with the text-based chatbot proving more helpful for that improvement. The surveys showed that learners' perception of chatbot-based English learning changed positively, with perception of the voice-based chatbot becoming the more favorable of the two. This study explores new possibilities for chatbot-based English learning in EFL contexts and draws suggestions for effective chatbot use.

고객의 소리(VOC) 데이터를 활용한 서비스 처리 시간 예측방법 (A Method of Predicting Service Time Based on Voice of Customer Data)

  • 김정훈;권오병
    • 한국IT서비스학회지
    • /
    • Vol. 15, No. 1
    • /
    • pp.197-210
    • /
    • 2016
  • With the advent of text analytics, voice of customer (VOC) data have become an important resource that provides managers and marketing practitioners with consumers' veiled opinions and requirements. In other words, making relevant use of VOC data can improve customer responsiveness and satisfaction, each of which eventually improves business performance. However, unstructured data such as customer complaints in VOC records have seldom been used in marketing practice, for instance to predict service time as an index of service quality, because unstructured VOC data take too complicated a form and converting them into structured data is a difficult process. Hence, this study proposes a prediction model that improves the estimation accuracy of service processing time by combining unstructured features from text mining with the structured data features in VOC, and examines the relationship between the unstructured data, the structured data, and service processing time through regression analysis. Text mining techniques, sentiment analysis, keyword extraction, classification algorithms, decision trees, and multiple regression are considered and compared. For the experiment, we used actual VOC data from a company. (A combined-features sketch follows this entry.)
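
A hedged sketch of the combination the abstract describes: text-mined features and structured fields feeding one regression that predicts service processing time. The file name and column names are illustrative placeholders, not the company's actual VOC schema, and the structured columns are assumed to be numeric codes.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

df = pd.read_csv("voc.csv")                      # hypothetical VOC export

features = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=5000), "complaint_text"),    # unstructured
    ("structured", "passthrough", ["customer_grade", "channel_code"]), # structured
])
model = Pipeline([("features", features), ("reg", LinearRegression())])
model.fit(df, df["service_minutes"])             # target: observed handling time

print(model.predict(df.head(3)))                 # predicted service time, in minutes
```

Swapping LinearRegression for a decision-tree regressor reproduces the kind of technique comparison the study runs.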

Voice Similarities between Sisters

  • Ko, Do-Heung
    • 음성과학
    • /
    • Vol. 8, No. 3
    • /
    • pp.43-50
    • /
    • 2001
  • This paper deals with voice similarities between sisters, who can be assumed to share physiological characteristics inherited from a single biological mother. Nine pairs of sisters who were believed to have similar voices participated in this experiment. The speech samples from one pair were excluded from the analysis because their perceptual score was relatively low. The words were measured both in isolation and in context, and the subjects were asked to read the text five times with about three seconds between readings. Recordings were made at natural speed in a quiet room. The data were analyzed for pitch and formant frequencies using CSL (Computerized Speech Lab) and PCQuirer. It was found that data for the initial vowels were much more similar and homogeneous than those for vowels in other positions. The acoustic data showed that voice similarities were strikingly high in both pitch and formant frequencies. The statistical data obtained from this experiment may serve as a guideline for modelling speaker identification and speaker verification. (A pitch-comparison sketch follows this entry.)

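A minimal sketch of the pitch side of such a comparison, with librosa's pyin estimator standing in for CSL/PCQuirer; the WAV file names are hypothetical.

```python
import librosa
import numpy as np

def mean_f0(path, fmin=75.0, fmax=400.0):
    """Mean fundamental frequency over voiced frames, in Hz."""
    y, sr = librosa.load(path, sr=None)              # keep the native sampling rate
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    return float(np.nanmean(f0))                     # unvoiced frames are NaN

print("sister A:", mean_f0("sister_a.wav"))          # hypothetical recordings
print("sister B:", mean_f0("sister_b.wav"))
```

A formant comparison would add LPC-based estimates per vowel, which is where the initial-vowel similarity reported above would show up.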

Speech Emotion Recognition in People at High Risk of Dementia

  • Dongseon Kim;Bongwon Yi;Yugwon Won
    • 대한치매학회지
    • /
    • Vol. 23, No. 3
    • /
    • pp.146-160
    • /
    • 2024
  • Background and Purpose: The emotions of people at various stages of dementia need to be utilized effectively for prevention, early intervention, and care planning. With technology now available for understanding and addressing people's emotional needs, this study aims to develop speech emotion recognition (SER) technology that classifies emotions for people at high risk of dementia. Methods: Speech samples from people at high risk of dementia were categorized into distinct emotions via human auditory assessment, and the outcomes were annotated for a supervised deep-learning method. The architecture incorporated a convolutional neural network, long short-term memory, attention layers, and Wav2Vec2, a novel feature extractor, to develop automated speech emotion recognition. Results: Twenty-seven kinds of emotions were found in the participants' speech. These were grouped into six detailed emotions (happiness, interest, sadness, frustration, anger, and neutrality) and further into three basic emotions (positive, negative, and neutral). To improve algorithmic performance, multiple learning approaches were applied using different data sources (voice and text) and varying numbers of emotions. Ultimately, a two-stage algorithm (initial text-based classification followed by voice-based analysis) achieved the highest accuracy, reaching 70%. Conclusions: The diverse emotions identified in this study were attributed to the characteristics of the participants and the method of data collection. That the speech was addressed to companion robots also explains the relatively low performance of the SER algorithm. Accordingly, this study suggests the systematic and comprehensive construction of a dataset from people with dementia. (A Wav2Vec2 feature-extraction sketch follows this entry.)
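
Of the components named above, the Wav2Vec2 feature extractor is the most reusable piece. A hedged sketch with the Hugging Face implementation follows; the checkpoint and the mean-pooled embedding are illustrative choices, not the authors' exact setup.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

ckpt = "facebook/wav2vec2-base"                  # assumed checkpoint, not the paper's
extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
encoder = Wav2Vec2Model.from_pretrained(ckpt)

waveform = torch.randn(16000)                    # 1 s of placeholder audio at 16 kHz
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state # (1, frames, 768) frame embeddings

embedding = hidden.mean(dim=1)                   # pooled vector for an emotion classifier
```

In the study's two-stage design, a vector like this would feed the voice-based stage after the text-based classifier has made its initial call.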