• Title/Summary/Keyword: voice quality features

Analyzing the element of emotion recognition from speech (음성으로부터 감성인식 요소분석)

  • 심귀보;박창현
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.6 / pp.510-515 / 2001
  • The elements generally used for emotion recognition from a speech signal include (1) the words of the conversation, (2) tone, (3) pitch, (4) formant frequency, and (5) speech speed. For humans, tone, voice quality, speech speed, and words are naturally easier cues than frequency for perceiving another person's feelings, so they are important elements for classifying emotions. Previous methods have mainly used these cues, but formants are well suited to machine implementation. Our final goal is therefore to implement an emotion recognition system based on pitch, formant, speech speed, and similar features of the speech signal. In this paper, as a first stage, we identified specific features of anger in the speech of a man who had become angry.

Monitoring System with PLC I/O for Car Parking Lot (Car Parking Lot 모니터링 시스템)

  • Lee, Seong-Jae;Kim, Jae-Yang
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference / 2007.06a / pp.511-512 / 2007
  • The monitoring system has won acceptance as a premium mark that identifies the highest standard of product quality in advanced industry. The TOP features multi-I/O ports and VGA and RCA TV-out ports supporting mirroring and multiple dual-display modes by windows 0/5. With a choice of versatile stands, panel mount, or VESA wall-mount swing arm, and connections to a modem, wireless keyboard, customer display, and card reader, it is an ideal panel system for TOP (Touch Operation Panel), kiosk, or office and factory automation applications. TOP is a hardware and software product that maps all kinds of functions of advanced technology equipment to buttons, switches, voice, graphs, etc., so that consumers can easily use an industrial HMI touch panel system. System characteristics: ease of use and flexibility for the user; a high-value solution with advanced functions for many applications, including factory automation, office automation, building automation systems, and information service systems. Specifications: analog touch; 2 MB flash memory for saving screen data; RS-232C/422 serial port; multi-language support.

Histopathologic and Physiologic Features of the Aging Larynx (노인성 후두의 조직병리학적, 생리학적 특성)

  • Park, Il-Seok
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.25 no.1 / pp.20-23 / 2014
  • Age-related changes in the larynx can have a direct impact on voice quality and general comfort level. Observations of vocal aging have spanned perceptual, acoustic, aerodynamic, physical, electromyographic (EMG), and histological levels. Evidence of differential vocal aging in relation to gender and physical condition has been reported. Perceptual, acoustic, aerodynamic, kinematic, EMG, and histological data document changes in laryngeal structure and function with advancing age. These changes contribute to a functional impact in the form of vocal hypofunction or compensatory hyperfunction. This review focuses on the current understanding of the clinical and cellular changes in the larynx that lead to presbyphonia.

Recognition of Overlapped Sound and Influence Analysis Based on Wideband Spectrogram and Deep Neural Networks (광역 스펙트로그램과 심층신경망에 기반한 중첩된 소리의 인식과 영향 분석)

  • Kim, Young Eon;Park, Gooman
    • Journal of Broadcast Engineering / v.23 no.3 / pp.421-430 / 2018
  • Many voice recognition systems use methods such as MFCC and HMM to recognize the human voice. These methods are designed to analyze only a targeted sound, normally one exchanged between a human and a device. Their recognition capability is limited, however, when the input is a mixture of diverse sounds spread over a wider frequency range, such as dog barking and indoor sounds. The frequencies of overlapped sounds span a wide range, up to 20 kHz, which is higher than that of a voice. This paper proposes a new recognition method that covers a wider frequency range by combining the Wideband Sound Spectrogram (WSS) with a DNN-based Keras Sequential Model (KSM). The WSS is adopted to analyze and verify diverse sounds over a wide frequency range, as it is designed to extract and classify features. The KSM is employed for pattern recognition on the features extracted by the WSS, to improve sound recognition quality. The experiment verified that the proposed WSS and KSM classified the targeted sound well in noisy environments containing overlapped sounds such as dog barking and indoor sounds. The paper also presents a stage-by-stage analysis and comparison of how each factor influences recognition, and of its characteristics at various noise levels.
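
The feature-extraction step described in this abstract can be illustrated with a short-time magnitude spectrogram that keeps every frequency bin up to the Nyquist limit (about 22 kHz at a 44.1 kHz sampling rate). This is only a minimal sketch under stated assumptions: the frame length, hop size, Hann window, sampling rate, and the 15 kHz test tone are all illustrative choices, not values from the paper, and the DNN classifier is not shown.

```python
import cmath
import math

SAMPLE_RATE = 44100  # assumed sampling rate; Nyquist limit is 22.05 kHz


def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len=64, hop=32):
    """List of per-frame magnitude spectra from 0 Hz up to Nyquist."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(magnitude_spectrum(frame))
    return frames


def bin_to_hz(k, sample_rate, frame_len):
    """Centre frequency of DFT bin k."""
    return k * sample_rate / frame_len


# Hypothetical test signal: a 15 kHz tone, well above the usual speech band.
tone_hz = 15000.0
signal = [math.sin(2 * math.pi * tone_hz * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = bin_to_hz(peak_bin, SAMPLE_RATE, 64)
print(f"{len(spec)} frames x {len(spec[0])} bins, peak near {peak_hz:.0f} Hz")
```

Each row of `spec` could then be fed to a classifier as a feature vector. A real system would use an FFT library rather than the naive DFT above, which is kept only so the sketch has no dependencies.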

Characteristics of respiration and phonation depending on smoking or non smoking by practical musicology students and general male students (실용음악전공학생과 일반남학생의 흡연여부에 따른 호흡과 발성 특성 비교)

  • Kim, Eunhye;Choi, Hong-Shik;Lim, Seong-Eun;Choi, Yaelin
    • Phonetics and Speech Sciences / v.6 no.3 / pp.49-56 / 2014
  • This research compared the features of respiration and phonation between practical musicology students and general male students, according to their smoking status. The participants were 15 male practical musicology students attending ○○ University and 16 general students attending ○○○ University. The participants, both non-smokers and smokers with a 5-year smoking history, had no history of voice disease and had normal cognitive function. The results indicated, first, that there was no notable difference in respiratory function (FVC, FEV1, FEV1/FVC), regardless of major and smoking status. For MPT, there was no significant difference by major, but the smoker group's MPT was significantly shorter than the non-smoker group's (p<.01). Second, major did not produce a significant difference in F0, jitter, shimmer, or NHR in the vowel prolongation task. However, the smoker group showed significantly higher jitter and shimmer than the non-smoker group (p<.05), while F0 and NHR showed no difference. For VRP, the maximum frequency and frequency range of the practical group were significantly higher than those of the normal group (p<.001). Moreover, although the difference in minimum frequency was not statistically significant, the practical group tended toward a higher frequency than the normal group (p=.051). In conclusion, even though there was no difference in respiratory function between the smoker and non-smoker groups, the MPT of the smoker group was shorter than that of the non-smoker group. In addition, the smoker group showed higher jitter and shimmer than the non-smoker group. MPT is related to the valve action of the vocal folds on the air passing through the glottis. Thus, the smoker group is interpreted to have lower voice quality and poorer valve action of the vocal folds. Also, the practical group had a higher maximum frequency and a wider frequency range than the normal group. This research can serve as basic data on vocal characteristics for majors that specialize in voice.
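
Jitter and shimmer, the two measures that separated smokers from non-smokers here, quantify cycle-to-cycle variation in the glottal period and the peak amplitude. The abstract does not say which variant was measured, so the sketch below assumes the commonly used "local" definition: the mean absolute difference between consecutive cycles, divided by the overall mean and expressed as a percentage. The period and amplitude values are made up for illustration.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to
    the mean of all values, in percent (assumed 'local' definition)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical measurements from a sustained vowel (not data from the study).
periods_ms = [8.0, 8.1, 7.9, 8.0, 8.2, 8.0]        # glottal cycle lengths
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50, 0.48]  # peak amplitude per cycle

jitter_pct = local_perturbation(periods_ms)    # period perturbation
shimmer_pct = local_perturbation(amplitudes)   # amplitude perturbation
mean_f0_hz = 1000.0 / (sum(periods_ms) / len(periods_ms))

print(f"jitter {jitter_pct:.2f}%  shimmer {shimmer_pct:.2f}%  F0 {mean_f0_hz:.1f} Hz")
```

Clinical tools extract the cycle boundaries from the waveform before applying a formula of this kind; that step is omitted here.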

A Study on the Syntagma & Paradigm by Repetition, Variation and Contrast in Ads

  • Choi, Seong-hoon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.9 / pp.1-12 / 2017
  • This study explores the potential meanings of print advertisements. Linguistic features such as repetition, variation, contrast, and phonological structure in the verbal texts of ads can give rise to shades of meaning or slight variations in advertising. The language of advertising is not only a language of words. It is also a language of images, colors, and pictures. Pictures and words combine to form the advertisement's visual text. While the words are very important in delivering the sales message, the visual text cannot be ignored in advertisements. Part of the visual text is the paralanguage of the ad. Paralanguage is the meaningful behaviour accompanying language, such as voice quality, gestures, facial expressions, and touch in speech, and the choice of typeface and letter sizes in writing. Foregrounding is the throwing into relief of the linguistic sign against the background of the norms of ordinary language. This paper discusses the advertisements within the framework of paradigmatic and syntagmatic relationships. The source of the ads was confined to Marlboro. The ads were reselected using purposive sampling methods.

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it can closely imitate natural human speech. In particular, TTS models offering various voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that can ensure acoustic diversity and synthesize personalized voices by generating speech from the utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach includes not only an English one-shot multi-speaker TTS but also a Korean one. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean one-shot multi-speaker TTS showed a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves on the baseline models in both naturalness and speaker similarity.
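
A mean opinion score such as the NMOS and SMOS reported above is the average of listener ratings on a 1 to 5 scale. The sketch below shows that arithmetic together with a normal-approximation 95% interval. The ratings are invented for illustration, and the abstract does not say whether or how the authors computed intervals.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ratings on the usual 1-5 scale and a normal approximation
    (1.96 standard errors) for the interval.
    """
    if not ratings:
        raise ValueError("no ratings given")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        return mos, 0.0
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical listener ratings for one system (not data from the paper).
naturalness_ratings = [4, 3, 3, 4, 3, 2, 4, 3, 4, 3]

mos, ci = mean_opinion_score(naturalness_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of listeners the interval is wide, which is one reason MOS differences between systems are usually reported alongside the number of raters and utterances.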

Normative-Legal and Information Security of Socio-Political Processes in Ukraine: a Comparative Aspect

  • Goshovska, Valentyna;Danylenko, Lydiia;Chukhrai, Ihor;Chukhrai, Nataliia;Kononenko, Pavlo
    • International Journal of Computer Science & Network Security / v.22 no.10 / pp.57-66 / 2022
  • The aim of the article is to investigate socio-political processes in Ukraine on the basis of institutional and behavioral approaches, in particular their regulatory and informational support. Methodology. To determine the nature and content of socio-political processes, two approaches have been used: (1) an institutional approach, to analyze the development of Ukraine's political institutions; and (2) a behavioral approach, to analyze socio-political processes in Ukraine in the context of citizens' political behavior and the political activity that forms the country's political culture. Results. The general features of the socio-political situation in Ukraine are as follows: the established model of government can be conditionally described as "presidential"; public demand for new leaders remains high; society has no common vision of further development; and a marked decline in the real incomes of a significant part of society, together with increased fiscal pressure on businessmen, will draw a public response after some time. Rising levels of voice, accountability, governance efficiency, and regulatory quality indicate a slow change in the political system, which will have a positive impact on public sentiment in the future. At the same time, there has been little change in the quality of Ukraine's institutions for ensuring political stability, the rule of law, and control of corruption. There are no fundamental changes in the development of the institution of property rights, the protection of intellectual rights, or the sphere of ethics and control of corruption. Thus, Ukraine's political institutions have not been able to bring about any change in socio-political processes. Accordingly, an average level of trust and confidence of citizens in political institutions, and negative public sentiment regarding their perception and future change, can be traced in Ukraine.

Speech Animation Synthesis based on a Korean Co-articulation Model (한국어 동시조음 모델에 기반한 스피치 애니메이션 생성)

  • Jang, Minjung;Jung, Sunjin;Noh, Junyong
    • Journal of the Korea Computer Graphics Society / v.26 no.3 / pp.49-59 / 2020
  • In this paper, we propose a speech animation synthesis method specialized for Korean, based on a rule-based co-articulation model. Speech animation has been widely used in the cultural industry, in movies, animations, and games that require natural and realistic motion. However, because techniques for audio-driven speech animation have mainly been developed for English, the animation results for domestic content are often visually very unnatural. For example, a voice actor's dubbing is played with no mouth motion at all or, at best, with an unsynchronized loop of simple mouth shapes. Language-independent speech animation models exist, but because they are not specialized for Korean, they do not yet deliver the quality needed for domestic content production. Therefore, we propose a natural speech animation synthesis method, driven by input audio and text, that reflects the linguistic characteristics of Korean. Reflecting the fact that vowels mostly determine the mouth shape in Korean, we define a co-articulation model that separates the lips and the tongue, to solve the earlier problems of lip distortion and occasionally missing phoneme characteristics. Our model also reflects differences in prosodic features for improved dynamics in speech animation. Through user studies, we verify that the proposed model can synthesize natural speech animation.

Design and Implementation of Smart Device Application for Instructional Analysis (스마트 디바이스 기반 수업분석 프로그램 설계 및 구현 -한국어 특성 반영과 교사활용도 증진을 위한 UI설계를 적용하여-)

  • Kang, Doo Bong;Jeong, Ju Hun;Kim, Young Hwan
    • The Journal of Korean Association of Computer Education / v.18 no.4 / pp.31-40 / 2015
  • The objective of this study is to develop and implement a smart-device-based instructional analysis application to enhance the efficiency of teaching in class. The main design features of the application are as follows. First, the user interface (UI) has been simplified to give teachers a clear and easy-to-understand way to use the application. Second, characteristics of the Korean language, such as sentence structure, were considered. Third, multi-aspect analysis is possible through three analysis types: Flanders' interaction analysis, Tuckman's analysis, and McGraw's concentration of instruction analysis. A practical instructional analysis application has been developed through this study, and this user-oriented application will help teachers improve the quality of teaching in class. This study can also be a starting point for further research on design principles for instructional analysis, especially with recent technologies and theories such as voice-recognition systems, edutainment-based instruction, and experiential learning.
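
Of the three analysis types named in this abstract, Flanders' interaction analysis is the most mechanical: an observer codes classroom talk at regular intervals into ten categories (1 to 7 teacher talk, 8 and 9 student talk, 10 silence or confusion), and consecutive codes are tallied as pairs in a 10 x 10 matrix. The sketch below shows that tallying under simplifying assumptions. The code sequence is hypothetical, the conventional padding of the sequence with category 10 is omitted, and nothing here reflects how the application itself is implemented.

```python
NUM_CATEGORIES = 10
TEACHER_TALK = range(1, 8)   # Flanders categories 1-7
STUDENT_TALK = (8, 9)        # Flanders categories 8-9


def interaction_matrix(codes):
    """Tally consecutive code pairs: rows are the earlier code, columns the later."""
    matrix = [[0] * NUM_CATEGORIES for _ in range(NUM_CATEGORIES)]
    for prev, cur in zip(codes, codes[1:]):
        matrix[prev - 1][cur - 1] += 1
    return matrix


def talk_ratios(codes):
    """Fractions of coded intervals spent in teacher talk and in student talk."""
    total = len(codes)
    teacher = sum(1 for c in codes if c in TEACHER_TALK)
    student = sum(1 for c in codes if c in STUDENT_TALK)
    return teacher / total, student / total


# Hypothetical observation sequence, one code per coding interval.
codes = [5, 5, 4, 8, 8, 3, 5, 4, 9, 9, 10, 5]

matrix = interaction_matrix(codes)
teacher_ratio, student_ratio = talk_ratios(codes)
print(f"teacher talk {teacher_ratio:.0%}, student talk {student_ratio:.0%}")
print("lecture -> question transitions:", matrix[4][3])
```

Cells of the matrix show which behaviours tend to follow which, for example how often a teacher question (4) follows lecturing (5), which is the kind of pattern such an application can present to teachers.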