• Title/Summary/Keyword: Voice learning


Artificial intelligence wearable platform that supports the life cycle of the visually impaired (시각장애인의 라이프 사이클을 지원하는 인공지능 웨어러블 플랫폼)

  • Park, Siwoong;Kim, Jeung Eun;Kang, Hyun Seo;Park, Hyoung Jun
    • Journal of Platform Technology / v.8 no.4 / pp.20-28 / 2020
  • This paper proposes a voice, object, and optical character recognition platform comprising voice recognition-based smart wearable devices, smart devices, and web AI servers, as an appropriate technology that helps the visually impaired live independently by learning their life cycle in advance. The wearable device was designed and manufactured with a reverse neckband structure to increase wearing convenience and the efficiency of object recognition. The high-sensitivity small microphone and speaker attached to the wearable device were configured to support a voice recognition interface provided by the smart device app linked to the wearable device. The voice, object, and optical character recognition services used open source software and Google APIs on the web AI server, and experiments confirmed that the platform's recognition accuracy for voice, objects, and optical characters averaged 90% or more.


Design and Prototype Implementation of a Smartphone Functional Application for Learning Chinese Language (중국어 학습을 위한 스마트폰 기능성 어플리케이션 설계 및 프로토타입 구현)

  • Maeng, Soo Yeon;Lee, Eun Ryoung
    • Journal of Digital Contents Society / v.17 no.4 / pp.265-272 / 2016
  • Recently, the Chinese education market and social interest in it have expanded. Accordingly, smart learning based on smartphone applications has become part of a new educational paradigm, and research and development of applications for Chinese language education has become more active. In this paper, we designed and implemented a smartphone functional application prototype for learning basic Chinese characters. The prototype presents Chinese character display, character comparison, pronunciation listening, voice recording and playback, related content learning, and testing through a casual user interface. In future work, we will extend the prototype with a user interface for learning Chinese conversation and with an individual evaluation index, so that it can serve as an effective learning instrument without additional tools.

An AI Technology-based Intelligent Senior Assistant Voice Recognition System (AI 기술 기반 지능형 시니어 도우미 음성인식 시스템)

  • Hong, Phil-Doo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.355-357 / 2019
  • Now that we are entering an aging society, the user interfaces of new devices and IoT technology are very inconvenient for the senior generation. To improve this, we propose an AI technology-based intelligent senior assistant voice recognition system. The system implements a cloud platform-based API to accumulate data for machine learning processing, provides content for the diagnosis and prevention of dementia, and provides chatbot content for the senior generation. We hope that our system will increase the accessibility and convenience of IoT devices and new technology devices for the senior generation.


Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences / v.13 no.2 / pp.57-66 / 2021
  • This study addresses automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by features of multiple speech dimensions. However, most previous studies are restricted to features from a single speech dimension. To capture the characteristics of the speech disorder effectively, we extracted features from multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonics-to-Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/median/first quartile/third quartile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralization Ratio, Vowel Articulation Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, with an F1-score of 80.15, compared to using features from just one or two speech dimensions. The result implies that voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.
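Several of the rhythm and vowel-distortion features named in the abstract above have widely used closed-form definitions. The following is a minimal sketch of those common definitions, not the authors' implementation: the function names, the use of the population standard deviation in Varco, the /a/-/i/-/u/ corner vowels, and the sample formant values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def rpvi(durations):
    """Raw Pairwise Variability Index: mean absolute difference of successive intervals."""
    return mean(abs(a - b) for a, b in zip(durations, durations[1:]))


def npvi(durations):
    """Normalized PVI: each difference is divided by the pair's mean duration, times 100."""
    return 100 * mean(abs(a - b) / ((a + b) / 2) for a, b in zip(durations, durations[1:]))


def varco(durations):
    """Varco: standard deviation of durations relative to their mean, times 100.

    Assumption: population standard deviation; some definitions use the sample version.
    """
    return 100 * pstdev(durations) / mean(durations)


def vowel_space_area(f_a, f_i, f_u):
    """Area of the triangle spanned by the (F1, F2) points of /a/, /i/, /u/ (shoelace formula)."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = f_a, f_i, f_u
    return 0.5 * abs(f1i * (f2a - f2u) + f1a * (f2u - f2i) + f1u * (f2i - f2a))


def formant_centralization_ratio(f_a, f_i, f_u):
    """FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a); larger values mean more centralized vowels."""
    (f1a, f2a), (f1i, f2i), (f1u, f2u) = f_a, f_i, f_u
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)


def vowel_articulation_index(f_a, f_i, f_u):
    """VAI is the reciprocal of FCR."""
    return 1.0 / formant_centralization_ratio(f_a, f_i, f_u)


def f2_ratio(f_i, f_u):
    """Ratio of the second formant of /i/ to that of /u/."""
    return f_i[1] / f_u[1]


# Hypothetical (F1, F2) values in Hz, chosen only to exercise the functions.
A, I, U = (800.0, 1300.0), (300.0, 2300.0), (350.0, 800.0)
vowel_features = {
    "VSA": vowel_space_area(A, I, U),
    "FCR": formant_centralization_ratio(A, I, U),
    "VAI": vowel_articulation_index(A, I, U),
    "F2-Ratio": f2_ratio(I, U),
}
```

In a classification pipeline, values such as these would typically be concatenated with the voice quality and speech rate features into one feature vector per speaker; how the paper combines them is not stated in the abstract.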

On The Voice Training of Stage Speech in Acting Education - Yuri Vasiliev's Stage Speech Training Method - (연기 교육에서 무대 언어의 발성 훈련에 관하여 - 유리 바실리예프의 무대 언어 훈련방법 -)

  • Xu, Cheng-Kang
    • Journal of Korea Entertainment Industry Association / v.15 no.3 / pp.203-210 / 2021
  • Yuri Vasiliev is an actor, director, and drama teacher. He is a meritorious artist of Russia and recipient of the "Medal of Friendship" awarded by Russian President Vladimir Putin, an academician of the Petrovsky Academy of Sciences and Arts in Russia, a professor of the Russian National Academy of Performing Arts, and a professor of the Bavarian Academy of Drama in Munich, Germany. His physiological sense stimulation method is based on improving the voice, language, and motor function of drama actors. On the basis of a systematic understanding of the performing arts, Yuri Vasiliev created a unique training method for speech expression and skills. From this complicated artistic training, we identify the most critical skills for focused training, which we call basic skills training. Throughout the whole training process, Professor Yuri made a clear demand regarding the actor's lines: "Action! This is the basis of actors' creation. So action is the key!" Action and voice are closely linked. An actor's voice is the human voice, human life, human feeling, human experience and disaster. Acquiring their own voice is also the foundation of an actor's creation. What we are engaged in is pronunciation, breathing, tone and intonation, speed and rhythm, expressiveness, sincerity, stage voice and movement, and gesture, all of which are used to train the voice of actors according to the standard of drama. In short, Professor Yuri's training course is not only training in stage performance and skills but also contains a rich view of drama and performance. I think that, beyond learning from the means and methods of training, it is more important for us to understand the starting point and training objectives behind Professor Yuri's use of these exercises.

Multi-view learning review: understanding methods and their application (멀티 뷰 기법 리뷰: 이해와 응용)

  • Bae, Kang Il;Lee, Yung Seop;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.1 / pp.41-68 / 2019
  • Multi-view learning considers data from various viewpoints and attempts to integrate the various information in the data. It has been studied actively in recent years and has shown superior performance to models learned from only a single view. With the introduction of deep learning techniques, multi-view learning has shown good results in various fields such as image, text, voice, and video. In this study, we introduce how multi-view learning methods solve various problems faced in human behavior recognition, medical areas, information retrieval, and facial expression recognition. In addition, we review the data integration principles of multi-view learning by classifying traditional multi-view learning methods into data integration, classifier integration, and representation integration. Finally, we examine how CNN, RNN, RBM, Autoencoder, and GAN, which are commonly used among deep learning methods, are applied to multi-view learning algorithms. We categorize CNN- and RNN-based learning methods as supervised learning, and RBM-, Autoencoder-, and GAN-based learning methods as unsupervised learning.
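The difference between data integration and classifier integration mentioned in the abstract above can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not a method from the review: it uses a nearest-centroid classifier, hypothetical two-view data, and softmax-averaged scores as one possible way to combine per-view classifiers.

```python
import math


def centroid_fit(rows, labels):
    """Return {label: centroid} for feature rows with the given labels."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def class_probabilities(model, row):
    """Softmax over negative squared distances to each class centroid."""
    scores = {label: -sum((a - b) ** 2 for a, b in zip(row, c)) for label, c in model.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical data: two views of the same four samples.
view_a = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
view_b = [[5.0], [5.2], [1.0], [0.8]]
labels = [0, 0, 1, 1]

# Data integration: concatenate the views, then train a single classifier.
early_model = centroid_fit([a + b for a, b in zip(view_a, view_b)], labels)

# Classifier integration: train one classifier per view, then combine their outputs.
model_a = centroid_fit(view_a, labels)
model_b = centroid_fit(view_b, labels)


def predict_data_integration(a, b):
    probs = class_probabilities(early_model, a + b)
    return max(probs, key=probs.get)


def predict_classifier_integration(a, b):
    pa = class_probabilities(model_a, a)
    pb = class_probabilities(model_b, b)
    combined = {label: (pa[label] + pb[label]) / 2 for label in pa}
    return max(combined, key=combined.get)
```

Representation integration, the third category in the abstract, would instead learn a shared latent representation from both views before classification; it is omitted here because it requires a learned mapping beyond the scope of a short sketch.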

Predictive maintenance architecture development for nuclear infrastructure using machine learning

  • Gohel, Hardik A.;Upadhyay, Himanshu;Lagos, Leonel;Cooper, Kevin;Sanzetenea, Andrew
    • Nuclear Engineering and Technology / v.52 no.7 / pp.1436-1442 / 2020
  • Nuclear infrastructure systems play an important role in national security. The functions and missions of nuclear infrastructure systems are vital to government, businesses, society, and citizens' lives. It is crucial to design nuclear infrastructure for scalability, reliability, and robustness. To do this, we can use machine learning, a state-of-the-art technology used in fields ranging from voice recognition and Internet of Things (IoT) device management to autonomous vehicles. In this paper, we propose to design and develop a machine learning algorithm to perform predictive maintenance of nuclear infrastructure. Support vector machine and logistic regression algorithms will be used to perform the prediction. These machine learning techniques have been used to explore and compare rare events that could occur in nuclear infrastructure. As per our literature review, support vector machines provide better performance metrics. In this paper, we have performed parameter optimization for both algorithms. Existing research has been done in conditions with a great volume of data, but this paper presents a novel approach to correlate nuclear infrastructure data samples where the density of probability is very low. This paper also identifies the respective motivations and distinguishes between the benefits and drawbacks of the selected machine learning algorithms.

An Application of the HoQ Framework to Website Performance Improvement: Case Study of an Online Education Website (웹사이트 경쟁력 강화를 위한 평가 및 개선 방안 : HoQ 모형에 기반한, 온라인교육 K사 웹사이트의 품질 개선)

  • Kim, Do-Hoon;Suh, Young-Ho;Roh, In-Sung
    • Journal of Korean Society for Quality Management / v.33 no.2 / pp.40-50 / 2005
  • HoQ (House of Quality) provides an effective tool not only to arrange and evaluate VoC (Voice of Customers) and VoE (Voice of Engineers), but also to link and combine VoC and VoE, thereby presenting explicit directions for quality improvement. There has been, however, little research on the HoQ framework in the IT industry. The case study discussed here serves as an illustration of the applicability and usefulness of the HoQ approach to website quality improvement. The proposed HoQ framework shows great potential since customer needs are explicitly considered in the framework, and it helps website administrators develop better web services by providing guidelines for reengineering website operations.
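The way a House of Quality links VoC to VoE is usually expressed as a relationship matrix, with each engineering item scored by the importance-weighted sum of its relationships to customer needs. The sketch below shows that standard calculation under stated assumptions: the customer needs, engineering items, importance weights, and 0/1/3/9 relationship strengths are hypothetical examples and are not taken from the case study.

```python
def technical_importance(customer_weights, relationships):
    """Score each engineering item as the sum of (customer weight x relationship strength)."""
    n_items = len(relationships[0])
    return [
        sum(weight * row[j] for weight, row in zip(customer_weights, relationships))
        for j in range(n_items)
    ]


# Hypothetical VoC rows and VoE columns for an online education website.
voc = ["fast page load", "easy course search", "reliable video playback"]
voe = ["server response time", "search index quality", "streaming bitrate control"]
weights = [5, 3, 4]  # assumed customer importance, one per VoC row

# relationships[i][j]: strength of the link between voc[i] and voe[j] (0, 1, 3, or 9).
relationships = [
    [9, 1, 3],
    [1, 9, 0],
    [3, 0, 9],
]

scores = technical_importance(weights, relationships)
# Engineering items ordered from highest to lowest priority.
ranking = [name for _, name in sorted(zip(scores, voe), reverse=True)]
```

With these assumed inputs the ranking places server response time first, which is the kind of explicit improvement direction the abstract describes.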

A performance evaluation study of a deep learning-based voice synthesis technique using Mel-Cepstral Distortion (MCD) (멜-셉스트럴 왜곡(MCD)를 활용한 딥러닝 기반 목소리 합성 기술의 성능 평가 연구)

  • Jaesang Han;Yunseo Kang;Sangwoo Na;Hayeon Lee
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.488-489 / 2023
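  • Singing Voice Conversion (SVC) is one of the most actively studied recent topics in audio processing. Its goal is to convert a source singer's singing voice into a target singer's voice while preserving the original melody and lyrics. In this paper, we evaluate and compare the performance of deep learning-based SVC models using the mel-cepstral distortion metric. This should make it possible to find and use efficient SVC models in fields such as entertainment and education.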
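Mel-cepstral distortion is commonly computed per frame as (10 / ln 10) · sqrt(2 · Σ (c_d − ĉ_d)²) over the mel-cepstral coefficients and then averaged over frames. The following is a minimal sketch of that common definition, not the paper's evaluation code. It assumes the reference and converted frames are already time-aligned (for example by dynamic time warping) and that the 0th coefficient, which mostly carries energy, is excluded; the function name and inputs are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Average MCD in dB between two aligned sequences of mel-cepstral frames.

    Each frame is a list of coefficients [c0, c1, ..., cD]; c0 is skipped.
    """
    if len(ref_frames) != len(conv_frames):
        raise ValueError("frame sequences must be aligned to the same length")
    if not ref_frames:
        raise ValueError("at least one frame is required")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared_error = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(squared_error)
    return total / len(ref_frames)
```

A lower value indicates that the converted voice is spectrally closer to the reference, so models can be ranked by their average MCD on the same test utterances.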

Recognition of the Korean Character Using Phase Synchronization Neural Oscillator

  • Lee, Joon-Tark;Kwon, Yang-Bum
    • Journal of Advanced Marine Engineering and Technology / v.28 no.2 / pp.347-353 / 2004
  • A neural oscillator can be applied to oscillatory systems such as image information analysis and voice recognition. Conventional learning algorithms (neural networks or the Error Back Propagation Algorithm (EBPA)) are not well suited to oscillatory systems with complicated input patterns because their structure is too complex. However, these problems can be easily solved by using the synchrony characteristic of a neural oscillator with a PLL (phase-locked loop) function and a simple Hebbian learning rule. Therefore, this paper introduces a technique for recognizing Korean characters using a phase synchronization neural oscillator and shows the simulation results.