• Title/Summary/Keyword: Voice Training

Text-to-speech with linear spectrogram prediction for quality and speed improvement (음질 및 속도 향상을 위한 선형 스펙트로그램 활용 Text-to-speech)

  • Yoon, Hyebin
    • Phonetics and Speech Sciences / v.13 no.3 / pp.71-78 / 2021
  • Most neural-network-based speech synthesis models utilize neural vocoders to convert mel-scaled spectrograms into high-quality, human-like voices. However, neural vocoders combined with mel-scaled spectrogram prediction models demand considerable computer memory and time during the training phase and are subject to slow inference speeds in an environment where GPU is not used. This problem does not arise in linear spectrogram prediction models, as they do not use neural vocoders, but these models suffer from low voice quality. As a solution, this paper proposes a Tacotron 2 and Transformer-based linear spectrogram prediction model that produces high-quality speech and does not use neural vocoders. Experiments suggest that this model can serve as the foundation of a high-quality text-to-speech model with fast inference speed.
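To make the contrast between mel-scaled and linear spectrograms concrete, the following minimal sketch compares the frequency axes of the two representations. It is not taken from the paper: the sample rate, FFT size, and number of mel bands are assumed values, and the Hz-to-mel conversion uses the common HTK-style formula.

```python
from __future__ import annotations

import math


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


def linear_bin_centers(n_fft: int, sample_rate: int) -> list[float]:
    """Center frequencies (Hz) of the n_fft // 2 + 1 bins of a linear spectrogram."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


if __name__ == "__main__":
    # Assumed, illustrative settings; the paper's actual values are not given here.
    sample_rate, n_fft, n_mels = 22050, 1024, 80
    linear = linear_bin_centers(n_fft, sample_rate)
    mel = mel_band_centers(n_mels, 0.0, sample_rate / 2)
    print(len(linear), "linear bins vs", len(mel), "mel bands")
    print("spacing of the last two mel bands (Hz):", mel[-1] - mel[-2])
    print("spacing of linear bins (Hz):", linear[1] - linear[0])
```

Under these assumed settings the mel representation has far fewer bands, and they grow wider toward high frequencies, whereas the linear bins are evenly spaced over the whole range.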

Deep Learning based Raw Audio Signal Bandwidth Extension System (딥러닝 기반 음향 신호 대역 확장 시스템)

  • Kim, Yun-Su;Seok, Jong-Won
    • Journal of IKEEE / v.24 no.4 / pp.1122-1128 / 2020
  • Bandwidth extension refers to restoring and expanding a narrowband (NB) signal that has been damaged in the encoding and decoding process, owing to limited channel capacity or the characteristics of the codec installed in the mobile communication device, and converting it to a wideband (WB) signal. Bandwidth extension research has mainly focused on voice signals; methods such as SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling) transform the signal into the frequency domain and restore the missing or damaged high band through complex feature extraction processes. In this paper, we propose an autoencoder-based deep learning model that outputs a bandwidth-extended signal. Using residual connections in one-dimensional convolutional neural networks (CNNs), the model extends the bandwidth of a fixed-length time-domain input signal without complicated pre-processing. In addition, we confirmed that the damaged high band can be restored even when training on a dataset that is not limited to speech and contains various types of sound sources, including music.
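As a rough illustration of what a residual connection around a one-dimensional convolution looks like on a time-domain signal, here is a toy sketch in plain Python. It is not the authors' model: the function names, the fixed kernel, and the zero padding are assumptions made for the example, and a real network would learn its kernels and stack many such blocks inside an autoencoder.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel so that it can be centered on each sample.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def residual_block(signal, kernel):
    """Return signal + conv(signal): the skip connection adds the input back."""
    filtered = conv1d_same(signal, kernel)
    return [x + f for x, f in zip(signal, filtered)]


if __name__ == "__main__":
    frame = [0.0, 1.0, 0.0, -1.0, 0.0]  # hypothetical time-domain frame
    kernel = [0.25, 0.5, 0.25]          # hypothetical (not learned) kernel
    print(residual_block(frame, kernel))
```

Because the input is added back to the convolution output, a block whose kernel is all zeros simply passes the signal through unchanged, which is the property that makes deep stacks of such blocks easier to train.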

Development of Multi-person remote collaboration system using WebRTC for fields adaptation (WebRTC를 이용한 현장 적응형 다자간 원격협업 시스템 개발)

  • Lee, Kwanhee;Kim, Ji-In;Kwon, Goo-Rak
    • Smart Media Journal / v.10 no.4 / pp.9-14 / 2021
  • Existing remote collaboration systems are oriented toward remote support services and are not suitable for field-oriented multi-person remote collaboration. This paper describes the development of a remote collaboration system for various industrial sites. We develop remote support and work management that address the various needs of industrial sites, real-time video remote support between workers, and real-time voice work sharing between workers. In addition, the development aims to increase usability by strengthening security through video encryption and to build a more efficient system. The development comprises remote management and support software, an Android app for workers, construction of a WebRTC-based remote collaboration system, and a prototype. These products are expected to increase demand and sales through installation and operation at industrial sites, and can promote manpower training, understanding of trending technologies, and improved capabilities.

A Study on the Weather Support Service for Winter Sports (동계스포츠 맞춤형 기상지원 서비스를 위한 연구)

  • Back, Jin-Ho;Panday, Siddhartha Bikram;Lee, Ju-Sung;Kang, Hyo-Min
    • Journal of Korea Entertainment Industry Association / v.13 no.1 / pp.139-156 / 2019
  • The purpose of this study was to provide a method for supporting customized weather and environmental information services for the successful operation of winter sporting events. First, to analyze their needs, individual in-depth interviews and surveys were conducted with athletes, coaching staff, and experts involved in competition across 10 different winter sports. The face-to-face surveys were conducted with consideration for the training schedules and circumstances of the experts. The recorded voice files were transcribed into text, and the weather and environmental information elements embedded in the participants' opinions were extracted on the basis of literature reviews and data. The findings are expected to provide basic data on the weather conditions required to support specialized weather information for future large winter sports events, including the PyeongChang Winter Olympics.

Application for Workout and Diet Assistant using Image Processing and Machine Learning Skills (영상처리 및 머신러닝 기술을 이용하는 운동 및 식단 보조 애플리케이션)

  • Chi-Ho Lee;Dong-Hyun Kim;Seung-Ho Choi;In-Woong Hwang;Kyung-Sook Han
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.5 / pp.83-88 / 2023
  • In this paper, we developed a workout and diet assistance application to meet the growing demand for workout and dietary support services due to the increase in the home training population. The application analyzes the user's workout posture in real-time through the camera and guides the correct posture using guiding lines and voice feedback. It also classifies the foods included in the captured photos, estimates the amount of each food, and calculates and provides nutritional information such as calories. Nutritional information calculations are executed on the server, which then transmits the results back to the application. Once received, this data is presented visually to the user. Additionally, workout results and nutritional information are saved and organized by date for users to review.

Gesture Control Gaming for Motoric Post-Stroke Rehabilitation

  • Andi Bese Firdausiah Mansur
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.37-43 / 2023
  • Hospital conditions, scheduling, and patient restrictions have become obstacles to optimal therapy sessions. Hospital crowding can lead to tight schedules and shorter therapy periods. This puts post-stroke patients in a dilemma, since they need regular treatment to recover their nervous system. In this work, we propose an in-house, uncomplicated serious game system that can be used for physical therapy. A Kinect camera captures the depth image stream of the human skeleton, and users then control the game with hand gestures. Voice recognition is deployed to make play easier. Users must complete the given challenges to obtain a more significant outcome from the therapy system. Subjects use their upper limbs and hands to capture 3D objects at different speeds and positions. For more substantial challenges, the speed is increased and the locations are randomized. Each delegated entity raises the score. The scores are then evaluated for correlation with therapy progress. Users are delighted with the system and eager to use it as their daily exercise. The experimental studies compare score and difficulty, which represent the characteristics of the user and the game. Users tend to adapt quickly to the easy and medium levels, while the high level requires better focus and proper hand-eye synchronization to capture the 3D objects. Statistical analysis of the usability test at a significance level of α = 0.05 shows that the proposed game is accessible even without specialized training. It is suitable not only for therapy but also for fitness, because it can be used for body exercise. The experimental results are very satisfactory: most users enjoyed the system and became familiar with it quickly. The evaluation study demonstrates user satisfaction and perception during testing. Future work on the proposed serious game might involve haptic devices to stimulate users' physical sensation.

Motion Study of Treatment Robot for Autistic Children Using Speech Data Classification Based on Artificial Neural Network (음성 분류 인공신경망을 활용한 자폐아 치료용 로봇의 지능화 동작 연구)

  • Lee, Jin-Gyu;Lee, Bo-Hee
    • Journal of IKEEE / v.23 no.4 / pp.1440-1447 / 2019
  • The prevalence of autism spectrum disorders in children is currently reported to be increasing, and the disorders take various forms. In particular, affected children have difficulty communicating because of impairments in social communication, which need to be improved through training. Thus, this study proposes a method of acquiring voice information through a microphone mounted on a robot designed in preliminary research and using this information to generate intelligent motions. An artificial neural network (ANN) was used to classify the speech data into robot motions, and we tried to improve accuracy by combining a recurrent neural network with a convolutional neural network. The input speech data were preprocessed using MFCCs (Mel-Frequency Cepstral Coefficients), and the motion of the robot was estimated using various data normalization and neural network optimization techniques. In addition, experiments comparing accuracy against existing architectures and against a method involving human intervention showed that the designed ANN achieves high accuracy. Future work aims to design robot motions with higher accuracy and to apply them in treatment and educational environments for children with autism.

The cinematic interpretation of pansori and its transformation process (판소리의 영화적 해석과 변모의 과정)

  • Song, So-ra
    • (The) Research of the performance art and culture / no.43 / pp.47-78 / 2021
  • This study examines how pansori has been received in films based on pansori and explores changes in modern society's perception and expectations of it. Pansori was loved by both the upper and lower classes in the late Joseon period but lost its status during Japanese colonial rule and the Korean War. In response, the country designated pansori an important intangible cultural asset in 1964 to protect it from disappearing. Until the 1980s, however, pansori did not gain popularity on its own. After the 2000s, pansori sought to connect with the contemporary public in response to the socio-cultural demand to globalize Korean culture. Pansori is now one of the most popular cultures in the world, as the pop band Feel the Rhythm of KOREA shows. The changing public perception of pansori and its status in modern society can also be seen in the mass medium of film. This study explores this process of change through six films based on pansori, from "Seopyeonje" (1993), directed by Lim Kwon-taek, to "The Singer" (2020). First, "Seopyeonje" and "Hwimori" were produced in the 1990s. Both films show the reality of pansori, which had fallen out of public interest owing to the crisis of transmission in the early and mid-20th century, and in the midst of that they capture singers struggling fiercely for the artistic perfection of pansori itself. Next come "Lineage of the Voice" (2008) and "DURESORI: The Voice of East" (2012). These two films depict the growth of children who perform art, featuring contemporary children who perform pansori and Korean traditional music. Pansori in these films is no longer old music, nor a sublime art perfected through harsh training; it is treated naturally as one of the contemporary arts. Finally, there are "The Sound of a Flower" (2015) and "The Singer" (2020). These two films build their stories from pansori's history, setting them in the late Joseon Dynasty, when pansori was most loved by the people. This reflects the atmosphere of a time in which traditions are used as the subject of cultural content, and shows the changed public perception and status of pansori.

NUI/NUX of the Virtual Monitor Concept using the Concentration Indicator and the User's Physical Features (사용자의 신체적 특징과 뇌파 집중 지수를 이용한 가상 모니터 개념의 NUI/NUX)

  • Jeon, Chang-hyun;Ahn, So-young;Shin, Dong-il;Shin, Dong-kyoo
    • Journal of Internet Computing and Services / v.16 no.6 / pp.11-21 / 2015
  • With growing interest in Human-Computer Interaction (HCI), research on HCI has been actively conducted. Alongside it, research on Natural User Interface/Natural User eXperience (NUI/NUX), which uses the user's gestures and voice, has also been actively conducted. NUI/NUX requires recognition algorithms such as gesture recognition or voice recognition. However, these algorithms have the weakness that their implementation is complex and training takes a long time, because they must go through steps including preprocessing, normalization, and feature extraction. Recently, Microsoft launched Kinect as an NUI/NUX development tool, which has attracted attention, and studies using Kinect have been conducted. In a previous study, the authors implemented a hand-mouse interface with outstanding intuitiveness using the physical features of the user. However, it had weaknesses such as unnatural mouse movement and low accuracy of mouse functions. In this study, we designed and implemented a hand-mouse interface that introduces a new concept called the 'virtual monitor', extracting the user's physical features through Kinect in real time. A virtual monitor is a virtual space that can be controlled by the hand mouse. Coordinates on the virtual monitor can be accurately mapped onto coordinates on the real monitor. The hand-mouse interface based on the virtual monitor concept maintains the outstanding intuitiveness that was the strength of the previous study and enhances the accuracy of mouse functions. Further, we increased the accuracy of the interface by recognizing the user's unnecessary actions using a concentration indicator derived from electroencephalogram (EEG) data. To evaluate the intuitiveness and accuracy of the interface, we tested it with 50 people ranging in age from their teens to their 50s. In the intuitiveness experiment, 84% of subjects learned how to use it within 1 minute. In the accuracy experiment, the accuracy of mouse functions was 80.4% for drag, 80% for click, and 76.7% for double-click. The intuitiveness and accuracy of the proposed hand-mouse interface were verified through experiment, and it is expected to be a good example of an interface for controlling a system by hand in the future.
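The mapping from virtual-monitor coordinates to real-monitor coordinates can be pictured as a simple linear rescaling. The sketch below is only one plausible reading of that idea, not the authors' implementation: the `Rect` type, the `map_to_screen` function, the clamping behavior, and the example dimensions are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle describing the virtual monitor in sensor space."""
    left: float
    top: float
    width: float
    height: float


def map_to_screen(hand_x: float, hand_y: float, virtual: Rect,
                  screen_w: int, screen_h: int) -> tuple[int, int]:
    """Linearly map a hand position inside the virtual monitor to a screen pixel."""
    # Normalize the hand position to [0, 1] within the virtual monitor.
    u = (hand_x - virtual.left) / virtual.width
    v = (hand_y - virtual.top) / virtual.height
    # Clamp so that a hand outside the virtual monitor stays on the screen edge.
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    # Scale to pixel indices of the real monitor.
    return round(u * (screen_w - 1)), round(v * (screen_h - 1))


if __name__ == "__main__":
    # Hypothetical virtual monitor, 2.0 x 1.0 units, anchored at the origin.
    virtual = Rect(left=0.0, top=0.0, width=2.0, height=1.0)
    print(map_to_screen(0.0, 0.0, virtual, 1920, 1080))
    print(map_to_screen(2.0, 1.0, virtual, 1920, 1080))
```

In a system like the one described, the rectangle would presumably be sized from the user's physical features as measured by the sensor, so that each user's comfortable reach covers the whole screen; that sizing step is outside the scope of this sketch.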

Clinical Characteristics of Intracordal Cysts (성대낭종의 임상적 특성)

  • Hong, Ki-Hwan;Park, Jung-Hoon;Kim, Won;Kim, Chang-Hyun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.10 no.2 / pp.164-169 / 1999
  • Background and Objectives: Intracordal cysts are increasingly diagnosed and treated owing to advances in laryngeal stroboscopy and laryngeal microsurgical technique. Intracordal cysts are frequently misdiagnosed as vocal polyps or nodules. The purpose of this study is to evaluate the clinical features of intracordal cysts. Materials and Methods: In the present series, 83 cases of intracordal cysts treated with laryngeal microsurgery are reported. The cysts were diagnosed preoperatively with indirect laryngoscopy, laryngeal endoscopy, and laryngeal stroboscopy and confirmed by laryngeal microsurgical findings and biopsies. Results: Intracordal cysts accounted for 83 of 1900 patients treated with laryngeal microsurgery (4.4%): 56 were ductal cysts and 27 were epidermoid cysts. Intracordal cysts were more frequent in women and in patients in their forties, and the most frequent site was the anterior third of the true vocal cord. On indirect laryngoscopic examination, ductal cysts were frequently misdiagnosed as vocal polyps or nodules, whereas epidermoid cysts were relatively easy to diagnose. Voice abuse and upper respiratory infection are suspected etiologic factors. The degree of postoperative voice satisfaction was similar to that for vocal polyps. Conclusion: Intracordal cysts are frequently misdiagnosed as polyps or nodules; therefore, preoperative stroboscopic findings and laryngeal microsurgical findings are important. The ideal treatment is to enucleate the cyst while avoiding rupture of the cyst and injury to the lamina propria of the vocal cord.
