• Title/Summary/Keyword: Voice training

180 search results, processing time 0.03 seconds

Acoustic Analysis of Classically Trained Western Singers (서양 음악을 전공으로 하는 성악인의 음향학적 분석)

  • 정성민
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.10 no.2
    • /
    • pp.124-129
    • /
    • 1999
  • Background and Objectives : Classical singers are capable of masking abnormalities through their high level of training and may present with subtle technical deficits rather than with obvious dysfunction. Some variation from expected normal laryngeal behavior may therefore be present in trained classical singers, and it is important for otolaryngologists to obtain a baseline assessment of their laryngeal function. Materials and Methods : Acoustic measurements, including strobovideolaryngoscopy, were obtained from 50 classically trained singers and compared with data from 20 untrained adults. Results and Conclusion : The 50 healthy, asymptomatic classical singers showed a 50% incidence of abnormal strobovideolaryngoscopic findings, yet their acoustic data were within normal limits despite the abnormal laryngeal findings. The author therefore recommends that classical singers undergo objective voice analysis and that their baseline data be used for accurate diagnosis of the cause of voice dysfunction in classical singers whose baseline laryngeal behavior may be unusual.


Complex nested U-Net-based speech enhancement model using a dual-branch decoder (이중 분기 디코더를 사용하는 복소 중첩 U-Net 기반 음성 향상 모델)

  • Seorim Hwang;Sung Wook Park;Youngcheol Park
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.2
    • /
    • pp.253-259
    • /
    • 2024
  • This paper proposes a new speech enhancement model based on a complex nested U-Net with a dual-branch decoder. The model uses a complex nested U-Net to estimate the magnitude and phase components of the speech signal simultaneously, and its decoder has a dual-branch structure that performs spectral mapping in one branch and time-frequency masking in the other. Compared with a single-branch decoder, the dual-branch structure removes noise effectively while minimizing the loss of speech information. Experiments were conducted on the VoiceBank + DEMAND database, commonly used for training speech enhancement models, and evaluated with various objective metrics. The proposed model increased the Perceptual Evaluation of Speech Quality (PESQ) score by about 0.13 over the baseline and achieved higher objective scores than recently proposed speech enhancement models.
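The abstract's dual-branch idea, a masking branch and a spectral-mapping branch whose outputs are fused into one enhanced complex spectrogram, can be sketched as follows. The fusion rule here (a simple weighted blend with weight `alpha`) is an assumption for illustration; the paper's actual combination may differ.

```python
import numpy as np

def dual_branch_enhance(noisy_spec, mask, mapped_spec, alpha=0.5):
    """Fuse the two decoder branches into one enhanced spectrogram.

    noisy_spec  : complex STFT of the noisy input, shape (freq, time)
    mask        : complex ratio mask predicted by the masking branch
    mapped_spec : complex spectrum predicted by the mapping branch
    alpha       : blend weight (hypothetical; the paper's fusion rule
                  is not specified in the abstract)
    """
    masked = mask * noisy_spec              # time-frequency masking branch
    return alpha * masked + (1.0 - alpha) * mapped_spec  # blend with mapping branch
```

Estimating both magnitude and phase is why the arrays are complex: a real-valued mask alone cannot correct the noisy phase, whereas the mapping branch can.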

Speech Recognition Model Based on CNN using Spectrogram (스펙트로그램을 이용한 CNN 음성인식 모델)

  • Won-Seog Jeong;Haeng-Woo Lee
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.4
    • /
    • pp.685-692
    • /
    • 2024
  • In this paper, we propose a new CNN model to improve the recognition performance of voice command signals. The method performs a short-time Fourier transform (STFT) on the input signal to obtain a spectrogram image and then carries out supervised multi-class learning with a CNN deep learning model. Converting the time-domain voice signal to the frequency domain expresses its characteristics well, and training on the resulting spectrogram images classifies commands effectively. To verify the performance of the proposed speech recognition system, a simulation program was written with the TensorFlow and Keras libraries and a simulation experiment was performed. The experiment confirmed that the proposed deep learning algorithm achieves an accuracy of 92.5%.
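The STFT-to-spectrogram front end described above can be sketched as below. The frame length, hop size, and window are illustrative assumptions; the paper's settings are not given in the abstract.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via short-time Fourier transform (STFT).

    Frame length and hop are assumed values for illustration.
    Returns an array of shape (num_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # each row holds the magnitudes of one short-time frame
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1))
```

The resulting 2-D array is treated as an image and fed to a CNN (e.g. stacked Conv2D layers in Keras) for multi-class command classification.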

A Real-Time Embedded Speech Recognition System

  • Nam, Sang-Yep;Lee, Chun-Woo;Lee, Sang-Won;Park, In-Jung
    • Proceedings of the IEEK Conference
    • /
    • 2002.07a
    • /
    • pp.690-693
    • /
    • 2002
  • With the growth of the communications business, the embedded market is developing rapidly both domestically and overseas. Embedded systems are used in many products, such as wired and wireless communication equipment and information appliances, and much development work applies speech recognition to embedded systems, for instance in PDA, PCS, CDMA-2000, or IMT-2000 devices. This study implements a speech recognition engine and database with minimal memory for real-time embedded systems. The implementation proceeds as follows. First, the DC component is removed from the input voice, high frequencies are compensated by pre-emphasis with a coefficient of 0.97, and the data are divided into 256-sample frames by an overlapped shift method. Linear predictive coefficients are obtained through the Levinson-Durbin algorithm and converted by a cepstrum transform into feature vectors. For HMM training, the Baum-Welch re-estimation algorithm trains each word, and recognition results are obtained by a likelihood method over the words. The speech data consist of 40 spoken commands and 10 digits extracted from 15 male and 15 female speakers speaking menu-control commands of an embedded system. Since ARM CPUs are often adopted in embedded systems, the speech recognition engine was ported to an ARM core evaluation board. After several tests with the five proposed recognition parameter sets, sets 1 and 3, which showed good recognition rates on commands without digits, were selected for the recognition test. The recognition engine achieved a 95% overall recognition rate, with 96% for speech commands and 94% for digits.
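The front-end steps the abstract spells out, pre-emphasis with coefficient 0.97 and fixed 256-sample frames with an overlapped shift, can be sketched directly. The shift size of 128 is an assumption; the abstract says only "lapped shift method".

```python
def pre_emphasize(samples, coeff=0.97):
    """High-frequency compensation: y[n] = x[n] - 0.97 * x[n-1]."""
    return [samples[0]] + [samples[n] - coeff * samples[n - 1]
                           for n in range(1, len(samples))]

def frame_signal(samples, frame_len=256, shift=128):
    """Split the signal into fixed 256-sample overlapping frames.

    The 50% shift is a hypothetical choice; the paper does not state it.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, shift)]
```

Each frame would then pass through Levinson-Durbin LPC analysis and a cepstrum transform before HMM scoring, as the abstract describes.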


Content-based Image Retrieval Using HSI Color Space and Neural Networks (HSI 컬러 공간과 신경망을 이용한 내용 기반 이미지 검색)

  • Kim, Kwang-Baek;Woo, Young-Woon
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.5 no.2
    • /
    • pp.152-157
    • /
    • 2010
  • The development of computers and the internet has added various media - image, audio, video, and voice - to traditional text-based information. However, most information retrieval systems are based only on text, leaving the other available information unused. Utilizing the available media can improve search performance; this approach is commonly called content-based retrieval, and content-based image retrieval systems specifically incorporate image analysis into search. In this paper, a content-based image retrieval system using the HSI color space, the ART2 algorithm, and the SOM algorithm is introduced. First, images are analyzed in the HSI color space to generate sets of features describing them, and an SOM algorithm presents candidate training features to the user. The features selected by the user are fed to the training part of the search system, which uses an ART2 algorithm. The proposed system can handle images that belong to several groups and showed better performance than other systems.
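The HSI analysis step assumes a conversion from RGB. A common formulation is sketched below; the paper's exact conversion may differ in details such as angle normalization.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert RGB components (floats in 0..1) to (H, S, I).

    One standard formulation: intensity is the channel mean,
    saturation measures distance from gray, hue is an angle in degrees.
    """
    i = (r + g + b) / 3.0                       # intensity
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i  # saturation
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    # clamp guards against tiny floating-point overshoot outside [-1, 1]
    h = 0.0 if den == 0 else math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    if b > g:                                   # hue lives on a 360-degree circle
        h = 360.0 - h
    return h, s, i
```

Per-pixel HSI values would then be aggregated into the feature sets that the SOM presents to the user.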

OnDot: Braille Training System for the Blind (시각장애인을 위한 점자 교육 시스템)

  • Kim, Hak-Jin;Moon, Jun-Hyeok;Song, Min-Uk;Lee, Se-Min;Kong, Ki-sok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.6
    • /
    • pp.41-50
    • /
    • 2020
  • This paper presents a braille education system that complements the shortcomings of existing braille learning products. An application dedicated to the blind performs all functions through touch gestures and voice guidance for user convenience, and a braille kit was produced for educational purposes with an Arduino and 3D printing. The system supports the following functions. First, learning of the most basic braille, such as initial consonants, final consonants, vowels, and abbreviations. Second, checking learned braille by solving quizzes at each step. Third, translation of braille. Experiments confirmed the recognition rate of touch gestures and the accuracy of braille expression, and translations were produced as intended. The system allows blind people to learn braille efficiently.
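A braille cell is six dots, so a lookup table plus a dot-state function is the natural data structure for driving a hardware kit like the one described. The sketch below uses English-braille letters a-c purely for illustration; the system itself teaches Korean braille, whose patterns differ.

```python
# Simplified braille lookup: character -> set of raised dots (1-6).
# English letters shown as an illustrative stand-in for Korean braille.
BRAILLE_DOTS = {
    'a': {1},
    'b': {1, 2},
    'c': {1, 4},
}

def cell_state(char):
    """Return the 6-dot cell as a tuple of 0/1 flags for dots 1-6,
    e.g. suitable for setting output pins on an Arduino braille kit."""
    dots = BRAILLE_DOTS[char]
    return tuple(1 if d in dots else 0 for d in range(1, 7))
```

A quiz step could compare the learner's touched dots against `cell_state` for the prompted character.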

Primary Teachers' Perception Analysis on Development and Application of STEAM Education Program (융합 인재 교육(STEAM) 연수를 통해 교수.학습 자료 개발 및 현장적용을 경험한 초등교사들의 인식 조사)

  • Lee, Ji Won;Park, Hye Jeong;Kim, Jung Bog
    • Journal of Korean Elementary Science Education
    • /
    • v.32 no.1
    • /
    • pp.47-59
    • /
    • 2013
  • The purpose of this study is to investigate the perceptions of STEAM education held by primary teachers who developed STEAM materials and applied them to their students through a teacher training program. For this study, 101 of the 172 attendees responded to a questionnaire with three categories: developing teaching materials for STEAM instruction, applying them in class, and spreading STEAM education. The major findings are as follows. First, when primary teachers develop materials for STEAM education, they consider applicability in real classes. Second, they feel time pressure when developing STEAM materials. Third, they think their own programs have significant educational effectiveness and that students enjoyed them; in particular, they think STEAM education programs can raise students' interest in learning. Fourth, primary teachers point out constraints on applying STEAM education programs, namely a lack of expertise and difficulty securing class time. Fifth, primary teachers evaluate the effect of STEAM education on primary education as positive, and they answer that more teaching materials, operation as a regular curriculum, and a secured budget are needed. To spread STEAM education successfully in primary education, administrators must consider and reflect the voices of teachers.

The Study of Data Recorder for Mission Replay (임무 재생을 위한 데이터 기록장치 연구)

  • Lee, Sang-Myung;Kim, Young-Kil
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.16 no.8
    • /
    • pp.1817-1823
    • /
    • 2012
  • In line with NCW (Network Centric Warfare) and the information age, the military increasingly shares status and information promptly through various and complex message exchanges and voice communication between operators using highly capable operating consoles. Recording devices have been developed and operated to record the operational situation so that new operations can be planned through mission analysis and result reviews after a military operation or training. Recording methods fall into two groups: direct recording of video data from the screen, and recording of the exchanged data. This study proposes a new data-oriented recording method that reduces the readiness time for replay, along with an improvement scheme.
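The data-oriented approach, logging the exchanged messages themselves rather than screen video, amounts to a timestamped event log that can be replayed in order. The record format below is an illustrative assumption, not the paper's actual schema.

```python
import time

def record(log, msg_type, payload, t=None):
    """Append one exchanged message with a timestamp.

    Data-oriented recording: only the message content is stored,
    not rendered video, so replay can re-derive the mission state.
    """
    log.append({"t": t if t is not None else time.time(),
                "type": msg_type, "payload": payload})

def replay(log):
    """Yield recorded messages in time order for mission replay."""
    for entry in sorted(log, key=lambda e: e["t"]):
        yield entry
```

Because only compact messages are stored, seeking to any point in the mission is a matter of replaying the log up to that timestamp, which is where the reduced readiness time comes from.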

Proposed Efficient Architectures and Design Choices in SoPC System for Speech Recognition

  • Trang, Hoang;Hoang, Tran Van
    • Journal of IKEEE
    • /
    • v.17 no.3
    • /
    • pp.241-247
    • /
    • 2013
  • This paper presents the design of a System on Programmable Chip (SoPC) based on a Field Programmable Gate Array (FPGA) for speech recognition, in which Mel-Frequency Cepstral Coefficients (MFCC) are used for feature extraction and Vector Quantization (VQ) for recognition. The speech recognition system proceeds through feature extraction, codebook training, and recognition. In the feature extraction step, the input voice data are transformed into spectral components and the main features are extracted with the MFCC algorithm. In the recognition step, the spectral features from the first step are processed and compared against the trained components using VQ. In our experiment, Altera's DE2 board with a Cyclone II FPGA implements the recognition system, which can recognize 64 words. The execution speed of the blocks in the system is surveyed by counting the clock cycles spent executing each block, and recognition accuracies are measured under different system parameters. These results on execution speed and recognition accuracy can help a designer choose the best configuration for speech recognition on an SoPC.
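The VQ recognition step described above, comparing extracted features against each word's trained codebook, typically picks the word whose codebook yields the lowest average distortion. A minimal sketch, with distortion measured as squared Euclidean distance to the nearest codeword (a common choice; the paper's exact metric is not stated in the abstract):

```python
def vq_distortion(features, codebook):
    """Average squared distance from each feature vector
    to its nearest codeword in one word's codebook."""
    total = 0.0
    for f in features:
        total += min(sum((a - b) ** 2 for a, b in zip(f, c))
                     for c in codebook)
    return total / len(features)

def recognize(features, codebooks):
    """Return the word whose trained codebook gives minimum distortion."""
    return min(codebooks, key=lambda w: vq_distortion(features, codebooks[w]))
```

In hardware, the inner distance loop is the block whose clock-cycle count dominates, which is why the paper surveys per-block execution speed.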

Google speech recognition of an English paragraph produced by college students in clear or casual speech styles (대학생들이 또렷한 음성과 대화체로 발화한 영어문단의 구글음성인식)

  • Yang, Byunggon
    • Phonetics and Speech Sciences
    • /
    • v.9 no.4
    • /
    • pp.43-50
    • /
    • 2017
  • Voice models in today's speech recognition software are sophisticated enough to process natural speech without any prior training, yet little research has reported on the use of speech recognition tools in pronunciation education. This paper examined Google speech recognition of a short English paragraph produced by Korean college students in clear and casual speech styles in order to diagnose and resolve the students' pronunciation problems. Thirty-three Korean college students participated in the recording of the English paragraph, and the Google soundwriter was employed to collect word recognition rates for the paragraph. Results showed a total word recognition rate of 73% with a standard deviation of 11.5%. The word recognition rate of clear speech was around 77.3%, while that of casual speech amounted to 68.7%. The lower recognition rate of casual speech was attributed both to individual pronunciation errors and to the software itself, as shown in its fricative recognition. The distribution of unrecognized words varied across participants and proficiency groups. The author concludes that speech recognition software is useful for diagnosing individual or group pronunciation problems; further studies on progressively improving learners' erroneous pronunciations would be desirable.
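A word recognition rate like the 73% reported above can be computed by comparing the recognized transcript against the reference paragraph. The simple positional matching below is an illustrative assumption; the study's actual scoring procedure (e.g. alignment-based matching) may differ.

```python
def word_recognition_rate(reference, hypothesis):
    """Fraction of reference words correctly recognized,
    using naive position-by-position comparison."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    hits = sum(1 for r, h in zip(ref, hyp) if r == h)
    return hits / len(ref)
```

Per-speaker rates computed this way could then be averaged within the clear-speech and casual-speech conditions to reproduce the style comparison the paper reports.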