• 제목/요약/키워드: Speech Learning Model

검색결과 190건 처리시간 0.026초

Phoneme segmentation and Recognition using Support Vector Machines (Support Vector Machines에 의한 음소 분할 및 인식)

  • Lee, Gwang-Seok;Kim, Deok-Hyun
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 한국해양정보통신학회 2010년도 춘계학술대회
    • /
    • pp.981-984
    • /
    • 2010
  • In this paper, we used Support Vector Machines(SVMs) as the learning method, one of Artificial Neural Network, to segregated from the continuous speech into phonemes, an initial, medial, and final sound, and then, performed continuous speech recognition from it. A Decision boundary of phoneme is determined by algorithm with maximum frequency in a short interval. Speech recognition process is performed by Continuous Hidden Markov Model(CHMM), and we compared it with another phoneme segregated from the eye-measurement. From the simulation results, we confirmed that the method, SVMs, we proposed is more effective in an initial sound than Gaussian Mixture Models(GMMs).

  • PDF

Object Detection Algorithm for Explaining Products to the Visually Impaired (시각장애인에게 상품을 안내하기 위한 객체 식별 알고리즘)

  • Park, Dong-Yeon;Lim, Soon-Bum
    • The Journal of the Korea Contents Association
    • /
    • 제22권10호
    • /
    • pp.1-10
    • /
    • 2022
  • Visually impaired people have very difficulty using retail stores due to the absence of braille information on products and any other support system. In this paper, we propose a basic algorithm for a system that recognizes products in retail stores and explains them as a voice. First, the deep learning model detects hand objects and product objects in the input image. Then, it finds a product object that most overlapping hand object by comparing the coordinate information of each detected object. We determine that this is a product selected by the user, and the system read the nutritional information of the product as Text-To-Speech. As a result of the evaluation, we confirmed a high performance of the learning model. The proposed algorithm can be actively used to build a system that supports the use of retail stores for the visually impaired.

Research on Developing a Conversational AI Callbot Solution for Medical Counselling

  • Won Ro LEE;Jeong Hyon CHOI;Min Soo KANG
    • Korean Journal of Artificial Intelligence
    • /
    • 제11권4호
    • /
    • pp.9-13
    • /
    • 2023
  • In this study, we explored the potential of integrating interactive AI callbot technology into the medical consultation domain as part of a broader service development initiative. Aimed at enhancing patient satisfaction, the AI callbot was designed to efficiently address queries from hospitals' primary users, especially the elderly and those using phone services. By incorporating an AI-driven callbot into the hospital's customer service center, routine tasks such as appointment modifications and cancellations were efficiently managed by the AI Callbot Agent. On the other hand, tasks requiring more detailed attention or specialization were addressed by Human Agents, ensuring a balanced and collaborative approach. The deep learning model for voice recognition for this study was based on the Transformer model and fine-tuned to fit the medical field using a pre-trained model. Existing recording files were converted into learning data to perform SSL(self-supervised learning) Model was implemented. The ANN (Artificial neural network) neural network model was used to analyze voice signals and interpret them as text, and after actual application, the intent was enriched through reinforcement learning to continuously improve accuracy. In the case of TTS(Text To Speech), the Transformer model was applied to Text Analysis, Acoustic model, and Vocoder, and Google's Natural Language API was applied to recognize intent. As the research progresses, there are challenges to solve, such as interconnection issues between various EMR providers, problems with doctor's time slots, problems with two or more hospital appointments, and problems with patient use. However, there are specialized problems that are easy to make reservations. Implementation of the callbot service in hospitals appears to be applicable immediately.

Study on Ability to Communicate with the Smart-based Cooperative Learning (스마트 기반 협동학습을 통한 의사소통능력 신장에 관한 연구)

  • Kim, Jeongrang;Noh, Jaechoon
    • Journal of The Korean Association of Information Education
    • /
    • 제18권4호
    • /
    • pp.625-632
    • /
    • 2014
  • Due to the development of information and communication technology smart devices and apps, SNS, mirroring communication is made with such a smart and education to reflect the change of emphasis on the recent variety of collaborative social interaction are emerging. In this study, smart training and LT cooperative learning model developed in conjunction with the 'Smart-based cooperative learning 'model applied in the third grade social studies class and Smart-based cooperative learning and cooperative learning common to kidney doctor communication skills of elementary school students the impact on the communication capacity compared respectively, were analyzed. As a result, the expression of the elementary school, listening and understanding, all the sub-areas of interaction, such as communication skills in social studies class kidneys were applied to Smart-based cooperative learning in elementary school than applying the general cooperative learning model. This is not said to improve the ability to interact with the Smart-based cooperative learning in speech and in writing and clearly express the thoughts and opinions of students and separates help you understand the meaning of the words and writings of other students for the purpose in social situations can.

Dialect classification based on the speed and the pause of speech utterances (발화 속도와 휴지 구간 길이를 사용한 방언 분류)

  • Jonghwan Na;Bowon Lee
    • Phonetics and Speech Sciences
    • /
    • 제15권2호
    • /
    • pp.43-51
    • /
    • 2023
  • In this paper, we propose an approach for dialect classification based on the speed and pause of speech utterances as well as the age and gender of the speakers. Dialect classification is one of the important techniques for speech analysis. For example, an accurate dialect classification model can potentially improve the performance of speaker or speech recognition. According to previous studies, research based on deep learning using Mel-Frequency Cepstral Coefficients (MFCC) features has been the dominant approach. We focus on the acoustic differences between regions and conduct dialect classification based on the extracted features derived from the differences. In this paper, we propose an approach of extracting underexplored additional features, namely the speed and the pauses of speech utterances along with the metadata including the age and the gender of the speakers. Experimental results show that our proposed approach results in higher accuracy, especially with the speech rate feature, compared to the method only using the MFCC features. The accuracy improved from 91.02% to 97.02% compared to the previous method that only used MFCC features, by incorporating all the proposed features in this paper.

A Study on Text Mining Analysis of Presidential Maritime Concept in KOREA (텍스트마이닝을 이용한 한국 대통령의 해양관에 관한 연구)

  • Kim, Sung-Kuk;Lee, Tae-Hwee
    • Journal of Korea Port Economic Association
    • /
    • 제36권3호
    • /
    • pp.39-54
    • /
    • 2020
  • In the presidential political system, the word of the president has great influence on the formation of national policy and the decision-making process. Policy priorities are determined according to the president's ideology and core values, and various policies are established and executed according to the priorities. Therefore, this paper analyzes the contents of the president's speech. Since the president's speech is a semantic datum, in order to analyze unstructured text, big data analysis is conducted through the methods of machine learning and deep learning. In this study, the president's speech at the "National Sea Day" commemoration was obtained 1996 onwards and analyzed using topic modeling. As a result of the analysis, all the presidents' speeches were delivered with a view of the ocean that was consistent with the direction of their administration. It was confirmed that the ocean-industry-resource topics, which are the intrinsic values of the ocean, were not damaged and consistently emphasized by all presidents.

A Study on Development of Embedded System for Speech Recognition using Multi-layer Recurrent Neural Prediction Models & HMM (다층회귀신경예측 모델 및 HMM 를 이용한 임베디드 음성인식 시스템 개발에 관한 연구)

  • Kim, Jung hoon;Jang, Won il;Kim, Young tak;Lee, Sang bae
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • 제14권3호
    • /
    • pp.273-278
    • /
    • 2004
  • In this paper, the recurrent neural networks (RNN) is applied to compensate for HMM recognition algorithm, which is commonly used as main recognizer. Among these recurrent neural networks, the multi-layer recurrent neural prediction model (MRNPM), which allows operating in real-time, is used to implement learning and recognition, and HMM and MRNPM are used to design a hybrid-type main recognizer. After testing the designed speech recognition algorithm with Korean number pronunciations (13 words), which are hardly distinct, for its speech-independent recognition ratio, about 5% improvement was obtained comparing with existing HMM recognizers. Based on this result, only optimal (recognition) codes were extracted in the actual DSP (TMS320C6711) environment, and the embedded speech recognition system was implemented. Similarly, the implementation result of the embedded system showed more improved recognition system implementation than existing solid HMM recognition systems.

Structural Model Analysis of the Effectiveness of Problem Solving Ability by Team-Based Learning Pedagogy

  • Moon, Kyung-Im
    • Journal of the Korea Society of Computer and Information
    • /
    • 제25권10호
    • /
    • pp.193-201
    • /
    • 2020
  • This study is to evaluate the effectiveness of problem-solving ability by applying a team-based learning model to the classes of humanities and social science students, and to conduct a structural model analysis on the relationship between sub-factors. Team-based learning was conducted six times in six teams with 30 students in the second and third grades of the humanities and social sciences. The problem solving ability score of the target students was significantly higher after team-based learning and was statistically significant. There was no problem in normality with the latent variables, which are the sub-factors of problem solving ability, and the factor load value was statistically significant at the .001 level in the confirmatory factor analysis of the observed variables for the latent variables, which was a valid model. A good level of fitness was also shown in the verification of the fitness of the research model. As a result, it was analyzed that latent variables of cause analysis, problem clarification, planning execution, performance evaluation, and alternative development had an indirect or direct influence on each other.

Speakers' Intention Analysis Based on Partial Learning of a Shared Layer in a Convolutional Neural Network (Convolutional Neural Network에서 공유 계층의 부분 학습에 기반 한 화자 의도 분석)

  • Kim, Minkyoung;Kim, Harksoo
    • Journal of KIISE
    • /
    • 제44권12호
    • /
    • pp.1252-1257
    • /
    • 2017
  • In dialogues, speakers' intentions can be represented by sets of an emotion, a speech act, and a predicator. Therefore, dialogue systems should capture and process these implied characteristics of utterances. Many previous studies have considered such determination as independent classification problems, but others have showed them to be associated with each other. In this paper, we propose an integrated model that simultaneously determines emotions, speech acts, and predicators using a convolution neural network. The proposed model consists of a particular abstraction layer, mutually independent informations of these characteristics are abstracted. In the shared abstraction layer, combinations of the independent information is abstracted. During training, errors of emotions, errors of speech acts, and errors of predicators are partially back-propagated through the layers. In the experiments, the proposed integrated model showed better performances (2%p in emotion determination, 11%p in speech act determination, and 3%p in predicator determination) than independent determination models.

Performance Comparison of State-of-the-Art Vocoder Technology Based on Deep Learning in a Korean TTS System (한국어 TTS 시스템에서 딥러닝 기반 최첨단 보코더 기술 성능 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • 제6권2호
    • /
    • pp.509-514
    • /
    • 2020
  • The conventional TTS system consists of several modules, including text preprocessing, parsing analysis, grapheme-to-phoneme conversion, boundary analysis, prosody control, acoustic feature generation by acoustic model, and synthesized speech generation. But TTS system with deep learning is composed of Text2Mel process that generates spectrogram from text, and vocoder that synthesizes speech signals from spectrogram. In this paper, for the optimal Korean TTS system construction we apply Tacotron2 to Tex2Mel process, and as a vocoder we introduce the methods such as WaveNet, WaveRNN, and WaveGlow, and implement them to verify and compare their performance. Experimental results show that WaveNet has the highest MOS and the trained model is hundreds of megabytes in size, but the synthesis time is about 50 times the real time. WaveRNN shows MOS performance similar to that of WaveNet and the model size is several tens of megabytes, but this method also cannot be processed in real time. WaveGlow can handle real-time processing, but the model is several GB in size and MOS is the worst of the three vocoders. From the results of this study, the reference criteria for selecting the appropriate method according to the hardware environment in the field of applying the TTS system are presented in this paper.