• Title/Summary/Keyword: Connected speech

Intelligent Digital Public Address System using Agent Based on Network

  • Kim, Jung-Sook
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.1 / pp.87-92 / 2013
  • In this paper, we developed an integrated digital PA (Public Address) system that uses agents to provide speech recognition and ID-based sensor connection over IP. Besides speech recognition, the system provides the usual PA inputs: an external input, a microphone, and a radio. If the word "fire" is spoken to the system, it recognizes the emergency and immediately broadcasts the information to the appropriate agency. In addition, many sensors (temperature, humidity, infrared, etc.) can be connected to the system, and an inference agent integrates their readings into context awareness covering many types of information about internal status. The system can also broadcast messages to appropriate locations over the IP network, addressed by ID. Finally, the system is designed to be operated from a PC, which makes installation and configuration of operating parameters simple and user-friendly. In the implementation, thread-based concurrent processing handles events that occur simultaneously from many sensors or users.
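
The thread-based concurrent event handling described above can be sketched with a producer/consumer queue: each sensor or user runs in its own thread and pushes events, and a single dispatcher thread turns them into ID-addressed broadcasts. This is a minimal Python sketch, not the authors' implementation; the zone IDs, IP addresses, temperature threshold, and event format are hypothetical, and speech recognition is assumed to have already turned the spoken word into a keyword string.

```python
import queue
import threading

# Hypothetical broadcast-zone table: zone ID -> IP address of that zone's amplifier.
ZONES = {"lobby": "10.0.0.11", "floor2": "10.0.0.12"}
EMERGENCY_WORDS = {"fire"}   # keywords assumed to come from the speech recognizer
TEMP_LIMIT_C = 60.0          # hypothetical temperature alarm threshold


def producer(source_id, items, events):
    """One thread per sensor or user: push (source_id, payload) events."""
    for item in items:
        events.put((source_id, item))


def handle(source_id, payload):
    """Map a single event to a list of (zone_id, ip, message) broadcasts."""
    if isinstance(payload, str) and payload.lower() in EMERGENCY_WORDS:
        # Emergency keyword: announce in every zone.
        return [(zone, ip, f"EMERGENCY: {payload} reported via {source_id}")
                for zone, ip in ZONES.items()]
    if isinstance(payload, (int, float)) and payload > TEMP_LIMIT_C:
        return [("lobby", ZONES["lobby"], f"High temperature {payload} C at {source_id}")]
    return []


def dispatcher(events, broadcasts):
    """Single consumer thread: drain the queue until the None sentinel arrives."""
    while True:
        event = events.get()
        if event is None:
            break
        broadcasts.extend(handle(*event))


def run(sources):
    """Start one producer thread per source plus one dispatcher; return broadcasts."""
    events = queue.Queue()  # thread-safe FIFO shared by all producers
    broadcasts = []
    consumer = threading.Thread(target=dispatcher, args=(events, broadcasts))
    consumer.start()
    producers = [threading.Thread(target=producer, args=(sid, items, events))
                 for sid, items in sources.items()]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    events.put(None)  # all producers finished
    consumer.join()
    return broadcasts
```

For example, `run({"mic1": ["hello", "fire"], "temp3": [25.0, 71.5]})` yields two emergency broadcasts and one temperature alert; their relative order can vary between runs because the producers execute concurrently.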

Design of A Speech Recognition System using Hidden Markov Models (은닉 마코프 모델을 이용한 음성 인식 시스템 설계)

  • Lee, Chul-Won;Lim, In-Chil
    • Journal of the Korean Institute of Telematics and Electronics B / v.33B no.1 / pp.108-115 / 1996
  • This paper proposes an algorithm and a model topology for connected speech recognition using discrete hidden Markov models. The proposed model uses diphone and triphone models, chosen with both the recognition rate and the size of the recognizable vocabulary in mind. To obtain more exact inter-phoneme segmentation and faster execution, the diphone model has 4 states: the first and last states model steady portions, and the two middle states model transient portions. The triphone model has 7 states, refined into 3 steady states and 4 transition states. The proposed recognition algorithm also detects inter-phoneme segmentation during recognition processing.
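
The 4-state diphone topology above (steady, transient, transient, steady) can be illustrated with a left-to-right discrete HMM decoded by the Viterbi algorithm; the frames where the best state sequence changes state are candidate inter-phoneme boundaries. This is a minimal sketch under stated assumptions: the transition and emission probabilities, the 3-symbol codebook, and the function names are hypothetical, and forcing the path to end in the last state is a common convention for left-to-right models, not necessarily the authors' choice.

```python
import math

# Hypothetical labels for the 4-state diphone topology described above.
STATE_KIND = ["steady", "transient", "transient", "steady"]


def safe_log(p):
    """log(p), with log(0) mapped to -inf so impossible paths are never chosen."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start, trans, emit):
    """Best state path for discrete observations (codebook indices).

    start[i], trans[i][j], and emit[i][k] are probabilities. The path is
    forced to end in the last state, as is usual for left-to-right models.
    """
    n = len(start)
    score = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    backptrs = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + safe_log(trans[i][j]))
            ptr.append(best)
            score.append(prev[best] + safe_log(trans[best][j]) + safe_log(emit[j][o]))
        backptrs.append(ptr)
    state = n - 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def boundaries(path):
    """Frames where the decoded state changes: candidate segment boundaries."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

With self-loop probability 0.6, advance probability 0.4, and each state favoring one codebook symbol, decoding the sequence `[0, 0, 0, 1, 1, 1, 1, 2, 2, 2]` puts the steady-to-transient boundary at frame 3 and the transient-to-steady boundary at frame 7.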

A Use of Songs for Teaching Pronunciations in Elementary School

  • Hong, Kyung-Suk
    • MALSORI / no.41 / pp.61-71 / 2001
  • How to teach intelligible, communicative pronunciation is a continuing question in English education. Without good input, we cannot expect good output. However, in an EFL situation it is very difficult to provide good English pronunciation input, so we need to find efficient and effective materials for teaching pronunciation. One such material is songs, because songs contain the linguistic and cultural traits of the language. The purpose of this paper is to clarify why songs are good for teaching pronunciation. Koreans, as speakers of a syllable-timed language, have difficulty with English stress, rhythm, consonant clusters, and linking or blending in connected speech. The 134 songs from Wee Sing are analyzed to show how these traits appear in songs. The results show that learners can acquire these traits easily and naturally through songs. A lesson plan is offered as an example of teaching with songs.

Voice Message System Supporting Massive Outbound Call (대량의 발신 호를 지원하는 음성 메시지 시스템)

  • Kim Jeonggon
    • MALSORI / no.49 / pp.77-94 / 2004
  • In this paper, a new voice message system supporting massive outbound calling is proposed. The basic idea of the proposed system is to pre-process all text-to-speech conversion, together with the mixing of the converted text and attached music files, and to store the pre-processed results in a cache server connected to the IVR. The system is further optimized for massive outbound calling by distributing the web server load, which is caused by server-side scripts that access the database and generate dynamic VoiceXML documents, over client and server modules of the web server. The proposed system was test-deployed at a domestic voice message application service provider, where it was shown to reduce the response latency of the test-bed voice message system.
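
The pre-processing idea can be sketched as a cache keyed by the message text and its music file: every distinct prompt is rendered once before dialing, and each outbound call only performs a lookup. This is a minimal sketch; `synthesize` and `mix` stand in for a TTS engine and an audio mixer and are hypothetical callables, and an in-memory dict stands in for the cache server connected to the IVR.

```python
import hashlib


class PromptCache:
    """In-memory stand-in for a cache server holding pre-rendered prompts."""

    def __init__(self, synthesize, mix):
        self._synthesize = synthesize  # hypothetical TTS: text -> audio bytes
        self._mix = mix                # hypothetical mixer: (speech, music_file) -> audio bytes
        self._store = {}
        self.renders = 0               # how many times TTS + mixing actually ran

    @staticmethod
    def key(text, music_file):
        """Stable cache key for one (text, music file) combination."""
        return hashlib.sha256(f"{text}\x00{music_file}".encode("utf-8")).hexdigest()

    def preprocess(self, messages):
        """Before dialing: render each distinct (text, music_file) pair once."""
        for text, music_file in messages:
            k = self.key(text, music_file)
            if k not in self._store:
                self._store[k] = self._mix(self._synthesize(text), music_file)
                self.renders += 1

    def fetch(self, text, music_file):
        """Per outbound call: a dictionary lookup instead of TTS at call time."""
        return self._store[self.key(text, music_file)]
```

Pre-processing a campaign of 1,000 identical messages therefore runs TTS and mixing once, and every call-time request becomes a constant-time lookup; this is the kind of work the abstract moves out of the call path.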

A use of songs for Teaching English Pronunciation in Elementary School

  • Hong, Kyung-Suk
    • Proceedings of the KSPS conference / 2000.07a / pp.105-116 / 2000
  • How to teach intelligible, communicative pronunciation is a continuing question in English education. Without good input, we cannot expect good output. However, in an EFL situation it is very difficult to provide good English pronunciation input, so we need to find efficient and effective materials for teaching pronunciation. One such material is songs, because songs contain the linguistic and cultural traits of the language. The purpose of this paper is to clarify why songs are good for teaching pronunciation. Koreans, as speakers of a syllable-timed language, have difficulty with English stress, rhythm, consonant clusters, and linking or blending in connected speech. The 134 songs from Wee Sing are analyzed to show how these traits appear in songs. The results show that learners can acquire these traits easily and naturally through songs. A lesson plan is offered as an example of teaching with songs.

The Recognition Experiment of Korean Connected Digit in the Telephone Network (전화망에서의 한국어 연속숫자음 인식 실험)

  • Kang Jeom-Ja;Kim Kap-kee
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.167-170 / 2002
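  • This paper describes four-digit connected digit recognition experiments, carried out with the HTK toolkit, to determine the feature parameter extraction and acoustic modeling methods for Korean digit recognition in the telephone network environment. Based on the experimental results, we also analyzed the error rates of frequently occurring digits. Left-context biword and triword models were used as digit models. Recognition experiments varying the number of states and mixtures showed that the triword model achieved a higher recognition rate than the biword model, and that the substitution error rate was highest for "이<->".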
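
Substitution, deletion, and insertion counts like those analyzed above are usually obtained by aligning each recognized digit string with its reference using minimum edit distance; HTK's HResults reports results in the same substitution/deletion/insertion terms. The sketch below assumes unit cost for every error type, which may differ from the scoring actually used, and also tallies which reference digit was replaced by which, the kind of table that identifies the most confusable pair. Function names and the example digit strings are hypothetical.

```python
from collections import Counter


def align(ref, hyp):
    """Return (substitution pairs, deletions, insertions) for one utterance."""
    n, m = len(ref), len(hyp)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    # Backtrace, preferring match/substitution, then deletion, then insertion.
    subs, dels, ins = Counter(), 0, 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                subs[(ref[i - 1], hyp[j - 1])] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def score(pairs):
    """Aggregate error rates over (reference, hypothesis) digit sequences."""
    n_ref, dels, ins, confusions = 0, 0, 0, Counter()
    for ref, hyp in pairs:
        subs, d, i = align(ref, hyp)
        n_ref += len(ref)
        dels += d
        ins += i
        confusions.update(subs)
    n_sub = sum(confusions.values())
    return {
        "sub_rate": n_sub / n_ref,
        "del_rate": dels / n_ref,
        "ins_rate": ins / n_ref,
        "accuracy": 1 - (n_sub + dels + ins) / n_ref,
        "confusions": confusions,  # (reference digit, recognized digit) -> count
    }
```

Calling `score(...).["confusions"].most_common(1)` on a full test set would give the single most frequent substitution pair.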

A comparison of the absolute error of estimated speaking fundamental frequency (AEF0) among etiological groups of voice disorders (음성장애의 병인 집단 간 추정 발화 기본주파수 절대 오차 비교)

  • Seung Jin Lee;Jae-Yol Lim;Jaeock Kim
    • Phonetics and Speech Sciences / v.15 no.4 / pp.53-60 / 2023
  • This study compared the absolute error of estimated speaking fundamental frequency (AEF0), obtained with voice range profile (VRP) and speech range profile (SRP) tasks, across etiological groups of voice disorders. We also explored the association between AEF0 and related voice parameters within each etiological group. The 120 participants comprised 30 each from the functional (FUNC), organic (ORGAN), and neurological (NEUR) voice disorder groups and a normal control group (NC). Each participant performed the VRP and SRP tasks, and the fundamental frequency of connected speech was measured using electroglottography (EGG). When the measures were compared across etiological groups, the patient groups did not differ in Grade or Severity, but they did differ in AEF0VRP and AEF0SUM. Specifically, AEF0VRP was higher in the ORGAN group than in the FUNC and NC groups, and AEF0SUM was higher in the ORGAN group than in the NC group. Furthermore, within FUNC and NEUR, AEF0 correlated positively with Grade, whereas within ORGAN it correlated positively with the mean closed quotient (CQ). AEF0 measures and related voice variables should therefore be applied with attention to the etiological group. This study provides foundational information for the clinical application of AEF0 measures.
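
As a worked illustration, AEF0 can be read as the absolute difference between a speaking F0 estimated from a profile task and the F0 measured from connected speech with EGG. This definition, the use of hertz, and the Pearson coefficient below are assumptions for illustration only; the abstract does not give the exact formulas (including how AEF0SUM combines the tasks), and an ordinal rating such as Grade may call for a rank correlation instead.

```python
import math


def aef0(estimated_f0_hz, measured_f0_hz):
    """Absolute F0 error in Hz (assumed definition: |F0_est - F0_EGG|)."""
    return abs(estimated_f0_hz - measured_f0_hz)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Under this reading, a speaker whose profile-based estimate is 210.0 Hz and whose EGG-measured speaking F0 is 195.5 Hz has an AEF0 of 14.5 Hz; a per-group correlation would then be computed between such AEF0 values and a variable like Grade or CQ.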

Text Extraction from Complex Natural Images

  • Kumar, Manoj;Lee, Guee-Sang
    • International Journal of Contents / v.6 no.2 / pp.1-5 / 2010
  • The rapid growth in communication technology has led to the development of effective ways of sharing ideas and information in the form of speech and images. Understanding this information has become an important research issue and has drawn the attention of many researchers. Text in a digital image carries much important information about the scene, but detecting and extracting it is difficult. The main challenges in extracting text from natural scene images are variations in font size, text alignment, and font color, as well as illumination changes and reflections. In this paper, we propose a connected-component-based method to automatically detect text regions in natural images. Since text regions in images consist mostly of repeated vertical strokes, we look for patterns of closely packed vertical edges. Once such a group of edges is found, neighboring vertical edges are connected to each other. Connected regions whose geometric features fall outside the valid specifications are treated as outliers and eliminated. The proposed method is more effective than existing methods for slanted or curved characters. Experimental results are given to validate our approach.
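
The connected-component pipeline above (find vertical edges, join neighboring edges, label connected regions, discard regions with implausible geometry) can be sketched on a grayscale image stored as a list of rows. This is a minimal illustration, not the authors' method: the gradient threshold, the joining gap, and the geometric filter (minimum size, width at least the height) are hypothetical choices.

```python
from collections import deque


def vertical_edge_map(img, thresh):
    """Mark pixels with a strong horizontal intensity change (vertical strokes)."""
    h, w = len(img), len(img[0])
    return [[1 if x + 1 < w and abs(img[y][x + 1] - img[y][x]) >= thresh else 0
             for x in range(w)] for y in range(h)]


def connect_edges(edges, gap):
    """Join vertical edges on the same row that are at most `gap` pixels apart."""
    out = [row[:] for row in edges]
    for y, row in enumerate(edges):
        cols = [x for x, v in enumerate(row) if v]
        for a, b in zip(cols, cols[1:]):
            if b - a <= gap:
                for x in range(a, b + 1):
                    out[y][x] = 1
    return out


def components(mask):
    """4-connected components as bounding boxes (top, left, bottom, right)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            top, left, bottom, right = sy, sx, sy, sx
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def text_regions(img, thresh=50, gap=3, min_height=3, min_width=4):
    """Keep components whose geometry looks like text; drop the rest as outliers."""
    mask = connect_edges(vertical_edge_map(img, thresh), gap)
    regions = []
    for top, left, bottom, right in components(mask):
        height, width = bottom - top + 1, right - left + 1
        if height >= min_height and width >= min_width and width >= height:
            regions.append((top, left, bottom, right))
    return regions
```

On a small image with four bright vertical strokes plus one isolated bright pixel, the strokes merge into one box while the single-pixel blob fails the height test and is discarded as an outlier.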

Self-Adaptation Algorithm Based on Maximum A Posteriori Eigenvoice for Korean Connected Digit Recognition (한국어 연결 숫자음 인식을 위한 최대 사후 Eigenvoice에 근거한 자기적응 기법)

  • Kim Dong Kook;Jeon Hyung Bae
    • The Journal of the Acoustical Society of Korea / v.23 no.8 / pp.590-596 / 2004
  • This paper presents a new self-adaptation algorithm based on maximum a posteriori (MAP) eigenvoice for Korean connected digit recognition. The proposed MAP eigenvoice is developed by introducing a probability density model for the eigenvoice coefficients. The proposed approach provides a unified framework that incorporates the prior model into conventional eigenvoice estimation. Because a self-adaptation system has only one adaptation utterance, the same utterance that is to be recognized, we use MAP eigenvoice, which is the most robust adaptation method in this setting. In a series of self-adaptation experiments on the Korean connected digit recognition task, we demonstrate that the proposed approach outperforms the conventional eigenvoice algorithm when only a small amount of adaptation data is available.
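
The MAP eigenvoice idea can be sketched in a simplified setting: model the speaker's mean supervector as x ≈ μ0 + E w, where the columns of E are eigenvoices, and place a Gaussian prior w ~ N(0, diag(λ)) on the coefficients. The MAP estimate is then a regularized least-squares solution, w = (EᵀE/σ² + diag(1/λ))⁻¹ Eᵀ(x − μ0)/σ², which shrinks toward the speaker-independent model when data are scarce and, in this simplified model, reduces to the ordinary least-squares (maximum-likelihood) eigenvoice estimate as λ grows. The isotropic noise variance σ², the single-supervector observation, and the function names are assumptions; a full implementation would accumulate per-Gaussian occupancy statistics from the adaptation utterance.

```python
import numpy as np


def map_eigenvoice_weights(x, mu0, E, noise_var, prior_var):
    """MAP estimate of eigenvoice coefficients w for x = mu0 + E @ w + noise.

    Simplified assumptions: x is a speaker mean supervector estimated from
    the adaptation utterance, the noise is isotropic Gaussian with variance
    noise_var, and w ~ N(0, diag(prior_var)).
    """
    prior_precision = np.diag(1.0 / np.asarray(prior_var, dtype=float))
    A = E.T @ E / noise_var + prior_precision
    b = E.T @ (x - mu0) / noise_var
    return np.linalg.solve(A, b)


def adapted_supervector(x, mu0, E, noise_var, prior_var):
    """Speaker-adapted mean supervector mu0 + E @ w_MAP."""
    w = map_eigenvoice_weights(x, mu0, E, noise_var, prior_var)
    return mu0 + E @ w
```

With orthonormal eigenvoices and λ = σ² = 1, the estimate is exactly half of the least-squares coefficients, which makes the shrinkage effect of the prior easy to see.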

A Study on a Non-Voice Section Detection Model among Speech Signals using CNN Algorithm (CNN(Convolutional Neural Network) 알고리즘을 활용한 음성신호 중 비음성 구간 탐지 모델 연구)

  • Lee, Hoo-Young
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.33-39 / 2021
  • Speech recognition technology is being combined with deep learning and is developing rapidly. In particular, voice recognition services are connected to various devices such as AI speakers, in-vehicle voice recognition systems, and smartphones, and voice recognition technology is used in many places rather than in a few specific industries. Research to meet the high expectations for the technology is accordingly very active. In the field of natural language processing (NLP), research is needed on removing ambient noise and unnecessary voice signals, which strongly affect the speech recognition rate. Many domestic and foreign companies already apply the latest AI technology to such research, and studies using convolutional neural networks (CNNs) are especially active. The purpose of this study is to identify non-speech sections within a user's speech using a convolutional neural network. Voice files (WAV) from 5 speakers were collected to generate training data, and a CNN-based classification model was built to discriminate speech sections from non-speech sections. An experiment detecting non-speech sections with the resulting model achieved an accuracy of 94%.
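
The detection pipeline described above (frame the waveform, classify each frame as speech or non-speech with a CNN, then read off the non-speech sections) can be sketched in NumPy. This shows only inference with a deliberately tiny network (one convolution layer, ReLU, global max pooling, a sigmoid output); the frame length, hop, threshold, and weights are hypothetical, and in practice the weights would be learned from labeled WAV data such as the 5-speaker recordings mentioned above.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D waveform (len >= frame_len) into frames, one per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])


def conv1d_relu(x, kernels, bias):
    """'Valid' 1-D convolution (cross-correlation) with K kernels, then ReLU."""
    width = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width)  # (T - width + 1, width)
    return np.maximum(windows @ kernels.T + bias, 0.0)            # (T - width + 1, K)


def speech_prob(frame, kernels, bias, w_out, b_out):
    """Tiny CNN: conv -> ReLU -> global max pool -> linear -> sigmoid."""
    features = conv1d_relu(frame, kernels, bias).max(axis=0)  # (K,)
    return 1.0 / (1.0 + np.exp(-(features @ w_out + b_out)))


def nonspeech_segments(probs, threshold=0.5):
    """Merge consecutive frames with p(speech) < threshold into [start, end) ranges."""
    segments, start = [], None
    for i, p in enumerate(probs):
        if p < threshold and start is None:
            start = i
        elif p >= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(probs)))
    return segments
```

Merging frame decisions into ranges is what turns a per-frame classifier into section boundaries; a trained model would typically also smooth isolated misclassified frames before merging.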