• Title/Summary/Keyword: Voice recognition system


Voice Dialing System Using Stochastic Matching (확률적 매칭을 사용한 음성 다이얼링 시스템)

  • 김원구
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.04a / pp.515-518 / 2004
  • This paper presents a method that improves the performance of a personal voice dialing system based on speaker-independent phoneme HMMs. Because such a system stores only the phone transcription of each input sentence, its storage requirements are greatly reduced. However, its performance is worse than that of a system using speaker-dependent models, owing to the phone recognition errors introduced by the speaker-independent models. To address this problem, a new method is presented that jointly estimates the transformation vectors for speaker adaptation and the transcriptions from the training utterances. The biases and transcriptions are estimated iteratively from each user's training data with a maximum likelihood approach to stochastic matching using speaker-independent phone models. Experimental results show that the proposed method is superior to the conventional method, which uses transcriptions only. (A rough sketch of this joint estimation appears below.)

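As a rough illustration of the kind of joint estimation described above, the sketch below alternates between choosing the most likely transcription and re-estimating a speaker bias. It is a minimal sketch under strong simplifying assumptions (single-Gaussian phone models with unit variance, uniform frame-to-phone alignment, toy data), not the paper's HMM formulation.

```python
"""Illustrative sketch only: the paper's joint bias/transcription estimation is
approximated with single-Gaussian phone models, uniform frame-to-phone
alignment, and a global cepstral bias. All names and data are hypothetical."""
import numpy as np

def utterance_loglik(frames, phone_seq, models, bias):
    """Log-likelihood of frames under a phone sequence, uniform alignment,
    unit-variance Gaussian phone models shifted by a speaker bias."""
    segments = np.array_split(frames, len(phone_seq))
    ll = 0.0
    for seg, phone in zip(segments, phone_seq):
        mu = models[phone] + bias
        ll += -0.5 * np.sum((seg - mu) ** 2)
    return ll

def estimate_bias(frames, phone_seq, models):
    """ML bias for unit-variance Gaussians: mean residual along the alignment."""
    segments = np.array_split(frames, len(phone_seq))
    residuals = [seg - models[phone] for seg, phone in zip(segments, phone_seq)]
    return np.mean(np.vstack(residuals), axis=0)

def joint_estimate(frames, candidate_transcriptions, models, n_iter=5):
    """Alternate between picking the best transcription and re-estimating the bias."""
    bias = np.zeros(next(iter(models.values())).shape)
    best = candidate_transcriptions[0]
    for _ in range(n_iter):
        best = max(candidate_transcriptions,
                   key=lambda t: utterance_loglik(frames, t, models, bias))
        bias = estimate_bias(frames, best, models)
    return best, bias

# Toy example with 2-dimensional "cepstra" and three phone models.
rng = np.random.default_rng(0)
models = {"a": np.array([1.0, 0.0]), "n": np.array([0.0, 1.0]), "s": np.array([-1.0, 0.5])}
true_bias = np.array([0.4, -0.3])
frames = np.vstack([models[p] + true_bias + 0.1 * rng.standard_normal((10, 2))
                    for p in ["a", "n", "a"]])
print(joint_estimate(frames, [["a", "n", "a"], ["s", "n", "a"], ["a", "s"]], models))
```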

A Study on Consonant/Vowel/Unvoiced Consonant Phonetic Value Segmentation and Recognition of Korean Isolated Word Speech (한국어 고립 단어 음성의 자음/모음/유성자음 음가 분할 및 인식에 관한 연구)

  • Lee, Jun-Hwan; Lee, Sang-Beom
    • The Transactions of the Korea Information Processing Society / v.7 no.6 / pp.1964-1972 / 2000
  • Acoustically, Korean realizes distinct phonetic values rather than bare phonemes because of its characteristic sound-change properties. Building an extended recognition system for Korean therefore requires a study of Korean rule-based processing, which can then serve as post-processing for a Korean recognition system. In this paper, a text-based Korean rule-based system incorporating the sound-change rules peculiar to Korean is constructed. Based on the text-level phonetic values produced by this system, preliminary segmentation border points with non-uniform blocks are extracted from Korean isolated-word speech. By merging and recognizing the non-uniform blocks between the extracted border points, the possibility of recognizing Korean speech in the form of phonetic values is investigated. (A sketch of boundary extraction and block merging follows below.)

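The sketch below loosely illustrates the segmentation side of the approach: preliminary border points are placed where frame-to-frame spectral change is large, and adjacent non-uniform blocks with similar means are merged. The feature track, thresholds, and merging criterion are hypothetical, not the paper's.

```python
"""Illustrative sketch only: boundary detection and block merging with a
generic spectral-change measure; thresholds and features are hypothetical."""
import numpy as np

def candidate_boundaries(features, change_thresh=1.0):
    """Preliminary border points: frames where frame-to-frame change is large."""
    deltas = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return [i + 1 for i, d in enumerate(deltas) if d > change_thresh]

def merge_blocks(features, boundaries, merge_thresh=0.5):
    """Merge adjacent non-uniform blocks whose mean features are close."""
    edges = [0] + boundaries + [len(features)]
    blocks = [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]
    merged = [blocks[0]]
    for start, end in blocks[1:]:
        prev_start, prev_end = merged[-1]
        prev_mean = features[prev_start:prev_end].mean(axis=0)
        cur_mean = features[start:end].mean(axis=0)
        if np.linalg.norm(prev_mean - cur_mean) < merge_thresh:
            merged[-1] = (prev_start, end)   # extend the previous block
        else:
            merged.append((start, end))
    return merged

# Toy feature track: three quasi-stationary regions with small noise.
rng = np.random.default_rng(1)
features = np.vstack([np.full((20, 3), level) + 0.05 * rng.standard_normal((20, 3))
                      for level in (0.0, 2.0, -1.0)])
b = candidate_boundaries(features)
print(b, merge_blocks(features, b))
```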

Emotion Recognition Implementation with Multimodalities of Face, Voice and EEG

  • Udurume, Miracle; Caliwag, Angela; Lim, Wansu; Kim, Gwigon
    • Journal of Information and Communication Convergence Engineering / v.20 no.3 / pp.174-180 / 2022
  • Emotion recognition is an essential component of complete interaction between human and machine. The difficulties in emotion recognition stem from the fact that emotions are expressed in several forms, such as visual, sound, and physiological signals. Recent advances in the field show that combined modalities, such as visual, voice, and electroencephalography signals, lead to better results than single modalities used separately. Previous studies have explored the use of multiple modalities for accurate prediction of emotion; however, the number of studies on real-time implementation is limited because of the difficulty of running multiple emotion-recognition modalities simultaneously. In this study, we propose a system for real-time emotion recognition. The model is built around a multithreading block that runs each modality in a separate thread with continuous synchronization. We first performed emotion recognition for each modality separately before enabling the multithreaded system. To verify the results, we compared the accuracy of unimodal and multimodal emotion recognition in real time. The experiments demonstrated real-time user emotion recognition with the proposed model and confirmed the effectiveness of the multimodal approach: the multimodal model achieved an accuracy of 80.1%, compared with 70.9%, 54.3%, and 63.1% for the individual modalities.
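
The multithreaded design described above can be pictured roughly as follows: one thread per modality pushes class scores into a shared queue, and a fusion step averages the latest scores. This is a minimal sketch with dummy classifiers and hypothetical labels, not the authors' implementation.

```python
"""Illustrative sketch only: one thread per modality feeding a shared fusion
step. The dummy classifiers, labels, and averaging fusion are hypothetical."""
import queue
import random
import threading
import time

LABELS = ["happy", "sad", "angry", "neutral"]

def modality_worker(name, out_q, stop, period=0.1):
    """Stand-in for a per-modality recognizer (face / voice / EEG) in its own thread."""
    while not stop.is_set():
        scores = [random.random() for _ in LABELS]       # dummy class scores
        total = sum(scores)
        out_q.put((name, [s / total for s in scores]))
        time.sleep(period)

def fuse(latest):
    """Late fusion: average the most recent score vector from each modality."""
    fused = [sum(scores[i] for scores in latest.values()) / len(latest)
             for i in range(len(LABELS))]
    return LABELS[fused.index(max(fused))]

def run(duration=1.0):
    out_q, stop, latest = queue.Queue(), threading.Event(), {}
    threads = [threading.Thread(target=modality_worker, args=(m, out_q, stop))
               for m in ("face", "voice", "eeg")]
    for t in threads:
        t.start()
    end = time.time() + duration
    while time.time() < end:
        name, scores = out_q.get()        # synchronize on incoming predictions
        latest[name] = scores
        if len(latest) == 3:
            print("fused emotion:", fuse(latest))
    stop.set()
    for t in threads:
        t.join()

if __name__ == "__main__":
    run()
```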

A Study on Phonetic Value-Transcription Look-Up Table Generation for Postprocessing of Voice Recognition (음성인식 후처리를 위한 음가-표기 변환표 생성에 관한 연구)

  • 김경징; 최영규; 이상범
    • Journal of the Korea Computer Industry Society / v.3 no.5 / pp.585-594 / 2002
  • This paper describes the creation and implementation of a phonetic value-transcription conversion table for postprocessing of voice recognition. A transcription set generator, which produces the set of transcriptions pronounced as a recognized phonetic value, is designed and implemented as a postprocessor for a voice recognition system that recognizes syllable-unit phonetic values. The phonetic value-transcription conversion table is derived from a transcription-phonetic value conversion table obtained by modeling the standard pronunciation rules with a Petri net. To show that the phonetic value-transcription conversion table produces correct transcription sets, the transcription set generator is designed and implemented. Experiments on standard pronunciation examples and on words randomly sampled from a pronunciation dictionary confirm that the correct transcription set, including pre-vocalization transcriptions, is produced. (A toy sketch of such a lookup table is given below.)

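A toy version of such a lookup table can be built by applying a forward transcription-to-phonetic-value conversion to a word list and inverting it, as sketched below. The romanized forms and rewrite rules are hypothetical stand-ins for the paper's Petri-net model of standard Korean pronunciation.

```python
"""Illustrative sketch only: inverting a transcription-to-phonetic-value mapping
into a lookup table and using it to generate candidate transcriptions. The toy
romanized forms and rules are hypothetical, not the paper's."""
from collections import defaultdict

def to_phonetic_value(transcription):
    """Toy forward conversion: apply simplified sound-change rewrites."""
    rules = [("kn", "ngn"), ("pn", "mn")]   # nasalization-like toy rules
    value = transcription
    for before, after in rules:
        value = value.replace(before, after)
    return value

def build_lookup_table(lexicon):
    """Phonetic value -> set of transcriptions that are pronounced that way."""
    table = defaultdict(set)
    for word in lexicon:
        table[to_phonetic_value(word)].add(word)
    return table

def transcription_set(table, recognized_value):
    """Candidate transcriptions for a syllable-level recognition result."""
    return sorted(table.get(recognized_value, set()))

lexicon = ["hakno", "hangno", "apnal", "amnal"]       # hypothetical word list
table = build_lookup_table(lexicon)
print(transcription_set(table, "hangno"))   # both spellings map to this value
```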

A Study on the Practical Methodology of Engineering Education through the Making of Smart Mirror (스마트 거울의 제작을 통해 이루어진 공학 교육 실천 방법론에 관한 연구)

  • Seo, Myeong-Deok; Kwon, Ji-Young; Chang, Eun-Young
    • Journal of Practical Engineering Education / v.10 no.1 / pp.9-15 / 2018
  • A digital signage system is constructed using a speech-recognition-based API, and a VRSM (Voice Recognition Smart Mirror) is proposed that retrieves information such as weather, maps, exercise information, schedules, and images in response to the user's voice commands, distinguishing it from other commercial products. The course provides an effective method of engineering education through a project evaluated as the outcome of an independent graduation certification system; a team of two students in the major designed and produced the work over three semesters. Through this comprehensive capstone design, the students experienced an engineering approach and opportunities for creative thinking. The interim results won the best academic prize at the institute's conference, and the work also received awards at other academic conferences. The practical skills gained in this process proved beneficial for self-confidence and for job-seeking, leading to actual employment.
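
As a rough sketch of how a recognized voice command might be routed to an information widget in a mirror like VRSM, the snippet below matches keywords in the recognized text against hypothetical handlers; the actual system relied on a commercial speech-recognition API and its own widget set.

```python
"""Illustrative sketch only: dispatching a recognized voice command to an
information widget. Keywords and handlers are hypothetical."""

def show_weather():  return "weather: sunny, 21C"        # placeholder widgets
def show_schedule(): return "schedule: 3 events today"
def show_map():      return "map: route to campus"

HANDLERS = {"weather": show_weather, "schedule": show_schedule, "map": show_map}

def dispatch(recognized_text):
    """Pick the first handler whose keyword appears in the recognized sentence."""
    for keyword, handler in HANDLERS.items():
        if keyword in recognized_text.lower():
            return handler()
    return "command not recognized"

print(dispatch("Show me today's weather please"))
```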

English Conversation System Using Artificial Intelligent of based on Virtual Reality (가상현실 기반의 인공지능 영어회화 시스템)

  • Cheon, EunYoung
    • Journal of the Korea Convergence Society / v.10 no.11 / pp.55-61 / 2019
  • Various educational media already exist for foreign-language education, but they have the disadvantages of expensive teaching materials and media programs and poor real-time responsiveness. In this paper, we propose an artificial-intelligence English conversation system based on VR and speech recognition. We used Google Cardboard VR and the Google Speech API to build the system and developed artificial-intelligence algorithms for providing the virtual-reality environment and the conversation. In the proposed speech recognition server system, the sentences spoken by the user are divided into word units and compared with the words stored in the database, and the match with the highest probability is returned. Users can talk with and respond to people in virtual reality. The conversation function is independent of contextual conversations and themes, and conversations with the AI assistant are carried out in real time so that the user can check the system in real time. The system combining virtual reality and voice recognition proposed in this paper is expected to contribute to the expansion of virtual education content services related to the Fourth Industrial Revolution.
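
The word-level matching step described above might look roughly like the sketch below: the recognized sentence is split into words, each stored sentence is scored by word overlap, and the highest-scoring match is returned. The tiny in-memory database and scoring rule are hypothetical.

```python
"""Illustrative sketch only: word-level matching of a recognized sentence
against stored sentences. The 'database' and scoring rule are hypothetical."""

DATABASE = {
    "How are you today": "I am fine, thank you.",
    "Where is the library": "It is next to the main gate.",
    "What time is it now": "It is three o'clock.",
}

def match_probability(recognized, stored):
    """Fraction of the stored sentence's words that appear among the recognized words."""
    rec_words = set(recognized.lower().split())
    sto_words = stored.lower().split()
    return sum(w in rec_words for w in sto_words) / len(sto_words)

def respond(recognized_sentence):
    """Return the reply whose key sentence matches the input with the highest probability."""
    best = max(DATABASE, key=lambda s: match_probability(recognized_sentence, s))
    return DATABASE[best], match_probability(recognized_sentence, best)

print(respond("where is the library building"))
```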

Intelligent Steering Control System Based on Voice Instructions

  • Seo, Ki-Yeol; Oh, Se-Woong; Suh, Sang-Hyun; Park, Gyei-Kark
    • International Journal of Control, Automation, and Systems / v.5 no.5 / pp.539-546 / 2007
  • An important field of research in ship operation concerns the efficiency of transportation, the convenience of maneuvering ships, and the safety of navigation. For these purposes, many intelligent technologies for ship automation have been required and studied. In this paper, we propose an intelligent voice instruction-based learning (VIBL) method and discuss building a ship steering control system based on it. The VIBL system consists of two functions: a text conversion function, in which an instructor's spoken input is recognized and converted to text, and a linguistic instruction-based learning function, in which the text instruction is understood by searching for its meaning elements. Fuzzy theory is adopted to build maneuvering models of steersmen, and the existing LIBL method is then improved and combined with voice recognition technology to form VIBL. The ship steering control system combined with VIBL is tested in a ship maneuvering simulator, and its validity is shown.
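
The linguistic instruction-based learning step can be pictured roughly as below: the recognized text is searched for direction and degree meaning elements, and fuzzy-style degree terms are mapped to a rudder angle. The vocabulary, angles, and defuzzification shortcut are hypothetical, not the paper's fuzzy steersman model.

```python
"""Illustrative sketch only: extracting meaning elements from a recognized
steering instruction and mapping degree terms to a rudder angle. The
vocabulary, membership values, and angles are hypothetical."""

# Degree adverbs -> representative rudder angles (degrees), a crude stand-in
# for a fuzzy maneuvering model of a steersman.
DEGREE_TERMS = {"easy": 5.0, "half": 15.0, "hard": 30.0}
DIRECTION_TERMS = {"port": -1, "starboard": +1}

def interpret(instruction):
    """Search the instruction text for direction and degree meaning elements."""
    words = instruction.lower().split()
    direction = next((DIRECTION_TERMS[w] for w in words if w in DIRECTION_TERMS), None)
    degrees = [DEGREE_TERMS[w] for w in words if w in DEGREE_TERMS]
    if direction is None:
        return None                       # no steering meaning element found
    # If several degree terms occur, average them (a simple defuzzification stand-in).
    angle = sum(degrees) / len(degrees) if degrees else 10.0   # default ordered angle
    return direction * angle

for text in ("hard starboard", "port easy", "steady as she goes"):
    print(text, "->", interpret(text))
```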

Noise Elimination Using Improved MFCC and Gaussian Noise Deviation Estimation

  • Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information / v.28 no.1 / pp.87-92 / 2023
  • With the continuous development of speech recognition systems, recognition rates have improved rapidly, but speech still cannot be recognized accurately when various voices are mixed with noise from the usage environment. To increase the vocabulary recognition rate when processing speech with environmental noise, the noise must be removed. Even in existing HMM, CHMM, GMM, and DNN models combined with AI, unexpected noise occurs, or quantization noise is inherently added to the digital signal; this alters or corrupts the source signal and lowers the recognition rate. To solve this problem, the MFCC was improved and processed in order to efficiently extract the features of the speech signal for each frame, and a noise removal method using a Gaussian model with noise deviation estimation was improved and applied to remove the noise from the speech signal. The performance of the proposed model was evaluated using a cross-correlation coefficient to assess the accuracy of the speech. The evaluation of the recognition rate confirmed that the difference in the average correlation coefficient improved by 0.53 dB.
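
A minimal sketch of the overall idea, assuming the noise level can be estimated from leading speech-free frames: a Gaussian noise magnitude is estimated, subtracted from the frame spectra, and the result is scored with a cross-correlation coefficient. The frame sizes, the simple spectral subtraction, and the toy signal are hypothetical, not the paper's improved MFCC pipeline.

```python
"""Illustrative sketch only: noise-deviation estimation from leading frames,
simple spectral subtraction, and cross-correlation scoring. Parameters and
the toy signal are hypothetical."""
import numpy as np

def frame_spectra(signal, frame_len=256):
    """Magnitude spectra of non-overlapping, Hann-windowed frames."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

def denoise(noisy, frame_len=256, noise_frames=5):
    """Subtract the noise magnitude estimated from the first few (speech-free) frames."""
    spectra = frame_spectra(noisy, frame_len)
    noise_est = spectra[:noise_frames].mean(axis=0)   # Gaussian-noise level estimate
    return np.maximum(spectra - noise_est, 0.0)

def cross_correlation_coefficient(a, b):
    """Normalized cross-correlation between two flattened spectrogram-like arrays."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy signal: silence, then a tone, with additive Gaussian noise.
rng = np.random.default_rng(2)
t = np.arange(16000) / 16000.0
clean = np.where(t > 0.2, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
clean_spec = frame_spectra(clean)
print("noisy   :", cross_correlation_coefficient(clean_spec, frame_spectra(noisy)))
print("denoised:", cross_correlation_coefficient(clean_spec, denoise(noisy)))
```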

An Implementation of a User Identification System Using Hybrid Biometric Distances (복합 생체 척도 거리를 이용한 사용자 인증시스템의 구현)

  • 주동현; 김두영
    • Journal of the Institute of Convergence Signal Processing / v.3 no.2 / pp.23-29 / 2002
  • In this paper, we propose a user identification system that uses hybrid biometric information and a non-contact IC card to improve accuracy. The hybrid biometric information consists of the user's face image, iris image, and 4-digit voice password, while the non-contact IC card provides the user's base information. If the distance between the enrolled hybrid biometric information corresponding to the user's base information and the measured biometric information is less than a given threshold, the identification is accepted; otherwise, it is rejected. Experimental results show that the proposed method achieves a better identification rate than the conventional method. (A minimal sketch of the threshold decision is given below.)

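The threshold decision can be sketched as below: per-modality distances between the enrolled template (read from the IC card) and the measured biometrics are combined with weights and compared against a threshold. The feature dimensions, weights, and threshold are hypothetical.

```python
"""Illustrative sketch only: combining face, iris, and voice-password distances
into one score and accepting the user when it falls below a threshold. Feature
dimensions, weights, and the threshold are hypothetical."""
import numpy as np

WEIGHTS = {"face": 0.4, "iris": 0.4, "voice": 0.2}   # hypothetical modality weights
THRESHOLD = 1.0

def hybrid_distance(enrolled, measured):
    """Weighted sum of per-modality Euclidean distances."""
    return sum(w * np.linalg.norm(enrolled[m] - measured[m]) for m, w in WEIGHTS.items())

def identify(enrolled, measured):
    """Accept when the combined biometric distance is below the threshold."""
    d = hybrid_distance(enrolled, measured)
    return ("accepted" if d < THRESHOLD else "rejected"), d

rng = np.random.default_rng(3)
enrolled = {m: rng.standard_normal(8) for m in WEIGHTS}                  # from the IC card
genuine = {m: enrolled[m] + 0.1 * rng.standard_normal(8) for m in WEIGHTS}
impostor = {m: rng.standard_normal(8) for m in WEIGHTS}
print(identify(enrolled, genuine))
print(identify(enrolled, impostor))
```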

Ortho-phonic Alphabet Creation by the Musical Theory and its Segmental Algorithm (악리론으로 본 정음창제와 정음소 분절 알고리즘)

  • Chin, Yong-Ohk; Ahn, Cheong-Keung
    • Speech Sciences / v.8 no.2 / pp.49-59 / 2001
  • Phoneme segmentation is a very difficult problem in speech processing because a segmentation algorithm must cope with many kinds of allophones and coarticulation effects, which makes system configurations for speech recognition and voice retrieval complex. To address this, we discuss the possibility of a new segmentation algorithm based on the "subtract or add a third" tripartitioning (삼분손익) of the twelve temperaments (12 율려), first proposed by Prof. T. S. Han, which is close to both oriental and western musical theory. He has also suggested three consonant and three vowel phonemes in Hunminjungum (훈민정음), invented by King Sejong in the 15th century. In this paper, we propose to name this unit the ortho-phonic phoneme (OPP/정음소), carrying the meaning of "absoluteness and independency". The OPP is also applicable to other languages, for example those described by the IPA. Finally, we argue that this algorithm can be applied across languages and is useful for constructing voice recognition and retrieval systems. (A small sketch of the tripartitioning procedure appears below.)

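The tripartitioning referred to above is commonly described as alternately removing and adding a third of a pitch-pipe length to generate the twelve lü; the sketch below computes those ratios and folds them into one octave. The starting length and the strict alternation are simplifying assumptions.

```python
"""Illustrative sketch only: the 'subtract or add a third' (삼분손익) procedure
for generating the twelve lü, with exact fractions and octave folding. The
starting length is arbitrary."""
from fractions import Fraction

def twelve_lu(start=Fraction(1)):
    """Alternately take 2/3 (remove a third) and 4/3 (add a third) of the length."""
    lengths = [start]
    for step in range(11):
        factor = Fraction(2, 3) if step % 2 == 0 else Fraction(4, 3)
        lengths.append(lengths[-1] * factor)
    return lengths

def fold_to_octave(length, start=Fraction(1)):
    """Bring each length into the octave below the starting pitch (ratio in (1/2, 1])."""
    while length > start:
        length /= 2
    while length <= start / 2:
        length *= 2
    return length

lengths = twelve_lu()
print(sorted(float(fold_to_octave(x)) for x in lengths))
```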