• Title/Summary/Keyword: Speech Recognition Technology

Search Result 523, Processing Time 0.026 seconds

The development of the anomia assessment battery based on the psycholinguistic processing (언어심리학을 기반으로 한 명칭성 실어증 평가도구 개발)

  • Jung, Jae-Bum;Pyun, Sung-Bom;Sohn, Hyo-Jung;Gee, Sung-Woo;Cho, Sung-Ho;Nam, Ki-Chun
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.158-162
    • /
    • 2007
  • Anomia, word finding difficulty, is one of the most common feature in aphasia. Previous studies support that the process of picture naming consists of three stages, in the order of the object recognition, semantic, and phonological output stages. Anomic patients have many symptoms and it means that anomia can be sub-divided into several symptom groups. Our anomia assessment battery consists of several parts: (1) picture naming set, (2) picture-word matching task, (3) lexical decision task for mental lexicon damage, (4) naming task for phonological lexicon damage, and (5) semantic decision task. Pictures and words were selected on the basis of usage frequency, semantic category, and word length. We administered this anomia evaluation battery to many anomic aphasics and we subdivided patients into several groups. We hope that our anomia evaluation set is useful and helpful for evaluation anomic aphasics

  • PDF

ACOUSTIC FEATURES DIFFERENTIATING KOREAN MEDIAL LAX AND TENSE STOPS

  • Shin, Ji-Hye
    • Proceedings of the KSPS conference
    • /
    • 1996.10a
    • /
    • pp.53-69
    • /
    • 1996
  • Much research has been done on the rues differentiating the three Korean stops in word initial position. This paper focuses on a more neglected area: the acoustic cues differentiating the medial tense and lax unaspirated stops. Eight adult Korean native speakers, four males and four females, pronounced sixteen minimal pairs containing the two series of medial stops with different preceding vowel qualities. The average duration of vowels before lax stops is 31 msec longer than before their tense counterparts (70 msec for lax vs 39 msec for tense). In addition, the average duration of the stop closure of tense stops is 135 msec longer than that of lax stops (69 msec for lax vs 204msec for tense). THESE DURATIONAL DIFFERENCES ARE 50 LARGE THAT THEY MAY BE PHONOLOGICALLY DETERMINED, NOT PHONETICALLY. Moreover, vowel duration varies with the speaker's sex. Female speakers have 5 msec shorter vowel duration before both stops. The quality of voicing, tense or lax, is also a cue to these two stop types, as it is in initial position, but the relative duration of the stops appears to be much more important cues. The duration of stops changes the stop perception while that of preceding vowel does not. The consequences of these results for the phonological description of Korean as well as the synthesis and automatic recognition of Korean will be discussed.

  • PDF

Maximum Likelihood-based Automatic Lexicon Generation for AI Assistant-based Interaction with Mobile Devices

  • Lee, Donghyun;Park, Jae-Hyun;Kim, Kwang-Ho;Park, Jeong-Sik;Kim, Ji-Hwan;Jang, Gil-Jin;Park, Unsang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.9
    • /
    • pp.4264-4279
    • /
    • 2017
  • In this paper, maximum likelihood-based automatic lexicon generation using mixed-syllables is proposed for unlimited vocabulary voice interface for East Asian languages (e.g. Korean, Chinese and Japanese) in AI-assistant based interaction with mobile devices. The conventional lexicon has two inevitable problems: 1) a tedious repetition of out-of-lexicon unit additions to the lexicon, and 2) the propagation of errors during a morpheme analysis and space segmentation. The proposed method provides an automatic framework to solve the above problems. The proposed method produces a level of overall accuracy similar to one of previous methods in the presence of one out-of-lexicon word in a sentence, but the proposed method provides superior results with the absolute improvements of 1.62%, 5.58%, and 10.09% in terms of word accuracy when the number of out-of-lexicon words in a sentence was two, three and four, respectively.

Emotional Intelligence System for Ubiquitous Smart Foreign Language Education Based on Neural Mechanism

  • Dai, Weihui;Huang, Shuang;Zhou, Xuan;Yu, Xueer;Ivanovi, Mirjana;Xu, Dongrong
    • Journal of Information Technology Applications and Management
    • /
    • v.21 no.3
    • /
    • pp.65-77
    • /
    • 2014
  • Ubiquitous learning has aroused great interest and is becoming a new way for foreign language education in today's society. However, how to increase the learners' initiative and their community cohesion is still an issue that deserves more profound research and studies. Emotional intelligence can help to detect the learner's emotional reactions online, and therefore stimulate his interest and the willingness to participate by adjusting teaching skills and creating fun experiences in learning. This is, actually the new concept of smart education. Based on the previous research, this paper concluded a neural mechanism model for analyzing the learners' emotional characteristics in ubiquitous environment, and discussed the intelligent monitoring and automatic recognition of emotions from the learners' speech signals as well as their behavior data by multi-agent system. Finally, a framework of emotional intelligence system was proposed concerning the smart foreign language education in ubiquitous learning.

A Study on Finger Language Translation System using Machine Learning and Leap Motion (머신러닝과 립 모션을 활용한 지화 번역 시스템 구현에 관한 연구)

  • Son, Da Eun;Go, Hyeong Min;Shin, Haeng yong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.10a
    • /
    • pp.552-554
    • /
    • 2019
  • Deaf mutism (a hearing-impaired person and speech disorders) communicates using sign language. There are difficulties in communicating by voice. However, sign language can only be limited in communicating with people who know sign language because everyone doesn't use sign language when they communicate. In this paper, a finger language translation system is proposed and implemented as a means for the disabled and the non-disabled to communicate without difficulty. The proposed algorithm recognizes the finger language data by leap motion and self-learns the data using machine learning technology to increase recognition rate. We show performance improvement from the simulation results.

Wavelet-based Algorithm for Signal Reconstruction (신호 복원을 위한 웨이브렛기반 알고리즘)

  • Bae, Sang-Bum;Kim, Nam-Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.1
    • /
    • pp.150-156
    • /
    • 2007
  • Noise is generated by several causes, when signal is processed. Hence, it generates error in the process of data transmission and decreases recognition ratio of image and speech data. Therefore, after eliminating those noises, a variety of methods for reconstructing the signal have been researched. Recently, wavelet transform which has time-frequency localization and is possible for multiresolution analysis is applied to many fields of technology. Then threshold-and correlation-based methods are proposed for removing noise. But, conventional methods accept a lot of noise as an edge and are impossible to remove the additive white Gaussian noise (AWGN) and the impulse noise at the same time. Therefore, in this paper we proposed new wavelet-based algorithm for reconstructing degraded signal by noise and compared it with conventional methods.

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.87-96
    • /
    • 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over the traditional learning methods. This led to massive volumes of web-based lecture videos. Indexing and retrieval of a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques listed by literature were either visual or audio based, but not both. Since the effects of both the visual and audio components are equally important for the content-based indexing and retrieval, the current work is focused on both these components. A framework for automatic topic-based indexing and search depending on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. The audio component text extraction is done using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized based on the Naïve Bayes (NB) Classification and K-Means Clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories and not the whole lecture video corpus. The work is carried out on the dataset generated by assigning categories to the lecture video transcripts gathered from e-learning portals. The performance of search is assessed based on the accuracy and time taken. Further the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.

Machine Learning Based Domain Classification for Korean Dialog System (기계학습을 이용한 한국어 대화시스템 도메인 분류)

  • Jeong, Young-Seob
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Dialog system is becoming a new dominant interaction way between human and computer. It allows people to be provided with various services through natural language. The dialog system has a common structure of a pipeline consisting of several modules (e.g., speech recognition, natural language understanding, and dialog management). In this paper, we tackle a task of domain classification for the natural language understanding module by employing machine learning models such as convolutional neural network and random forest. For our dataset of seven service domains, we showed that the random forest model achieved the best performance (F1 score 0.97). As a future work, we will keep finding a better approach for domain classification by investigating other machine learning models.

Analysis of the Korean Tokenizing Library Module (한글 토크나이징 라이브러리 모듈 분석)

  • Lee, Jae-kyung;Seo, Jin-beom;Cho, Young-bok
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.78-80
    • /
    • 2021
  • Currently, research on natural language processing (NLP) is rapidly evolving. Natural language processing is a technology that allows computers to analyze the meanings of languages used in everyday life, and is used in various fields such as speech recognition, spelling tests, and text classification. Currently, the most commonly used natural language processing library is NLTK based on English, which has a disadvantage in Korean language processing. Therefore, after introducing KonLPy and Soynlp, the Korean Tokenizing libraries, we will analyze morphology analysis and processing techniques, compare and analyze modules with Soynlp that complement KonLPy's shortcomings, and use them as natural language processing models.

  • PDF

Research on PEFT Feasibility for On-Device Military AI (온 디바이스 국방 AI를 위한 PEFT 효용성 연구)

  • Gi-Min Bae;Hak-Jin Lee;Sei-Ok Kim;Jang-Hyong Lee
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2024.01a
    • /
    • pp.51-54
    • /
    • 2024
  • 본 논문에서는 온 디바이스 국방 AI를 위한 효율적인 학습 방법을 제안한다. 제안하는 방법은 모델 전체를 재학습하는 대신 필요한 부분만 세밀하게 조정하여 계산 비용과 시간을 대폭 줄이는 PEFT 기법의 LoRa를 적용하였다. LoRa는 기존의 신경망 가중치를 직접 수정하지 않고 추가적인 낮은 랭크의 매트릭스를 학습하는 방식으로 기존 모델의 구조를 크게 변경하지 않으면서도, 효율적으로 새로운 작업에 적응할 수 있다. 또한 학습 파라미터 및 연산 입출력에 데이터에 대하여 32비트의 부동소수점(FP32) 대신 부동소수점(FP16, FP8) 또는 정수형(INT8)을 활용하는 경량화 기법인 양자화도 적용하였다. 적용 결과 학습시 요구되는 GPU의 사용량이 32GB에서 5.7GB로 82.19% 감소함을 확인하였다. 동일한 조건에서 동일한 데이터로 모델의 성능을 평가한 결과 동일 학습 횟수에선 LoRa와 양자화가 적용된 모델의 오류가 기본 모델보다 53.34% 증가함을 확인하였다. 모델 성능의 감소를 줄이기 위해서는 학습 횟수를 더 증가시킨 결과 오류 증가율이 29.29%로 동일 학습 횟수보다 더 줄어듬을 확인하였다.

  • PDF