• Title/Summary/Keyword: Speech Processing


Recognition of Noise Quantity by Linear Predictive Coefficient of Speech Signal (음성신호의 선형예측계수에 의한 잡음량의 인식)

  • Choi, Jae-Seung
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.2 / pp.120-126 / 2009
  • To reduce noise in conversation in a noisy environment, a signal processing system needs to adapt its processing to the noise quantity in order to perform well. This paper therefore presents a method that recognizes the noise quantity from linear predictive coefficients using a three-layer neural network, which is trained on three kinds of speech degraded by various background noises. The proposed method was evaluated by its recognition rates for various noises. In experiments using the Aurora2 database, the average recognition rate was 98.4% or higher for these noises.
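As a rough illustration of the feature side of such a system, the sketch below computes linear predictive coefficients for one speech frame with the autocorrelation method and the Levinson-Durbin recursion. The abstract does not state the paper's LPC order, framing, or network details, so the function names `autocorrelation` and `lpc` and the choice of predictor order are assumptions made for illustration, not the author's implementation.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Estimate predictor coefficients a[1..order] so that
    x[n] is approximated by sum(a[k] * x[n - k]).

    Returns (coefficients, residual_energy). Hypothetical helper:
    the paper's actual LPC settings are not given in the abstract.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        # Silent frame: no prediction is possible.
        return [0.0] * order, 0.0

    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        # Levinson-Durbin update of the lower-order coefficients.
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

A coefficient vector of this kind, computed per frame, could serve as the input to a small classifier such as the three-layer network the abstract mentions; that wiring is not shown here.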

Differential Effect for Neural Activation Processes according to the Proficiency Level of Code Switching: An ERP Study (이중언어환경에서의 언어간 부호전환 수준에 따른 차별적 신경활성화 과정: ERP연구)

  • Kim, Choong-Myung
    • Phonetics and Speech Sciences / v.2 no.4 / pp.3-10 / 2010
  • The present study investigates neural activation according to the level of code switching in English-proficient bilinguals and examines the relationship between language-switching performance and proficiency level using event-related potentials (ERPs). First, when high-proficiency (HP) and low-proficiency (LP) bilinguals were compared in a native language environment, N2 activation was higher in the HP group than in the LP group, but only under two conditions: 1) the language switching (between-language) condition, known to index attention to code switching, and 2) inhibition of the current language for L1. An N400 effect appeared in both groups only in the language non-switching (within-language) condition. This effect suggests that both groups completed the semantic acceptability task well in their native language environment without the burden of language switching, irrespective of high or low performance. N400 latencies were about 100 ms earlier in the HP group than in the LP group, a difference that can be interpreted as facilitation of the given task. These results suggest that the HP group showed differential activation of the inhibitory system for L1 in the L1-to-L2 switching condition, in contrast to the inactive inhibitory system of the LP group. Despite the absence of an N400 effect at the given task in both groups, the differential peak latencies were attributed to differences in the efficiency of semantic processing.


Design and implement of the Educational Humanoid Robot D2 for Emotional Interaction System (감성 상호작용을 갖는 교육용 휴머노이드 로봇 D2 개발)

  • Kim, Do-Woo;Chung, Ki-Chull;Park, Won-Sung
    • Proceedings of the KIEE Conference / 2007.07a / pp.1777-1778 / 2007
  • In this paper, we design and implement an educational humanoid robot that can collaborate and communicate with humans. We present an affective human-robot communication system for the humanoid robot D2, which we designed to communicate with a human through dialogue. D2 communicates with humans by understanding and expressing emotion using facial expressions, voice, gestures, and posture. Interaction between a human and a robot is made possible through our affective communication framework, which enables the robot to recognize the user's emotional state and respond appropriately. As a result, the robot can engage in a natural dialogue with a human. To interact with humans through voice, gestures, and posture, the educational humanoid robot consists of an upper body, two arms, a wheeled mobile platform, and control hardware that includes vision and speech capabilities and various control boards, such as motion control boards and a signal processing board that handles several types of sensors. Using D2, we have presented successful demonstrations consisting of a manipulation task with two arms, object tracking using the vision system, and communication with humans through the emotional interface, synthesized speech, and recognition of speech commands.


Improving transformer-based acoustic model performance using sequence discriminative training (Sequence dicriminative training 기법을 사용한 트랜스포머 기반 음향 모델 성능 향상)

  • Lee, Chae-Won;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.335-341 / 2022
  • In this paper, we adopt the transformer, which shows remarkable performance in natural language processing, as the acoustic model (AM) of a hybrid speech recognition system. The transformer acoustic model uses attention structures to process sequential data and shows high performance at low computational cost. This paper proposes a method to improve the performance of the transformer AM by applying each of four sequence discriminative training algorithms, a weighted finite-state transducer (wFST)-based learning approach used in existing DNN-HMM models. Compared with cross-entropy (CE) training, the sequence discriminative method shows a relative word error rate (WER) improvement of 5%.
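For readers unfamiliar with the metric, the sketch below shows one common way to compute WER (word-level edit distance divided by reference length) and a relative WER reduction between two systems. The function names `word_error_rate` and `relative_wer_reduction` and the example numbers in the usage note are hypothetical; the abstract does not report absolute WER values.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Under this definition, a hypothetical baseline WER of 10.0 and a new WER of 9.5 would correspond to a relative reduction of 0.05, that is, 5%.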

A Study-on Context-Dependent Acoustic Models to Improve the Performance of the Korea Speech Recognition (한국어 음성인식 성능향상을 위한 문맥의존 음향모델에 관한 연구)

  • 황철준;오세진;김범국;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing / v.2 no.4 / pp.9-15 / 2001
  • In this paper, we investigate context-dependent acoustic models to improve the performance of Korean speech recognition. The algorithms use Korean phonological rules and a decision tree. With the Successive State Splitting (SSS) algorithm, the Hidden Markov Network (HM-Net), an efficient representation of phoneme-context-dependent HMMs, can be generated automatically. SSS is a powerful technique for designing the topologies of tied-state HMMs, but it does not adequately handle contexts that are unseen in the training phoneme contexts. In addition, it has some problems in the procedure of the contextual domain. In this paper, we adopt a new state-clustering algorithm for SSS, called Phonetic Decision Tree-based SSS (PDT-SSS), which includes context splits based on Korean phonological rules. This method combines the advantages of decision tree clustering and SSS and can generate a highly accurate HM-Net that can express any context. To verify the effectiveness of the adopted method, experiments were carried out using the KLE 452-word database and the YNU 200-sentence database. Through Korean phoneme, word, and sentence recognition experiments, we showed that the new state-clustering algorithm produces better phoneme, word, and continuous speech recognition accuracy than conventional HMMs.


A Method on the Learning Speed Improvement of the Online Error Backpropagation Algorithm in Speech Processing (음성처리에서 온라인 오류역전파 알고리즘의 학습속도 향상방법)

  • 이태승;이백영;황병원
    • The Journal of the Acoustical Society of Korea / v.21 no.5 / pp.430-437 / 2002
  • Because it has a variety of advantages over other pattern recognition techniques, the multilayer perceptron (MLP) has been widely used in speech recognition and speaker recognition. However, the error backpropagation (EBP) algorithm that the MLP uses for learning is known to require a long learning time, which severely restricts applications that require real-time processing, such as speaker recognition and speaker adaptation. Because the learning data for pattern recognition contain high redundancy, online learning methods, which update the weight vector of the MLP pattern by pattern, are very effective for increasing the learning speed. A typical online EBP algorithm applies a fixed learning rate to each update of the weight vector. Although a large speedup can be obtained with online EBP by choosing an appropriate fixed rate, fixing the rate means the algorithm cannot respond effectively to different learning phases as the phases change and the number of patterns contributing to learning decreases. To solve this problem, this paper proposes the Changing rate and Omitting patterns in Instant Learning (COIL) method, which applies a variable rate and only the patterns necessary for the current learning phase as the phases change. Experiments are conducted on speaker verification and speech recognition, and the results are presented to verify the performance of COIL.

Development of AI-based Real Time Agent Advisor System on Call Center - Focused on N Bank Call Center (AI기반 콜센터 실시간 상담 도우미 시스템 개발 - N은행 콜센터 사례를 중심으로)

  • Ryu, Ki-Dong;Park, Jong-Pil;Kim, Young-min;Lee, Dong-Hoon;Kim, Woo-Je
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.2 / pp.750-762 / 2019
  • The importance of the call center as a contact point for the enterprise is growing. However, call centers have difficulty with their agents because of the agents' lack of knowledge and frequent agent turnover caused by business downturns, which degrades the quality of customer service. Therefore, through an N-bank call center case study, we developed a system to reduce the burden of keeping up with business knowledge and to improve customer service quality. It is a "real-time agent advisor" system that provides agents with answers to customer questions in real time by combining AI technologies for speech recognition, natural language processing, and question answering with existing call center information systems, such as a private branch exchange (PBX) and computer telephony integration (CTI). The case study confirmed that the speech recognition system for real-time call analysis and the corpus construction method improve the natural speech processing performance of the query response system. In particular, with named entity recognition (NER), the accuracy of the corpus learning improved by 31%. In addition, after the agent advisor system was applied, the rate of positive agent feedback on its answers was 93.1%, which indicates that the system is helpful to the agents.

Phonological retrieval and phonological memory skills in children with dyslexia and poor comprehension (난독증 아동과 읽기이해부진 아동의 음운인출과 음운기억 능력)

  • Hyojin Yoon
    • Phonetics and Speech Sciences / v.16 no.2 / pp.83-90 / 2024
  • This study aimed to explore phonological retrieval and phonological memory skills in second and third graders with dyslexia, poor comprehension, and typical development. The participants included 17 children with dyslexia, 17 children with poor comprehension, and 24 typically developing children. Children with dyslexia scored below 85 on the word decoding test; poor comprehenders scored above 90 on the word decoding test and below 85 on the reading comprehension test; and typically developing children scored above 90 on both reading tests. All participants were assessed on rapid automatized naming (RAN) and nonword repetition (NWR). The results indicated that children with dyslexia performed significantly worse on the RAN and NWR tasks than the other groups. However, there were significant differences between poor comprehenders and typically developing children. Furthermore, only RAN was significantly correlated with word decoding and reading comprehension in children with dyslexia. For typically developing children, RAN was correlated with word decoding and reading comprehension, while NWR had a significant correlation with reading comprehension. No correlations were found between these variables for poor comprehenders. The findings suggest that children with dyslexia have difficulties with phonological retrieval and phonological memory, which are essential for reading development, while poor comprehenders do not have difficulties with phonological processing skills. Phonological processing deficits may underlie word decoding difficulties in dyslexia.

fast running FIR filter structure based on Wavelet adaptive algorithm for computational complexity (웨이블렛 기반 적응 알고리즘의 계산량 감소에 적합한 Fast running FIR filter에 관한 연구)

  • Lee, Jae-Kyun;Lee, Chae-Wook
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2005.11a / pp.250-255 / 2005
  • In this paper, we propose a new fast running FIR filter structure that improves the convergence speed of adaptive signal processing and reduces the computational complexity. The proposed filter is applied to a wavelet-based adaptive algorithm. We compared the performance of the proposed algorithm with that of other algorithms using a computer simulation of an adaptive noise canceler based on synthesized speech. The results show that the frequency-domain algorithm is preferable to the existing time-domain algorithm. We analyzed the wavelet algorithm, the short-length fast running FIR algorithm, the fast-short-length fast running FIR algorithm, and the proposed algorithm.
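For context, the sketch below shows a plain time-domain LMS adaptive noise canceler, the kind of baseline that faster structures aim to improve on. It is not the wavelet-based or fast running FIR algorithm proposed in the paper, whose details the abstract does not give; the function name `lms_noise_canceller`, the tap count, and the step size `mu` are assumptions chosen for illustration.

```python
def lms_noise_canceller(primary, reference, num_taps=2, mu=0.1):
    """Time-domain LMS adaptive noise canceler (illustrative baseline).

    primary:   samples of signal plus filtered noise
    reference: samples of the noise source alone
    Returns (error_signal, final_weights); the error signal is the
    estimate of the clean signal.
    """
    weights = [0.0] * num_taps
    output = []
    for n in range(len(primary)):
        # Most recent reference samples, newest first (zeros before start).
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        # Current estimate of the noise component in the primary input.
        noise_estimate = sum(w * xk for w, xk in zip(weights, x))
        error = primary[n] - noise_estimate
        # LMS weight update: step along the instantaneous gradient.
        weights = [w + mu * error * xk for w, xk in zip(weights, x)]
        output.append(error)
    return output, weights
```

In this sketch the step size must stay small relative to the reference signal power for the update to remain stable; larger filters and correlated inputs converge more slowly, which is the cost that transform-domain and fast FIR structures are generally meant to reduce.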


A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines a number of natural language processing methods, such as named entity recognition (NER), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source consists of texts obtained from the web using a domain-specific data extraction agent. A framework for extracting information from unstructured data was developed using these natural language processing methods. We simulated the performance of our work in the extraction and analysis of texts for the detection of organizational structures. The simulation shows that our approach outperformed other NER classifiers, such as MUC and CoNLL, on information extraction.