Search | Korea Science

Korean Phoneme Recognition Model with Deep CNN (Deep CNN 기반의 한국어 음소 인식 모델 연구)

Hong, Yoon Seok;Ki, Kyung Seo;Gweon, Gahgene
- Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.398-401 / 2018
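This study proposes a phoneme recognition model that uses a deep convolutional neural network (Deep CNN) and the Connectionist Temporal Classification (CTC) algorithm, and that can be trained without a force-aligned corpus. Deep-learning phoneme recognition models based on recurrent neural networks (RNNs) and the CTC algorithm have recently been studied actively abroad. For Korean phoneme recognition, however, HMM-GMM systems or hybrid systems that combine artificial neural networks with HMMs have mainly been used. This approach leaves less room for performance improvement than recent overseas work and cannot be trained without a force-aligned corpus produced by experts. In addition, RNNs need large amounts of training data and are difficult to train, which limits their use for Korean, where corpora are scarce and foundational research has not been active. This study therefore adopts the CTC algorithm, which does not require a force-aligned corpus, and builds a deep learning model with a convolutional neural network (CNN), which trains faster than an RNN and can be trained with less data, to perform Korean phoneme recognition. With this model, the study built three kinds of phoneme recognizers that extract the 49 phonemes of Korean, and the finally selected model achieved a PER (Phoneme Error Rate) of 9.44. Compared indirectly with prior studies, this result suggests that the proposed model performs on par with or slightly better than existing work.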
https://doi.org/10.3745/PKIPS.y2018m05a.398
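
The abstract above reports its result as a phoneme error rate (PER). As a rough illustration of how such a figure is commonly computed, the sketch below uses the standard Levenshtein edit distance between reference and predicted phoneme sequences. The function names, the toy phoneme labels, and the choice to express the rate as a percentage are assumptions made for this example; the paper's own scoring procedure is not described in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme labels."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """PER in percent: total edit errors divided by total reference phonemes.

    Assumes at least one non-empty reference sequence.
    """
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    return 100.0 * total_errors / total_ref


# Toy data (hypothetical labels, not taken from the paper).
refs = [["k", "a", "m"], ["s", "a"]]
hyps = [["k", "a", "n"], ["s", "a"]]
per = phoneme_error_rate(refs, hyps)
```

Here one substitution against five reference phonemes gives a PER of 20.0 under the percentage convention assumed above.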

Hyperparameter experiments on end-to-end automatic speech recognition

Yang, Hyungwon;Nam, Hosung
- Phonetics and Speech Sciences / v.13 no.1 / pp.45-51 / 2021
End-to-end (E2E) automatic speech recognition (ASR) has achieved promising performance gains with the introduction of the self-attention network, Transformer. However, due to training time and the number of hyperparameters, finding the optimal hyperparameter set is computationally expensive. This paper investigates the impact of hyperparameters in the Transformer network to answer two questions: which hyperparameters play a critical role in task performance, and which in training speed. The Transformer network used for training has encoder and decoder networks combined with Connectionist Temporal Classification (CTC). We trained the model on Wall Street Journal (WSJ) SI-284 and tested it on dev93 and eval92. Seventeen hyperparameters were selected from the ESPnet training configuration, and varying ranges of values were used for the experiments. The results show that the "num blocks" and "linear units" hyperparameters in the encoder and decoder networks reduce Word Error Rate (WER) significantly. However, the performance gain is more prominent when they are altered in the encoder network. Training duration also increased linearly as the values of "num blocks" and "linear units" grew. Based on the experimental results, we collected the optimal values for each hyperparameter and reduced the WER up to 2.9/1.9 on dev93 and eval92, respectively.
https://doi.org/10.13064/KSSS.2021.13.1.045

ON THE STRUCTURE AND LEARNING OF NEURAL-NETWORK-BASED FUZZY LOGIC CONTROL SYSTEMS

C.T. Lin;Lee, C.S. George
- Proceedings of the Korean Institute of Intelligent Systems Conference / 1993.06a / pp.993-996 / 1993
This paper addresses the structure and associated learning algorithms of a feedforward multi-layered connectionist network, which has distributed learning abilities, for realizing the basic elements and functions of a traditional fuzzy logic controller. The proposed neural-network-based fuzzy logic control system (NN-FLCS) can be contrasted with the traditional fuzzy logic control system in network structure and learning ability. An on-line supervised structure/parameter dynamic learning algorithm can find proper fuzzy logic rules, membership functions, and the size of output fuzzy partitions simultaneously. Next, a Reinforcement Neural-Network-Based Fuzzy Logic Control System (RNN-FLCS) is proposed, which consists of two closely integrated Neural-Network-Based Fuzzy Logic Controllers (NN-FLCs) for solving various reinforcement learning problems in fuzzy logic systems. One NN-FLC functions as a fuzzy predictor and the other as a fuzzy controller. Associated with the proposed RNN-FLCS is the reinforcement structure/parameter learning algorithm, which dynamically determines the proper network size, connections, and parameters of the RNN-FLCS through an external reinforcement signal. Furthermore, learning can proceed even in periods without any external reinforcement feedback.

Turing's Cognitive Science: A Metamathematical Essay for His Centennial (튜링의 인지과학: 튜링 탄생 백주년을 기념하는 메타수학 에세이)

Hyun, Woo-Sik
- Korean Journal of Cognitive Science / v.23 no.3 / pp.367-388 / 2012
The centennial of Alan Mathison Turing (23 June 1912 - 7 June 1954) is an appropriate occasion on which to assess his profound influence on the development of cognitive science. His contributions to and attitudes toward that field are discussed from the metamathematical perspective. This essay addresses (i) Turing's mathematical analysis of cognition, (ii) universal Turing machines, (iii) the limitations of universal Turing machines, (iv) oracle Turing machines beyond universal Turing machines, and (v) the Turing test for cognitive science. Turing was a ground-breaker, eager to move on to new fields. He opened wider the scientific windows to the mind. The results show that, first, by means of mathematical logic Turing discovered a new bridge between the mind and the physical world. Second, Turing gave a new formal analysis of the operations of the mind. Third, Turing investigated oracle Turing machines and connectionist network machines as new models of the mind beyond the limitations of his own universal machines. This paper explores why cognitive scientists would still be expecting a new Turing test, standing on the shoulders of Alan Turing.

Toward a Possibility of the Unified Model of Cognition (통합적 인지 모형의 가능성)

Rhee Young-Eui
- Journal of Science and Technology Studies / v.1 no.2 s.2 / pp.399-422 / 2001
The models of human cognition currently discussed in cognitive science are not appropriate ones. The symbolic model of traditional artificial intelligence works for reasoning and problem-solving tasks, but is ill-suited to pattern recognition such as letter and sound recognition. Connectionism shows the opposite pattern: connectionist systems have been shown to be very strong in pattern recognition tasks but weak in most logical tasks. Brooks' situated action theory denies the notion of representation, which is presupposed in both traditional artificial intelligence and connectionism, and suggests a subsumption model based on perceptions coming from the real world. However, situated action theory has not been applied well to human cognition so far either. To emphasize these characteristics, I refer to the three models as the 'left-brain model', the 'right-brain model', and the 'robot model', respectively. After examining the models in terms of substantial items of cognition (mental state, mental procedure, basic element of cognition, rule of cognition, appropriate level of analysis, and architecture of cognition), I draw three arguments of embodiment. I suggest a way of unifying the existing models by examining their theoretical compatibility, which is found in those arguments.

Face Recognition based on Hybrid Classifiers with Virtual Samples (가상 데이터와 융합 분류기에 기반한 얼굴인식)

류연식;오세영
- Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.1 / pp.19-29 / 2003
This paper presents a novel hybrid classifier for face recognition with artificially generated virtual training samples. We utilize both the nearest neighbor approach in feature angle space and a connectionist model to obtain a synergy effect by combining the results of two heterogeneous classifiers. First, a classifier called the nearest feature angle (NFA), based on angular information, finds the most similar feature to the query from a given training set. Second, a classifier has been developed based on the recall of the stored frontal projection of the query feature. It uses a frontal recall network (FRN) that finds the most similar frontal feature among the stored frontal feature set. For the FRN, we used an ensemble neural network consisting of multiple multilayer perceptrons (MLPs), each of which is trained independently to enhance generalization capability. Further, both classifiers used a virtual training set generated adaptively, according to the spatial distribution of each person's training samples. Finally, the results of the two classifiers are combined to determine the best matching class, and a corresponding similarity measure is used to make the final decision. The proposed classifier achieved an average classification rate of 96.33% against a large group of different test sets of images; its average error rate is 61.5% of that of the nearest feature line (NFL) method, and it achieves more robust classification performance.
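
The abstract describes the NFA classifier only as finding the training feature most similar to the query using angular information. A minimal sketch of that general idea, nearest-neighbour search by the angle between feature vectors, might look as follows. This is not the paper's NFA definition: the plain cosine-based angle, the function names, and the toy feature vectors are all assumptions chosen for illustration.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos)


def nearest_by_angle(query, training_set):
    """Return the label whose feature vector makes the smallest angle with the query.

    training_set is a list of (label, feature_vector) pairs.
    """
    label, _ = min(training_set, key=lambda item: angle_between(query, item[1]))
    return label


# Hypothetical two-dimensional features for two people.
training = [("person_a", [1.0, 0.0]), ("person_b", [0.0, 1.0])]
predicted = nearest_by_angle([0.9, 0.1], training)
```

Because the angle ignores vector length, this kind of matching depends only on the direction of the feature vectors; whether the paper's NFA shares that property is not stated in the abstract.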
