• Title/Summary/Keyword: Recognition memory


A Study on Recognition Improvement of Velopharyngeal Insufficiency Patients' Speech Using Various Types of Deep Neural Networks

  • 김민석;정재희;정보경;윤기무;배아라;김우일
    • The Journal of the Acoustical Society of Korea, Vol. 38, No. 6, pp. 703-709, 2019
  • In this paper, to recognize the speech of velopharyngeal insufficiency (VPI) patients effectively, we build hybrid speech recognition systems that combine convolutional neural network (CNN) and long short-term memory (LSTM) structures with a hidden Markov model (HMM), apply model adaptation techniques, and compare their performance with conventional Gaussian mixture model (GMM-HMM) and fully connected deep neural network (DNN-HMM) based recognizers. An initial model is trained on the PBW452 word set uttered by normal speakers, a prior model for speaker adaptation is generated from normal speakers' simulated VPI speech, and additional adaptation is then performed with the speech of VPI patients. During speaker adaptation, only some layers of the CNN-HMM model are adapted, while dropout regularization is applied to the LSTM-HMM model; the latter yields recognition accuracy 3.68 % higher than the conventional fully connected DNN-HMM recognizer. These results demonstrate that the proposed LSTM-HMM-based hybrid approach is effective for building a more accurate recognition system for VPI patients' speech, for which large amounts of data are difficult to collect.
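The dropout regularization applied during adaptation of the LSTM-HMM model can be illustrated with a minimal pure-Python sketch (inverted dropout; the rate and activation values are illustrative, not taken from the paper):

```python
import random

def dropout(activations, rate=0.5, training=True, seed=None):
    """Inverted dropout: randomly zero units at the given rate during
    training and rescale survivors so the expected sum is unchanged.
    At inference time the input passes through untouched."""
    if not training or rate == 0.0:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [0.2, 0.9, 0.5, 0.7]
assert dropout(acts, training=False) == acts  # identity at inference
assert len(dropout(acts, rate=0.5, seed=0)) == len(acts)
```

Regularizing the adaptation pass this way is one common defense against overfitting when, as here, only a small amount of patient speech is available.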

Speech Activity Decision with Lip Movement Image Signals

  • 박준;이영직;김응규;이수종
    • The Journal of the Acoustical Society of Korea, Vol. 26, No. 1, pp. 25-31, 2007
  • This paper aims to prevent external acoustic noise from being misrecognized as speech by checking the speaker's lip movement image signal, in addition to acoustic energy, during the speech interval detection stage of speech recognition. First, images are acquired through a PC camera and the presence of lip movement is identified. The lip movement image data are stored in shared memory and shared with the speech recognition process. In the speech interval detection stage, a preprocessing step of recognition, the data in shared memory are checked to decide whether the observed acoustic energy comes from human speech. In experiments linking the speech recognizer with the image processor, recognition results were output normally when the speaker spoke facing the camera, and no results were output when the speaker spoke without facing it. This is because acoustic energy is treated as noise unless lip movement is confirmed in the image.
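The decision rule described above, counting acoustic energy as speech only when lip movement is also confirmed, reduces to a simple conjunction; the threshold value below is a hypothetical placeholder, not a figure from the paper:

```python
def is_speech(frame_energy, lip_moving, energy_threshold=0.1):
    """Declare speech only when the frame's acoustic energy exceeds the
    threshold AND the image processor has flagged lip movement;
    otherwise the energy is treated as acoustic noise."""
    return frame_energy > energy_threshold and lip_moving

# An energetic frame without lip movement is rejected as noise.
assert not is_speech(0.8, lip_moving=False)
# The same energy with confirmed lip movement is accepted as speech.
assert is_speech(0.8, lip_moving=True)
```

In the paper's setup the `lip_moving` flag would come from the shared-memory region written by the image-processing process.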

Vocabulary Recognition Retrieval Optimized System Using the MLHF Model

  • 안찬식;오상엽
    • Journal of the Korea Society of Computer and Information, Vol. 14, No. 10, pp. 217-223, 2009
  • Vocabulary recognition systems on mobile terminals perform statistical recognition with an N-gram-based statistical grammar. As the number of target words increases, the recognition algorithm grows more complex, requires a large search space, and takes longer to process, making it infeasible with limited processing power and memory. This paper therefore proposes the MLHF system to remedy these drawbacks and optimize vocabulary recognition. Following the FLaVoR architecture, MLHF separates acoustic search from linguistic search, using HMMs in the acoustic search and the Levenshtein distance algorithm in the linguistic search stage. In performance evaluation, the system showed a word-dependent recognition rate of 98.63 %, a word-independent recognition rate of 97.91 %, and a recognition time of 1.61 s.
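The Levenshtein distance used in the linguistic search stage is the classic dynamic-programming edit distance; a standard two-row implementation (not the paper's code) looks like this:

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (dynamic programming,
    keeping only the previous row of the DP table)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

assert levenshtein("kitten", "sitting") == 3
```

Matching a decoded hypothesis against vocabulary entries by edit distance tolerates small acoustic-search errors at the word level.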

대어휘 연속음성인식을 위한 서브네트워크 기반의 1-패스 세미다이나믹 네트워크 디코딩 (1-Pass Semi-Dynamic Network Decoding Using a Subnetwork-Based Representation for Large Vocabulary Continuous Speech Recognition)

  • 정민화;안동훈
    • Malsori (Speech Sounds), No. 50, pp. 51-69, 2004
  • In this paper, we present a one-pass semi-dynamic network decoding framework that inherits both the fast decoding speed of static network decoders and the memory efficiency of dynamic network decoders. Our method is based on a novel language model network representation that is essentially a finite state machine (FSM). The static network derived from the language model network [1][2] is partitioned into smaller subnetworks that are static by nature or self-structured. The whole network is managed dynamically so that the subnetworks required for decoding are cached in memory. The network is near-minimized by applying the tail-sharing algorithm. Our decoder is evaluated on the 25k-word Korean broadcast news transcription task. The search network itself is reduced by 73.4% by the tail-sharing algorithm. Compared with the equivalent static network decoder, the semi-dynamic network decoder increases decoding time by at most 6% while adapting flexibly to various memory configurations, with a minimum usage of 37.6% of the complete network size.
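Tail-sharing merges identical word endings so each common suffix is stored once; the toy sketch below (not the decoder's actual data structures) shows the arc-count reduction by building a trie over reversed words:

```python
def arc_count_linear(words):
    """Arcs needed if every word is stored as its own linear chain."""
    return sum(len(w) for w in words)

def arc_count_tail_shared(words):
    """Arcs after merging common suffixes: build a trie over the
    reversed words and count its edges, so shared endings such as
    '-ing' or '-king' are represented only once."""
    root = {}
    edges = 0
    for w in words:
        node = root
        for ch in reversed(w):
            if ch not in node:
                node[ch] = {}
                edges += 1
            node = node[ch]
    return edges

words = ["walking", "talking", "working"]
assert arc_count_tail_shared(words) < arc_count_linear(words)
```

The same idea, applied to the full search network rather than a word list, is what yields the 73.4% reduction reported above.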


Electroencephalography-based imagined speech recognition using deep long short-term memory network

  • Agarwal, Prabhakar;Kumar, Sandeep
    • ETRI Journal, Vol. 44, No. 4, pp. 672-685, 2022
  • This article proposes a subject-independent application of brain-computer interfacing (BCI). A 32-channel electroencephalography (EEG) device is used to measure imagined speech (SI) of four words (sos, stop, medicine, washroom) and one phrase (come-here) across 13 subjects. A deep long short-term memory (LSTM) network is adopted to recognize these signals in seven EEG frequency bands individually, in nine major regions of the brain. The results show a maximum accuracy of 73.56% and a network prediction time (NPT) of 0.14 s, which are superior to other state-of-the-art techniques in the literature. Our analysis reveals that the alpha band recognizes SI better than other EEG frequencies. To reinforce these findings, the above work is compared with models based on the gated recurrent unit (GRU), convolutional neural network (CNN), and six conventional classifiers. The LSTM model has 46.86% higher average accuracy in the alpha band and 74.54% lower average NPT than the CNN. The maximum accuracy of the GRU was 8.34% less than that of the LSTM network. Deep networks performed better than traditional classifiers.

Estimation of Eyewitness Identification Accuracy by Event-Related Potentials

  • 함근수;표주연;장태익;유성호
    • The Korean Journal of Legal Medicine, Vol. 39, No. 4, pp. 115-119, 2015
  • We investigated event-related potentials (ERPs) to estimate the accuracy of eyewitness memories. Participants watched videos of vehicles being driven dangerously, taken from an anti-impaired-driving initiative; the four-letter license plates of the vehicles were the target stimuli. Random numbers were presented while participants attempted to identify the license plate letters, and electroencephalograms were recorded. Activity 300-500 ms after stimulus onset differed significantly between target stimuli and random numbers. This finding contributes to establishing an eyewitness recognition model in which distinct ERP components may reflect explicit memory processes dissociable from recollection.

Influence of TrueView Ad Skip Buttons on Advertising Effect

  • 김주석;정동훈
    • Journal of Information Technology Services, Vol. 18, No. 1, pp. 1-12, 2019
  • The purpose of this study is to find out which type of skip button used in forced-exposure advertising users perceive most positively. Four types of skip buttons were produced for the experiment and tested by survey and eye tracker to reveal their effects on perceived intrusiveness, advertising attention, attitude toward advertising, and memory (recall and recognition). Of 80 participants, 20 were randomly assigned to each skip button group. The results showed no statistical difference in advertising attention, perceived intrusiveness, or attitude toward advertising. However, recall and recognition rates were highest for the static text type, followed by the kinetic text, product image, and default types. This study positions the skip button as a major variable in the inventory of TrueView advertising effects and suggests that the amount of information in the image is critical, since it is processed by users within a very short time.

Malicious URL Recognition and Detection Using Attention-Based CNN-LSTM

  • Peng, Yongfang;Tian, Shengwei;Yu, Long;Lv, Yalong;Wang, Ruijin
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 13, No. 11, pp. 5580-5593, 2019
  • A malicious Uniform Resource Locator (URL) recognition and detection method is proposed, based on an attention mechanism combined with a convolutional neural network and a long short-term memory network (Attention-Based CNN-LSTM). Firstly, the WHOIS check method is used to extract and filter features, including URL texture information, URL string statistics, and WHOIS information; the features are then encoded and pre-processed, and input to the convolution layers of the constructed Convolutional Neural Network (CNN) to extract local features. Secondly, weighted by the attention mechanism, the local features are input into the Long Short-Term Memory (LSTM) model and pooled to compute the global features of the URLs. Finally, the URLs are detected and classified by the softmax function using the global features. The results demonstrate that, compared with existing methods, the Attention-based CNN-LSTM mechanism achieves higher accuracy for malicious URL detection.
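The URL string statistics mentioned among the input features might resemble the toy extractor below; the exact feature set is an assumption for illustration, not the paper's:

```python
from urllib.parse import urlparse

def url_features(url):
    """Toy lexical features of the kind often fed to malicious-URL
    classifiers: length, digit and special-character counts, subdomain
    depth, and whether the host looks like a bare IP address."""
    parsed = urlparse(url)
    host = parsed.hostname
    return {
        "length": len(url),
        "digits": sum(c.isdigit() for c in url),
        "specials": sum(c in "-_@?&=%." for c in url),
        "subdomains": max(host.count(".") - 1, 0) if host else 0,
        "has_ip_like_host": bool(host) and host.replace(".", "").isdigit(),
    }

f = url_features("http://192.168.0.1/login?user=admin")
assert f["has_ip_like_host"]
```

In the paper's pipeline such statistics are only one input alongside the raw character sequence consumed by the CNN and LSTM stages.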

Sketch Recognition Using LSTM with Attention Mechanism and Minimum Cost Flow Algorithm

  • Nguyen-Xuan, Bac;Lee, Guee-Sang
    • International Journal of Contents, Vol. 15, No. 4, pp. 8-15, 2019
  • This paper presents a solution to the 'Quick, Draw! Doodle Recognition Challenge' hosted by Google. Doodles are drawings comprising concrete representational meaning or abstract lines creatively expressed by individuals. In this challenge, a doodle is presented as a sequence of sketches. At the sketch level, to learn the stroke patterns representing a doodle, we propose a sequential model stacking multiple convolution layers and Long Short-Term Memory (LSTM) cells with an attention mechanism [15]. At the image level, we use multiple models pre-trained on ImageNet to recognize the doodle. Finally, an ensemble and a post-processing method using the minimum cost flow algorithm are introduced to combine the models and achieve better results. Our solutions garnered 11th place among 1,316 teams, with a performance of 0.95037 MAP@3, only 0.4% lower than the winner's, demonstrating that our method is very competitive. The source code for this competition is published at: https://github.com/ngxbac/Kaggle-QuickDraw.
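The MAP@3 score used in the challenge is Mean Average Precision truncated at three guesses per sample: a sample contributes 1/rank if the true label appears among the top three predictions, and 0 otherwise.

```python
def map_at_3(predictions, truths):
    """Mean Average Precision at 3: each sample scores 1/rank when the
    true label appears at that rank within the top-3 predictions,
    else 0; the mean is taken over all samples."""
    total = 0.0
    for preds, truth in zip(predictions, truths):
        for rank, p in enumerate(preds[:3], 1):
            if p == truth:
                total += 1.0 / rank
                break
    return total / len(truths)

preds = [["cat", "dog", "bird"], ["dog", "cat", "fish"]]
truth = ["cat", "fish"]
assert abs(map_at_3(preds, truth) - (1.0 + 1/3) / 2) < 1e-9
```

This is the standard Kaggle definition for single-label tasks; the 0.95037 figure above is this quantity averaged over the test set.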

Automatic proficiency assessment of Korean speech read aloud by non-natives using bidirectional LSTM-based speech recognition

  • Oh, Yoo Rhee;Park, Kiyoung;Jeon, Hyung-Bae;Park, Jeon Gue
    • ETRI Journal, Vol. 42, No. 5, pp. 761-772, 2020
  • This paper presents an automatic proficiency assessment method for a non-native Korean read utterance using bidirectional long short-term memory (BLSTM)-based acoustic models (AMs) and speech data augmentation techniques. Specifically, the proposed method considers two scenarios, with and without prompted text. The proposed method with the prompted text performs (a) a speech feature extraction step, (b) a forced-alignment step using a native AM and non-native AM, and (c) a linear regression-based proficiency scoring step for the five proficiency scores. Meanwhile, the proposed method without the prompted text additionally performs Korean speech recognition and a subword un-segmentation for the missing text. The experimental results indicate that the proposed method with prompted text improves the performance for all scores when compared to a method employing conventional AMs. In addition, the proposed method without the prompted text has a fluency score performance comparable to that of the method with prompted text.
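The linear-regression scoring step can be sketched with ordinary least squares on a single alignment-derived feature; the feature values, scores, and the single-feature simplification below are all hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one feature:
    slope from the covariance/variance ratio, intercept from the
    means. The paper would regress several alignment features onto
    each of its five proficiency scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical data: an alignment-derived feature vs. human ratings.
feats = [0.1, 0.3, 0.5, 0.7, 0.9]
scores = [1.2, 2.1, 3.0, 3.9, 4.8]
a, b = fit_line(feats, scores)
assert abs(a - 4.5) < 1e-9 and abs(b - 0.75) < 1e-9
```

Once fitted on rated training utterances, the same line maps a new utterance's feature value to a predicted proficiency score.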