Search | Korea Science

Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network (RNN을 이용한 Expressive Talking Head from Speech의 합성)

Sakurai, Ryuhei;Shimba, Taiki;Yamazoe, Hirotake;Lee, Joo-Ho
- The Journal of Korea Robotics Society / v.13 no.1 / pp.16-25 / 2018
A talking head (TH) is an animated speaking face generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input alone. The problem of generating a TH from speech can be regarded as a regression problem from an acoustic feature sequence to a facial code sequence, where a facial code is a low-dimensional vector representation that can efficiently encode and decode a face image. This regression was modeled with a bidirectional RNN and trained on the SAVEE database, a database of frontal speaking-face animations. The proposed method is able to generate a TH with facial expression and intonation by using acoustic features such as MFCC, dynamic elements of MFCC, energy, and F0. In the experiments, the configuration that used BLSTM layers for the first and second layers of the bidirectional RNN predicted the facial code best. For evaluation, a questionnaire survey was conducted with 62 people who watched TH animations generated by the proposed method and by the previous method. As a result, 77% of the respondents answered that the TH generated by the proposed method matched the speech well.
https://doi.org/10.7746/jkros.2018.13.1.016
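
The abstract above lists "dynamic elements of MFCC" among the acoustic features. A minimal sketch of one common way to compute such dynamic (delta) features follows, assuming the standard regression formula over a window of neighboring frames with edge frames repeated. The function name `delta` and the `width` parameter are illustrative, and the paper may have computed its features differently.

```python
def delta(frames, width=2):
    """First-order dynamic (delta) features for a list of feature vectors.

    Assumed formula: d_t = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2),
    with indices clamped at the sequence edges.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]   # clamp at the end
                behind = frames[max(t - n, 0)][k]     # clamp at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# Hypothetical one-dimensional "MFCC" track that rises by 1 per frame.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(ramp)
```

For the ramp input, the delta at the middle frame equals the slope of the ramp, which is the behavior a dynamic feature is meant to capture.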

Ship Motion-Based Prediction of Damage Locations Using Bidirectional Long Short-Term Memory

Son, Hye-young;Kim, Gi-yong;Kang, Hee-jin;Choi, Jin;Lee, Dong-kon;Shin, Sung-chul
- Journal of Ocean Engineering and Technology / v.36 no.5 / pp.295-302 / 2022
The initial response to a marine accident can play a key role in minimizing its consequences. Therefore, various decision support systems have been developed using sensors, simulations, and active response equipment. In this study, we developed an algorithm to predict damage locations from ship motion data using bidirectional long short-term memory (BiLSTM), a type of recurrent neural network. To reflect the low-frequency characteristics of ship motion, 200 time-series samples collected over 100 s were used as input values. Heave, roll, and pitch were used as features for the prediction model. The F1-score of the BiLSTM model was 0.92, an improvement over the F1-score of 0.90 of a prior model. Furthermore, 53 of 75 damage locations had an F1-score above 0.90. The model predicted the damage location with high accuracy, allowing for a quick initial response even if the ship did not have flood sensors. The model's predictions can serve as high-accuracy input data for a real-time onboard progressive flooding simulator.
https://doi.org/10.26748/KSOE.2022.026
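
The input described in the abstract above (200 samples over 100 s, with heave, roll, and pitch as features) can be sketched as a windowing step. This is an illustration only: it assumes evenly spaced samples (so a 0.5 s interval) and non-overlapping windows, and the function name `make_windows` is hypothetical.

```python
WINDOW_SECONDS = 100
SAMPLES_PER_WINDOW = 200
# Assumes evenly spaced samples; the abstract does not state the sampling scheme.
SAMPLE_INTERVAL = WINDOW_SECONDS / SAMPLES_PER_WINDOW


def make_windows(heave, roll, pitch, length=SAMPLES_PER_WINDOW, step=SAMPLES_PER_WINDOW):
    """Split three aligned motion signals into windows of shape (length, 3)."""
    if not (len(heave) == len(roll) == len(pitch)):
        raise ValueError("heave, roll, and pitch must have the same length")
    frames = [list(sample) for sample in zip(heave, roll, pitch)]
    return [
        frames[start:start + length]
        for start in range(0, len(frames) - length + 1, step)
    ]


# Hypothetical signals: 400 samples give two non-overlapping windows.
signal = [float(i) for i in range(400)]
windows = make_windows(signal, signal, signal)
```

Each window is a sequence of `[heave, roll, pitch]` triples, which is the sequence-by-feature layout that a recurrent model such as a BiLSTM typically consumes.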

Extraction and classification of tempo stimuli from electroencephalography recordings using convolutional recurrent attention model

Lee, Gi Yong;Kim, Min-Soo;Kim, Hyoung-Gook
- ETRI Journal / v.43 no.6 / pp.1081-1092 / 2021
Electroencephalography (EEG) recordings taken during the perception of music tempo contain information that can be used to estimate the tempo of a piece of music. If information about this tempo stimulus can be extracted from EEG recordings and classified, it can be used effectively to construct a music-based brain-computer interface. This study proposes a novel convolutional recurrent attention model (CRAM) to extract and classify features corresponding to tempo stimuli from EEG recordings of listeners who concentrated on the tempo of the music. The proposed CRAM is composed of six modules, namely, network inputs, a two-dimensional convolutional bidirectional gated recurrent unit-based sample encoder, sample-level intuitive attention, a segment encoder, segment-level intuitive attention, and a softmax layer, to effectively model spatiotemporal features and improve the classification accuracy of tempo stimuli. To evaluate the proposed method's performance, we conducted experiments on two benchmark datasets. The proposed method achieves promising results, outperforming recent methods.
https://doi.org/10.4218/etrij.2021-0174

Sound event detection based on multi-channel multi-scale neural networks for home monitoring system used by the hard-of-hearing (청각 장애인용 홈 모니터링 시스템을 위한 다채널 다중 스케일 신경망 기반의 사운드 이벤트 검출)

Lee, Gi Yong;Kim, Hyoung-Gook
- The Journal of the Acoustical Society of Korea / v.39 no.6 / pp.600-605 / 2020
In this paper, we propose a sound event detection method using a multi-channel multi-scale neural network for a sound-sensing home monitoring system for the hearing impaired. In the proposed system, two channels with high signal quality are selected from several wireless microphone sensors in the home. Three features (time difference of arrival, pitch range, and the outputs obtained by applying a multi-scale convolutional neural network to the log mel spectrogram) extracted from the sensor signals are applied to a classifier based on a bidirectional gated recurrent neural network to further improve sound event detection performance. The detected sound event is converted into text, together with the sensor position of the selected channel, and provided to the hearing-impaired user. The experimental results show that the sound event detection method of the proposed system is superior to the existing method and can effectively deliver sound information to the hearing impaired.
https://doi.org/10.7776/ASK.2020.39.6.600

A study on training DenseNet-Recurrent Neural Network for sound event detection (음향 이벤트 검출을 위한 DenseNet-Recurrent Neural Network 학습 방법에 관한 연구)

Hyeonjin Cha;Sangwook Park
- The Journal of the Acoustical Society of Korea / v.42 no.5 / pp.395-401 / 2023
Sound Event Detection (SED) aims to identify not only the sound category but also the time interval of target sounds in an audio waveform. It is a critical technique in the fields of acoustic surveillance and monitoring systems. Recently, various models have been introduced through Detection and Classification of Acoustic Scenes and Events (DCASE) Task 4. This paper explores how to design optimal parameters for a DenseNet-based model, an architecture that has led to outstanding performance in other recognition systems. In the experiments, DenseRNN, the SED model, consists of DenseNet-BC and bidirectional Gated Recurrent Units (GRU). The model is trained with the Mean Teacher model. Using an event-based F-score, the model is evaluated across parameters related to both model architecture and model training, under the assessment protocol of DCASE Task 4. Experimental results show that performance increases and then saturates near its best value. Also, DenseRNN can be trained more effectively without dropout.
https://doi.org/10.7776/ASK.2023.42.5.395

Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model

Zeng, Yuyang;Zhang, Ruirui;Yang, Liang;Song, Sujuan
- Journal of Information Processing Systems / v.17 no.4 / pp.818-833 / 2021
To address the problems of low precision, insufficient feature extraction, and poor contextual ability in existing text sentiment analysis methods, a hybrid CNN-BiLSTM-TE (convolutional neural network, bidirectional long short-term memory, and topic extraction) model was proposed. First, Chinese text data was converted into vectors by transfer learning with Word2Vec. Second, local features were extracted by the CNN model. Then, contextual information was extracted by the BiLSTM neural network, and the emotional tendency was obtained using softmax. Finally, topics were extracted using term frequency-inverse document frequency and K-means. Compared with the CNN, BiLSTM, and gated recurrent unit (GRU) models, the CNN-BiLSTM-TE model's F1-score was higher by 0.0147, 0.006, and 0.0052, respectively. Compared with the CNN-LSTM, LSTM-CNN, and BiLSTM-CNN models, the F1-score was higher by 0.0071, 0.0038, and 0.0049, respectively. Experimental results showed that the CNN-BiLSTM-TE model can effectively improve various indicators in application. Lastly, scalability was verified on a takeaway dataset, which has great value in practical applications.
https://doi.org/10.3745/JIPS.04.0221
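
The topic-extraction step in the abstract above relies on term frequency-inverse document frequency (TF-IDF). A minimal sketch of one textbook TF-IDF weighting follows, assuming tf = count / document length and idf = ln(N / document frequency). The paper may use a different variant, and `tf_idf` is an illustrative name.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count / doc length) * ln(n_docs / doc frequency).
    """
    n_docs = len(docs)
    # Number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-review corpus, already tokenized.
corpus = [["good", "food"], ["bad", "food"]]
scores = tf_idf(corpus)
```

A term that occurs in every document ("food" here) receives a weight of zero, while terms specific to one document are weighted up, which is why the resulting vectors are a reasonable input for clustering with K-means.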

Network Intrusion Detection Using Transformer and BiGRU-DNN in Edge Computing

Huijuan Sun
- Journal of Information Processing Systems / v.20 no.4 / pp.458-476 / 2024
To address the issue of class imbalance in network traffic data, which degrades network intrusion detection performance, a combined framework using transformers is proposed. First, Tomek Links, SMOTE, and WGAN are used to preprocess the data to solve the class-imbalance problem. Second, the transformer is used to encode traffic data and extract the correlations within network traffic. Finally, a hybrid deep learning network model combining a bidirectional gated recurrent unit (BiGRU) and a deep neural network (DNN) is proposed: the BiGRU is used to extract long-range dependency features, the DNN is used to extract deep-level features, and softmax is used to complete the classification. Experiments were conducted on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets, and the detection accuracy of the proposed model was 99.72%, 84.86%, and 99.89% on the three datasets, respectively. Compared with other relatively new deep learning network models, it effectively improved intrusion detection performance, thereby improving the communication security of network data.
https://doi.org/10.3745/JIPS.01.0106
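
Of the preprocessing steps named in the abstract above, Tomek links have a compact definition: two samples from different classes form a Tomek link when each is the other's nearest neighbor. The following is a brute-force sketch under that definition, assuming Euclidean distance. `tomek_links` is an illustrative name, not the paper's code or a library API.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    Brute-force O(n^2) search using Euclidean distance (an assumption).
    """
    if len(points) < 2:
        return []

    def nearest(i):
        # Index of the closest other point; ties go to the lowest index.
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    links = []
    for i in range(len(points)):
        j = nearest(i)
        if i < j and labels[i] != labels[j] and nearest(j) == i:
            links.append((i, j))
    return links


# Hypothetical 1-D traffic features: one minority sample (label 0) beside a majority one.
features = [(0.0,), (1.0,), (10.0,), (11.0,)]
classes = [0, 1, 1, 1]
links = tomek_links(features, classes)
```

In undersampling, the majority-class member of each link is usually the one removed, which cleans the class boundary before oversampling methods such as SMOTE are applied.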

A patent application filing forecasting method based on the bidirectional LSTM (양방향 LSTM기반 시계열 특허 동향 예측 연구)

Seungwan, Choi;Kwangsoo, Kim;Sooyeong, Kwak
- Journal of IKEEE / v.26 no.4 / pp.545-552 / 2022
The number of patent application filings for a specific technology is closely related to the technology's life cycle and to future industry development in that area. Industry and governments are therefore highly interested in forecasting the number of patent application filings so that they can make appropriate preparations in advance. In this paper, a new method based on bidirectional long short-term memory (LSTM), a kind of recurrent neural network (RNN), is proposed to improve forecasting accuracy compared to related methods. Compared with the Bass model, which is one of the conventional diffusion modeling methods, the proposed method shows 16% higher performance on Korean patent filing data in five selected technology areas.
https://doi.org/10.7471/ikeee.2022.26.4.545

A Text Content Classification Using LSTM For Objective Category Classification

Noh, Young-Dan;Cho, Kyu-Cheol
- Journal of the Korea Society of Computer and Information / v.26 no.5 / pp.39-46 / 2021
AI is applied in a wide range of algorithms that assist us, from everyday technologies such as translators and Face ID to countless fields in industry. In this research, we use AI-based categorization to provide convenience by extracting only the data that users need through objective classification, instead of having them check through the immense amount of content on the internet. We propose a model using LSTM (Long Short-Term Memory network), which performs well in text classification, and compare its performance with RNN (Recurrent Neural Network) and BiLSTM (Bidirectional LSTM) models, which are structures suited to natural language processing. The performance of the three models is compared using accuracy, precision, and recall. As a result, the LSTM model appears to have the best performance. Therefore, in this research, text classification using LSTM is recommended.
https://doi.org/10.9708/jksci.2021.26.05.039
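
The three measurements named in the abstract above (accuracy, precision, and recall) can be computed from predicted and true labels. The sketch below computes them for one class treated as positive. The function name and the sample labels are hypothetical, and the paper's averaging over categories is not stated in the abstract.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing was predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # 0.0 when no positive samples exist
    return accuracy, precision, recall


# Hypothetical category labels for four documents.
truth = ["sports", "sports", "news", "news"]
predicted = ["sports", "news", "sports", "news"]
metrics = classification_metrics(truth, predicted, positive="sports")
```

For a multi-category classifier, the same function can be called once per category and the results averaged.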

A Fuzzy-AHP-based Movie Recommendation System with the Bidirectional Recurrent Neural Network Language Model (양방향 순환 신경망 언어 모델을 이용한 Fuzzy-AHP 기반 영화 추천 시스템)

Oh, Jae-Taek;Lee, Sang-Yong
- Journal of Digital Convergence / v.18 no.12 / pp.525-531 / 2020
In today's IT environment, where large volumes of varied information are distributed, recommendation systems that can quickly figure out users' needs and help them with their decisions are in the spotlight. Current recommendation systems, however, have a couple of problems: changes in users' tastes or interests may not be reflected in the systems right away, and items unrelated to users' preferences may be recommended because of advertising. To solve these problems, this study proposes a Fuzzy-AHP-based movie recommendation system that applies the BRNN (Bidirectional Recurrent Neural Network) language model. Fuzzy-AHP was applied to this system to reflect users' tastes or interests in clear and objective ways. In addition, the BRNN language model was adopted to analyze movie-related data collected in real time and predict the movies users prefer. The system's performance was assessed with grid searches to examine the fitness of the learning model across the entire size of word sets. The results show that the learning model of the system recorded a mean cross-validation index of 97.9% across the entire size of word sets, thus proving its fitness. The model recorded an RMSE of 0.66 and 0.805 against the movie ratings on Naver and the LSTM language model, respectively, demonstrating the system's superior performance in predicting movie ratings.
https://doi.org/10.14400/JDC.2020.18.12.525
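
The abstract above reports its rating-prediction error as RMSE (root mean squared error). A minimal sketch of that metric follows. The rating values are hypothetical and are not the paper's data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical movie ratings on a 10-point scale.
site_ratings = [8.0, 6.5, 9.0]
model_ratings = [7.5, 7.0, 9.0]
error = rmse(site_ratings, model_ratings)
```

Because the errors are squared before averaging, RMSE penalizes a few large rating misses more heavily than many small ones.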

