Search | Korea Science

Enhancing Multimodal Emotion Recognition in Speech and Text with Integrated CNN, LSTM, and BERT Models (통합 CNN, LSTM, 및 BERT 모델 기반의 음성 및 텍스트 다중 모달 감정 인식 연구)

Edward Dwijayanto Cahyadi;Hans Nathaniel Hadi Soesilo;Mi-Hwa Song
- The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.617-623 / 2024
Identifying emotions through speech poses a significant challenge due to the complex relationship between language and emotions. Our paper takes on this challenge by employing feature engineering to identify emotions in speech through a multimodal classification task involving both speech and text data. We evaluated two classifiers, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), both integrated with a BERT-based pre-trained model. Our assessment covers various performance metrics (accuracy, F-score, precision, and recall) across different experimental setups. The findings highlight the proficiency of both models in accurately discerning emotions from both text and speech data.
https://doi.org/10.17703/JCCT.2024.10.1.617
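The four metrics named in the abstract above can be illustrated with a minimal sketch. The macro averaging, the function name `macro_metrics`, and the emotion labels are assumptions made for illustration; the abstract does not state which averaging scheme or label set the authors used.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F-score.

    Hypothetical helper: macro averaging is an assumption, not a detail
    taken from the abstract.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, fscores = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fscores.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(fscores) / n,
    }


# Illustrative labels only; not data from the paper.
example = macro_metrics(
    ["happy", "sad", "angry", "happy"],
    ["happy", "sad", "happy", "happy"],
)
```

A class that is never predicted gets a precision of 0.0 here, which is one common convention; other conventions would change the macro averages.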

The Vocabulary Recognition Optimize using Acoustic and Lexical Search (음향학적 및 언어적 탐색을 이용한 어휘 인식 최적화)

Ahn, Chan-Shik;Oh, Sang-Yeob
- Journal of Korea Multimedia Society / v.13 no.4 / pp.496-503 / 2010
Speech recognition systems are typically developed as standalone systems; when used on a mobile terminal, they show a low recognition rate because of limited memory size and audio compression. This study proposes a system that improves vocabulary recognition performance by separating acoustic search from lexical search. The acoustic search is carried out on the mobile terminal, and the lexical search is carried out on the server. Feature vectors are extracted from the speech signal and phonemes are recognized using a GMM; the recognized phoneme list is then transmitted to the server, which performs lexical search using the Lexical Tree Search algorithm. The system achieved a vocabulary-dependent recognition rate of 98.01%, a vocabulary-independent recognition rate of 97.71%, and a recognition speed of 1.58 seconds.
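The server-side lexical search can be pictured as a lookup in a prefix tree over phoneme sequences. The following is a minimal sketch under stated assumptions: the lexicon, the phoneme symbols, and the function names are hypothetical, and it does exact matching only, whereas the paper's Lexical Tree Search algorithm is not specified in the abstract.

```python
WORDS_KEY = "#words"  # sentinel key; assumed never to be used as a phoneme symbol


def build_lexical_tree(lexicon):
    """Build a prefix tree (nested dicts) from {word: [phoneme, ...]}."""
    root = {}
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            node = node.setdefault(ph, {})
        node.setdefault(WORDS_KEY, []).append(word)
    return root


def lookup(tree, phonemes):
    """Return the words whose pronunciation exactly matches the phoneme list."""
    node = tree
    for ph in phonemes:
        node = node.get(ph)
        if node is None:
            return []
    return list(node.get(WORDS_KEY, []))


# Hypothetical lexicon for illustration only.
tree = build_lexical_tree({
    "cat": ["k", "ae", "t"],
    "cab": ["k", "ae", "b"],
})
matches = lookup(tree, ["k", "ae", "t"])  # the phoneme list sent by the terminal
```

Words that share a prefix share tree nodes, so the tree stores common phoneme prefixes once instead of once per word.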

Effects of the Manner of Deleting Typical Items in a Scene on False Memory (풍경 그림에서 전형적인 정보의 삭제 방법이 오기억에 미치는 영향)

Do, Kyung-Soo;Bae, Kyung-Sue
- Korean Journal of Cognitive Science / v.18 no.2 / pp.113-138 / 2007
The effects of schema on accurate and false memories of items in a scene were investigated in two experiments: recognition of items in a scene was tested immediately in Experiment 1 and three days later in Experiment 2. In both experiments, the following three variables were manipulated: exposure time (250 ms or 10000 ms), picture mode (completed pictures or scrambled pictures), and manipulation mode (missing item or substituted item). Experiment 1 yielded three important results. First, although accurate memory for presented items increased when the exposure time was longer, false memory of the critical lures did not change. Second, false memory of critical lures in the missing condition, where there was no conflict between verbatim information and gist information, was higher than in the substituted condition, where verbatim information of the item that replaced the lure was in conflict with the gist information. Third, accurate memory for atypical items in the substituted condition, which had replaced the critical lures and were in conflict with the schema, was higher than that in the missing condition. In Experiment 2, recognition tests were administered 72 hours after the participants saw the picture. The three effects found in Experiment 1 disappeared in Experiment 2. The results of Experiment 2 might be due to the selective weakening of verbatim information compared to the persistence of the gist (or schematic) information. The results of Experiments 1 and 2 showed that false memory of critical lures is more persistent than accurate memory of non-critical information. Theoretical implications of the results were considered in terms of the function of the verbatim and gist information.

Combining Empirical Feature Map and Conjugate Least Squares Support Vector Machine for Real Time Image Recognition : Research with Jade Solution Company

Kim, Byung Joo
- International Journal of Internet, Broadcasting and Communication / v.9 no.1 / pp.9-17 / 2017
This paper describes the process of developing a commercial real-time image recognition system with a company. We build a system that combines an empirical kernel map method with a conjugate least squares support vector machine in order to represent images in a low-dimensional subspace for real-time image recognition. In the traditional approach to calculating these eigenspace models, known as the traditional PCA method, the model must capture all the images needed to build the internal representation. Updating the existing eigenspace is possible only if all the images are kept, which requires a lot of storage. The proposed method allows the acquired images to be discarded immediately after the update. Experimental results show that the empirical kernel map has accuracy similar to that of the traditional batch eigenspace method and is more efficient in its memory requirements. These results show that the proposed model is suitable for a commercial real-time image recognition system.
https://doi.org/10.7236/IJIBC.2017.9.1.9

Performance Improvement of Microphone Array Speech Recognition Using Features Weighted Mahalanobis Distance (가중특징 Mahalanobis거리를 이용한 마이크 어레이 음석인식의 성능향상)

Nguyen, Dinh Cuong;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.29 no.1E / pp.45-53 / 2010
In this paper, we present the use of the Features Weighted Mahalanobis Distance (FWMD) to improve the performance of the Likelihood Maximizing Beamforming (Limabeam) algorithm in speech recognition for microphone arrays. The proposed approach replaces the traditional distance measure in a Gaussian classifier with a Mahalanobis distance in which different features are weighted according to their distances after variance normalization. By using the Features Weighted Mahalanobis Distance for the Limabeam algorithm (FWMD-Limabeam), we obtained correct word recognition rates of 90.26% for calibrated Limabeam and 87.23% for unsupervised Limabeam, which are 3% and 6% higher, respectively, than those produced by the original Limabeam. By alternatively implementing an HM-Net speech recognition strategy, we could save memory and reduce computational complexity.
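A per-feature weighted Mahalanobis distance for a diagonal-covariance Gaussian can be sketched as follows. This is an illustrative form only: the diagonal covariance, the function name, and the weight values are assumptions, and the abstract does not give the rule the authors use to derive the weights.

```python
def weighted_mahalanobis_sq(x, mean, var, weights):
    """Squared distance sum_i w_i * (x_i - mean_i)^2 / var_i.

    Assumes a diagonal covariance (one variance per feature). With all
    weights equal to 1 this reduces to the ordinary squared Mahalanobis
    distance for that case.
    """
    if not (len(x) == len(mean) == len(var) == len(weights)):
        raise ValueError("all inputs must have the same length")
    return sum(
        w * (xi - mi) ** 2 / vi
        for xi, mi, vi, w in zip(x, mean, var, weights)
    )


# Hypothetical feature vector, Gaussian parameters, and weights.
unweighted = weighted_mahalanobis_sq([1.0, 2.0], [0.0, 0.0], [1.0, 4.0], [1.0, 1.0])
weighted = weighted_mahalanobis_sq([1.0, 2.0], [0.0, 0.0], [1.0, 4.0], [2.0, 0.5])
```

Raising a feature's weight makes deviations in that feature count for more in the classifier's distance, which is the general idea the abstract describes.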

Speech Recognition Using Recurrent Neural Prediction Models (회귀신경예측 모델을 이용한 음성인식)

류제관;나경민;임재열;성경모;안성길
- Journal of the Korean Institute of Telematics and Electronics B / v.32B no.11 / pp.1489-1495 / 1995
In this paper, we propose recurrent neural prediction models (RNPM), recurrent neural networks trained as nonlinear predictors of speech, as a new connectionist model for speech recognition. The RNPM modulates its mapping effectively through its internal representation and requires no time alignment algorithm. Therefore, the computational load at the recognition stage is reduced substantially compared with the well-known predictive neural networks (PNN), and the required memory is much smaller. In addition, the RNPM does not suffer from the problem of deciding the time-varying target function. In speaker-dependent and speaker-independent speech recognition experiments under various conditions, the proposed model was comparable in recognition performance to the PNN, while retaining the above merits that the PNN does not have.

Visual Location Recognition Using Time-Series Streetview Database (시계열 스트리트뷰 데이터베이스를 이용한 시각적 위치 인식 알고리즘)

Park, Chun-Su;Choeh, Joon-Yeon
- Journal of the Semiconductor & Display Technology / v.18 no.4 / pp.57-61 / 2019
Nowadays, portable digital cameras such as smartphone cameras are widely used for entertainment and visual information recording. Given a database of geo-tagged images, a visual location recognition system can determine the place depicted in a query photo. One of the most common visual location recognition approaches is the bag-of-words method, where local image features are clustered into visual words. In this paper, we propose a new bag-of-words-based visual location recognition algorithm using a time-series streetview database. The proposed algorithm selects only a small subset of image features to be used in the image retrieval process. By reducing the number of features used, the proposed algorithm can reduce the memory requirement of the image database and accelerate the retrieval process.
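The bag-of-words representation mentioned in the abstract can be sketched as assigning each local descriptor to its nearest visual word and counting the assignments. The vocabulary, the descriptors, and the function names below are hypothetical, and the sketch covers only the generic bag-of-words step, not the paper's feature-selection algorithm.

```python
def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[i])),
    )


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist


# Toy 2-D vocabulary (cluster centers) and descriptors for illustration only.
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0]]
histogram = bow_histogram(descriptors, vocabulary)  # [2, 1]
```

Keeping fewer descriptors per image, as the abstract proposes, shrinks the input to this step and therefore the stored index.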

Tobacco Retail License Recognition Based on Dual Attention Mechanism

Shan, Yuxiang;Ren, Qin;Wang, Cheng;Wang, Xiuhui
- Journal of Information Processing Systems / v.18 no.4 / pp.480-488 / 2022
Images of tobacco retail licenses have complex unstructured characteristics, which poses an urgent technical problem for the robotic process automation of tobacco marketing. In this paper, a novel recognition approach using a dual attention mechanism is presented to realize automatic recognition and information extraction from such images. First, we utilized a DenseNet network to extract the license information from the input tobacco retail license data. Second, bi-directional long short-term memory was used for coding and decoding using a continuous decoder integrating dual attention to realize the recognition and information extraction of tobacco retail license images without segmentation. Finally, several performance experiments were conducted using a large-scale dataset of tobacco retail licenses. The experimental results show that the proposed approach achieves a correction accuracy of 98.36% on the ZY-LQ dataset, outperforming most existing methods.
https://doi.org/10.3745/JIPS.02.0177

1D-CNN-LSTM Hybrid-Model-Based Pet Behavior Recognition through Wearable Sensor Data Augmentation

Hyungju Kim;Nammee Moon
- Journal of Information Processing Systems / v.20 no.2 / pp.159-172 / 2024
The number of healthcare products available for pets has increased in recent times, which has prompted active research into wearable devices for pets. However, the data collected through such devices are limited by outliers and missing values owing to the anomalous and irregular characteristics of pets. Hence, we propose pet behavior recognition based on a hybrid one-dimensional convolutional neural network (CNN) and long short-term memory (LSTM) model using pet wearable devices. An Arduino-based pet wearable device was first fabricated to collect data for behavior recognition, where gyroscope and accelerometer values were collected using the device. Then, data augmentation was performed after replacing any missing values and outliers via preprocessing. At this stage, the behaviors were classified into five types. To prevent bias from specific actions in the data augmentation, the number of datasets was compared and balanced, and CNN-LSTM-based deep learning was performed. The five subdivided behaviors and overall performance were then evaluated, and the overall accuracy of behavior recognition was found to be about 88.76%.
https://doi.org/10.3745/JIPS.02.0211

Parallel Speech Recognition on Distributed Memory Multiprocessors (분산 메모리 다중 프로세서 상에서의 병렬 음성인식)

윤지현;홍성태;정상화;김형순
- Proceedings of the Korean Information Science Society Conference / 1998.10a / pp.747-749 / 1998
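In this paper, we propose an effective parallel computational model for the integrated processing of speech and natural language. The phoneme model uses context-dependent phonemes based on continuous HMMs, and the language model uses a knowledge-based approach. In addition, memory-based parsing is used to process multiple hypotheses over a hierarchically structured knowledge base. The parallel speech recognition algorithm of this study was implemented on a multi-Transputer system with a distributed-memory MIMD architecture. Through experiments, we provide solutions to the speech-specific problems that arise during speech recognition and show the feasibility of real-time speech recognition through parallelization of the speech recognition system.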

Search Result 473, Processing Time 0.022 seconds
