Integrated Search | Korea Science

A Study on Multimodal Emotion Recognition in Speech and Text Based on Integrated CNN, LSTM, and BERT Models (Enhancing Multimodal Emotion Recognition in Speech and Text with Integrated CNN, LSTM, and BERT Models)

에드워드 카야디;한스 나타니엘 하디 수실로;송미화
- The Journal of the Convergence on Culture Technology / Vol. 10, No. 1 / pp. 617-623 / 2024
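Because language and emotion are intricately related, identifying emotions from what we say is recognized as an important challenge. This study aims to address this challenge by applying feature engineering to identify emotions in spoken language through a multimodal classification task that involves both speech and text data. Two classifiers, Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), were each integrated with a BERT-based pre-trained model and evaluated. The evaluation covers several performance metrics (accuracy, F-score, precision, and recall) across a range of experimental settings. The results show that both models are highly capable of accurately identifying emotions in both text and speech data.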
https://doi.org/10.17703/JCCT.2024.10.1.617
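The evaluation above reports accuracy, F-score, precision, and recall. As a minimal sketch, assuming macro-averaging over emotion classes (the abstract does not say which averaging the authors used), these metrics can be computed from true and predicted labels as follows; the emotion labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 over all labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, e.g. from a fused text + speech emotion classifier.
y_true = ["angry", "happy", "sad", "happy"]
y_pred = ["angry", "sad", "sad", "happy"]
print(classification_metrics(y_true, y_pred))
```

Macro-averaging weights every emotion equally, which matters when some emotions are rare; a micro- or support-weighted average would give different precision, recall, and F1 values.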

Vocabulary Recognition Optimization Using Acoustic and Lexical Search (The Vocabulary Recognition Optimize using Acoustic and Lexical Search)

안찬식;오상엽
- Journal of Korea Multimedia Society / Vol. 13, No. 4 / pp. 496-503 / 2010
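Vocabulary recognition systems are being developed as standalone systems, and when they are used on portable terminals, their recognition rates are low because of limited memory space and audio compression. To improve performance and recognition rates on portable terminals, this study proposes a system that speeds up vocabulary recognition by separating acoustic search from lexical search. Acoustic search is performed on the portable terminal, while the more complex lexical search is handled on a server: feature vectors are extracted from the speech signal and phoneme recognition is performed with a GMM, and the recognized phoneme sequence is then sent to the server, where vocabulary recognition is carried out in the lexical search stage using a lexical tree search algorithm. In the performance evaluation, the system achieved a vocabulary-dependent recognition rate of 98.01% and a vocabulary-independent recognition rate of 97.71%, with a recognition time of 1.58 seconds.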
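The server-side lexical search can be pictured as a prefix tree (lexical tree) whose edges are phonemes. The sketch below is an illustration under assumptions, not the authors' algorithm: it walks the tree while computing an edit distance against the recognized phoneme sequence, so that a few phoneme recognition errors can still reach the intended word. The lexicon, the romanized phoneme symbols, and the `max_cost` tolerance are hypothetical.

```python
class TrieNode:
    """One node of the lexical tree; `word` is set where a pronunciation ends."""

    def __init__(self):
        self.children = {}
        self.word = None


def build_lexical_tree(lexicon):
    """Insert every word's phoneme sequence so shared prefixes share nodes."""
    root = TrieNode()
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            node = node.children.setdefault(ph, TrieNode())
        node.word = word
    return root


def lexical_search(root, phonemes, max_cost):
    """Return sorted (edit_cost, word) pairs within max_cost phoneme edits."""
    results = []

    def walk(node, ph, prev_row):
        # row[i] = edit distance between the tree path ending in `ph` and phonemes[:i]
        row = [prev_row[0] + 1]
        for i in range(1, len(phonemes) + 1):
            row.append(min(row[i - 1] + 1,      # extra phoneme in the query
                           prev_row[i] + 1,     # phoneme missing from the query
                           prev_row[i - 1] + (phonemes[i - 1] != ph)))  # substitution
        if node.word is not None and row[-1] <= max_cost:
            results.append((row[-1], node.word))
        if min(row) <= max_cost:  # prune subtrees that can no longer match
            for next_ph, child in node.children.items():
                walk(child, next_ph, row)

    first_row = list(range(len(phonemes) + 1))
    for ph, child in root.children.items():
        walk(child, ph, first_row)
    return sorted(results)


# Hypothetical lexicon with romanized phoneme strings.
lexicon = {
    "hakgyo": ["h", "a", "k", "g", "yo"],
    "hakseng": ["h", "a", "k", "s", "e", "ng"],
    "sagwa": ["s", "a", "g", "w", "a"],
}
root = build_lexical_tree(lexicon)
print(lexical_search(root, ["h", "a", "k", "g", "o"], max_cost=1))
```

Because words sharing a prefix share tree nodes, the distance row for that prefix is computed once, which is the usual reason for organizing a lexicon as a tree.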

Effects of the Method of Deleting Typical Information in Landscape Pictures on False Memory (Effects of the Manner of Deleting Typical Items in a Scene on False Memory)

도경수;배경수
- Korean Journal of Cognitive Science / Vol. 18, No. 2 / pp. 113-138 / 2007
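Using landscape pictures, we examined the effect of schemas on memory. Experiment 1 used an immediate test to examine the effect of schemas on memory during encoding, and Experiment 2 used a 3-day delayed test to examine the effect of schemas on memory during the retention interval. In both experiments, verbatim encoding of the items was manipulated by varying the presentation time (250 ms vs. 1000 ms) and by either omitting the target lure or replacing it with a different object. In particular, by comparing the omission and substitution conditions, we examined how top-down information (schema-based expectations) and bottom-up information about specific items affect memory. In both experiments, false memory for typical items changed little even in the delayed test, whereas memory for studied items declined; false memories were reported more often in the omission condition, whereas memory for atypical items was reported more often in the substitution condition. These results show that schema-based false memories are relatively persistent and that false memories decrease when top-down information from schema-based expectations conflicts with bottom-up information.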

Combining Empirical Feature Map and Conjugate Least Squares Support Vector Machine for Real Time Image Recognition : Research with Jade Solution Company

Kim, Byung Joo
- International Journal of Internet, Broadcasting and Communication / Vol. 9, No. 1 / pp. 9-17 / 2017
This paper describes the process of developing a commercial real-time image recognition system in collaboration with a company. We build a system that combines an empirical kernel map method with a conjugate least squares support vector machine to represent images in a low-dimensional subspace for real-time image recognition. In the traditional approach to computing these eigenspace models, known as the traditional PCA method, the model must capture all the images needed to build the internal representation. An existing eigenspace can be updated only if all the images are kept, which requires a large amount of storage. The proposed method allows the acquired images to be discarded immediately after the update. Experimental results show that the empirical kernel map achieves accuracy similar to that of the traditional batch eigenspace method while requiring less memory. These results indicate that the proposed model is suitable for a commercial real-time image recognition system.
https://doi.org/10.7236/IJIBC.2017.9.1.9
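The abstract combines an empirical kernel map with a least squares SVM solved by conjugate gradients. A minimal NumPy sketch of those two ingredients follows. It assumes an RBF kernel, a linear-kernel LS-SVM trained on the mapped features, and the common trick of rewriting the LS-SVM system as two symmetric positive-definite solves. The incremental eigenspace update that gives the memory savings is not shown, and all function names and parameter values are illustrative.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def empirical_kernel_map(basis, gamma, dim):
    """Return phi(X) = k(X, basis) @ U @ diag(lambda)^(-1/2), using the top-`dim`
    eigenpairs of the basis kernel matrix as a low-dimensional feature space."""
    evals, evecs = np.linalg.eigh(rbf_kernel(basis, basis, gamma))
    top = np.argsort(evals)[::-1][:dim]
    proj = evecs[:, top] / np.sqrt(evals[top])
    return lambda X: rbf_kernel(X, basis, gamma) @ proj


def conjugate_gradient(H, rhs, tol=1e-10, max_iter=1000):
    """Solve H x = rhs for a symmetric positive-definite matrix H."""
    x = np.zeros_like(rhs)
    r = rhs.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Hp = H @ p
        step = rs / (p @ Hp)
        x += step * p
        r -= step * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def train_ls_svm(Z, y, reg=10.0):
    """LS-SVM on features Z: solve [0 1^T; 1 H][b; alpha] = [0; y] with H = ZZ^T + I/reg,
    via H eta = 1 and H nu = y, then b = sum(nu) / sum(eta), alpha = nu - b * eta."""
    H = Z @ Z.T + np.eye(len(y)) / reg  # symmetric positive-definite, so CG applies
    eta = conjugate_gradient(H, np.ones(len(y)))
    nu = conjugate_gradient(H, y.astype(float))
    b = nu.sum() / eta.sum()
    alpha = nu - b * eta
    return alpha, b


def predict(Z_train, alpha, b, Z_new):
    return np.sign(Z_new @ Z_train.T @ alpha + b)


# Toy 2-D vectors standing in for (hypothetical) flattened images.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0]])
y = np.array([-1, -1, 1, 1])
phi = empirical_kernel_map(X, gamma=0.5, dim=2)
Z = phi(X)
alpha, b = train_ls_svm(Z, y, reg=10.0)
print(predict(Z, alpha, b, phi(np.array([[0.0, 0.5], [5.0, 6.0]]))))
```

Splitting the indefinite LS-SVM system into two positive-definite solves is what makes plain conjugate gradients applicable.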

Performance Improvement of Microphone Array Speech Recognition Using Feature-Weighted Mahalanobis Distance (Performance Improvement of Microphone Array Speech Recognition Using Features Weighted Mahalanobis Distance)

;정현열
- The Journal of the Acoustical Society of Korea / Vol. 29, No. 1E / pp. 45-53 / 2010
In this paper, we present the use of the Features Weighted Mahalanobis Distance (FWMD) to improve the performance of the Likelihood Maximizing Beamforming (Limabeam) algorithm in microphone-array speech recognition. The proposed approach replaces the traditional distance measure in a Gaussian classifier with a Mahalanobis distance in which each feature is weighted according to its distance after variance normalization. Using the Features Weighted Mahalanobis Distance with the Limabeam algorithm (FWMD-Limabeam), we obtained correct word recognition rates of 90.26% for calibrated Limabeam and 87.23% for unsupervised Limabeam, which are 3% and 6% higher, respectively, than those of the original Limabeam. By alternatively implementing an HM-Net speech recognition strategy, we could also save memory and reduce computational complexity.
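The change to the Gaussian classifier can be sketched as a diagonal-covariance Mahalanobis distance with one weight per feature. The abstract does not give the weighting rule, so `inverse_distance_weights` below is a hypothetical stand-in that down-weights features lying far from the mean after variance normalization; the Limabeam beamformer itself is not shown.

```python
import numpy as np


def weighted_mahalanobis(x, mean, var, weights):
    """Diagonal-covariance Mahalanobis distance with one weight per feature."""
    x, mean, var, weights = (np.asarray(a, dtype=float) for a in (x, mean, var, weights))
    z2 = (x - mean) ** 2 / var  # variance-normalised squared distance per feature
    return float(np.sum(weights * z2))


def inverse_distance_weights(x, mean, var):
    """HYPOTHETICAL weighting rule: shrink features that lie far from the mean.

    Weights are rescaled to sum to the number of features, so equal
    normalised distances give every feature a weight of 1.
    """
    z2 = (np.asarray(x, float) - np.asarray(mean, float)) ** 2 / np.asarray(var, float)
    w = 1.0 / (1.0 + z2)
    return w * len(w) / w.sum()


def classify(x, gaussians):
    """Pick the (label, mean, var) Gaussian with the smallest weighted distance."""
    best_label, best_dist = None, np.inf
    for label, mean, var in gaussians:
        d = weighted_mahalanobis(x, mean, var, inverse_distance_weights(x, mean, var))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


gaussians = [("a", [0.0, 0.0], [1.0, 1.0]), ("b", [5.0, 5.0], [1.0, 1.0])]
print(classify([0.1, 0.1], gaussians))
```

With all weights equal to 1, `weighted_mahalanobis` reduces to the ordinary diagonal Mahalanobis distance, so the weighting is the only change from the traditional measure.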

Speech Recognition Using Recurrent Neural Prediction Models

류제관;나경민;임재열;성경모;안성길
- Journal of the Korean Institute of Telematics and Electronics B / Vol. 32B, No. 11 / pp. 1489-1495 / 1995
In this paper, we propose recurrent neural prediction models (RNPM), recurrent neural networks trained as nonlinear predictors of speech, as a new connectionist model for speech recognition. RNPM effectively modulates its mapping through its internal representation and requires no time-alignment algorithm. Therefore, the computational load at the recognition stage is substantially lower than that of the well-known predictive neural networks (PNN), and the required memory is much smaller. In addition, RNPM does not suffer from the problem of determining a time-varying target function. In speaker-dependent and speaker-independent speech recognition experiments under various conditions, the proposed model achieved recognition performance comparable to that of the PNN while retaining the above advantages, which the PNN lacks.
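At the recognition stage, each word model is a recurrent network that predicts the next speech frame, and the word whose predictor yields the smallest accumulated prediction error is chosen; no time-alignment step is needed because every model simply runs over the whole utterance. The NumPy sketch below assumes an Elman-style network with already trained weights (training, e.g. by backpropagation through time, is omitted) and uses toy constant predictors, which are hypothetical, only to exercise the code.

```python
import numpy as np


class RecurrentPredictor:
    """Elman-style recurrent network used as a nonlinear predictor of the next frame."""

    def __init__(self, W_in, W_rec, W_out, b_h, b_out):
        self.W_in, self.W_rec, self.W_out = W_in, W_rec, W_out
        self.b_h, self.b_out = b_h, b_out

    def prediction_error(self, frames):
        """Accumulated squared error of predicting frames[t + 1] from frames[:t + 1]."""
        h = np.zeros(self.W_rec.shape[0])
        error = 0.0
        for t in range(len(frames) - 1):
            h = np.tanh(self.W_in @ frames[t] + self.W_rec @ h + self.b_h)
            predicted = self.W_out @ h + self.b_out
            error += float(np.sum((frames[t + 1] - predicted) ** 2))
        return error


def recognize(models, frames):
    """Return the label whose predictor explains the utterance with the least error."""
    errors = {label: m.prediction_error(frames) for label, m in models.items()}
    return min(errors, key=errors.get)


D, H = 2, 3  # feature dimension per frame, hidden units


def constant_model(value):
    """HYPOTHETICAL untrained model that always predicts `value` for every feature."""
    return RecurrentPredictor(np.zeros((H, D)), np.zeros((H, H)), np.zeros((D, H)),
                              np.zeros(H), np.full(D, value))


models = {"yes": constant_model(1.0), "no": constant_model(0.0)}
print(recognize(models, np.ones((5, D))))
```

The recognition cost is one forward pass per word model, which is consistent with the abstract's point that no time-alignment search is required.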

Visual Location Recognition Algorithm Using a Time-Series Streetview Database (Visual Location Recognition Using Time-Series Streetview Database)

박천수;최준연
- Journal of the Semiconductor & Display Technology / Vol. 18, No. 4 / pp. 57-61 / 2019
Nowadays, portable digital cameras such as smartphone cameras are widely used for entertainment and for recording visual information. Given a database of geo-tagged images, a visual location recognition system can determine the place depicted in a query photo. One of the most common approaches to visual location recognition is the bag-of-words method, in which local image features are clustered into visual words. In this paper, we propose a new bag-of-words-based visual location recognition algorithm that uses a time-series streetview database. The proposed algorithm selects only a small subset of image features to be used in the image retrieval process. By reducing the number of features used, the proposed algorithm can reduce the memory requirement of the image database and accelerate retrieval.
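A minimal bag-of-words retrieval pipeline quantizes each local descriptor to its nearest visual word, builds a normalized word histogram per image, and ranks database images by cosine similarity. The abstract does not state how the feature subset is chosen, so `temporally_stable_subset` is a hypothetical rule (keep features whose visual word recurs in the neighbouring streetview frame); the vocabulary and descriptors are toy values.

```python
import numpy as np


def quantize(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word (index into vocabulary)."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def bow_vector(descriptors, vocabulary):
    """L2-normalised visual-word histogram for one image."""
    hist = np.bincount(quantize(descriptors, vocabulary), minlength=len(vocabulary))
    hist = hist.astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def temporally_stable_subset(frame_desc, neighbour_desc, vocabulary):
    """HYPOTHETICAL selection rule: keep descriptors whose visual word also occurs
    in the adjacent frame of the time-series streetview sequence."""
    neighbour_words = set(quantize(neighbour_desc, vocabulary).tolist())
    words = quantize(frame_desc, vocabulary).tolist()
    keep = np.array([w in neighbour_words for w in words], dtype=bool)
    return frame_desc[keep]


def retrieve(query_desc, database_vectors, vocabulary):
    """Return the index of the best-matching database image and all cosine scores."""
    scores = database_vectors @ bow_vector(query_desc, vocabulary)
    return int(np.argmax(scores)), scores


vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
db = np.stack([bow_vector(np.array([[0.1, 0.0], [0.0, 0.2]]), vocab),
               bow_vector(np.array([[9.9, 0.0], [0.0, 9.8]]), vocab)])
best, scores = retrieve(np.array([[10.2, 0.1]]), db, vocab)
print(best)  # 1
```

Storing only the selected descriptors shrinks each histogram's input, which is where a memory and speed saving of the kind the abstract describes would come from.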

Tobacco Retail License Recognition Based on Dual Attention Mechanism

Shan, Yuxiang;Ren, Qin;Wang, Cheng;Wang, Xiuhui
- Journal of Information Processing Systems / Vol. 18, No. 4 / pp. 480-488 / 2022
Images of tobacco retail licenses have complex, unstructured characteristics, and recognizing them is an urgent technical problem in robotic process automation for tobacco marketing. In this paper, a novel recognition approach using a dual attention mechanism is presented to automatically recognize and extract information from such images. First, we used a DenseNet network to extract the license information from the input tobacco retail license data. Second, a bidirectional long short-term memory network was used for encoding and decoding, with a continuous decoder integrating dual attention, to recognize and extract information from tobacco retail license images without segmentation. Finally, several performance experiments were conducted on a large-scale dataset of tobacco retail licenses. The experimental results show that the proposed approach achieves a correction accuracy of 98.36% on the ZY-LQ dataset, outperforming most existing methods.
https://doi.org/10.3745/JIPS.02.0177
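The decoder attends over encoder states at each output step. The abstract does not describe how its two attention branches are combined, so the sketch below shows only a single additive (Bahdanau-style) attention step of the kind commonly used with BiLSTM encoder-decoders; the shapes, weight names, and random values are assumptions.

```python
import numpy as np


def additive_attention(enc_states, dec_state, W_enc, W_dec, v):
    """One additive attention step.

    enc_states: (T, d_enc) encoder outputs, e.g. BiLSTM states over image columns
    dec_state:  (d_dec,) current decoder hidden state
    W_enc: (d_att, d_enc), W_dec: (d_att, d_dec), v: (d_att,)
    Returns the context vector (d_enc,) and the attention weights (T,).
    """
    scores = np.tanh(enc_states @ W_enc.T + W_dec @ dec_state) @ v  # (T,)
    scores = scores - scores.max()  # numerical stability before the softmax
    weights = np.exp(scores)
    weights /= weights.sum()
    context = weights @ enc_states  # weighted sum of encoder states
    return context, weights


rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 6, 8, 4, 5
enc = rng.normal(size=(T, d_enc))
dec = rng.normal(size=d_dec)
W_enc = rng.normal(size=(d_att, d_enc))
W_dec = rng.normal(size=(d_att, d_dec))
v = rng.normal(size=d_att)
context, weights = additive_attention(enc, dec, W_enc, W_dec, v)
print(weights.round(3), context.shape)
```

Attending over the whole encoded sequence at every step is what lets such decoders read text without first segmenting it into characters.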

1D-CNN-LSTM Hybrid-Model-Based Pet Behavior Recognition through Wearable Sensor Data Augmentation

Hyungju Kim;Nammee Moon
- Journal of Information Processing Systems / Vol. 20, No. 2 / pp. 159-172 / 2024
The number of healthcare products available for pets has increased in recent years, prompting active research into wearable devices for pets. However, the data collected through such devices are limited by outliers and missing values owing to the anomalous and irregular characteristics of pets. Hence, we propose pet behavior recognition based on a hybrid one-dimensional convolutional neural network (CNN) and long short-term memory (LSTM) model using pet wearable devices. An Arduino-based pet wearable device was first fabricated to collect data for behavior recognition, recording gyroscope and accelerometer values. Then, after missing values and outliers were replaced during preprocessing, data augmentation was performed. The behaviors were classified into five types. To prevent bias toward specific behaviors during data augmentation, the amount of data for each behavior was compared and balanced, and CNN-LSTM-based deep learning was performed. The five subdivided behaviors and the overall performance were then evaluated, and the overall behavior recognition accuracy was about 88.76%.
https://doi.org/10.3745/JIPS.02.0211
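Two of the preprocessing steps, filling missing sensor samples and balancing behavior classes by augmentation, can be sketched in NumPy. Linear interpolation and Gaussian jittering are assumptions chosen for illustration, since the abstract does not specify which imputation or augmentation methods were used; the window size, channel count, and class names are hypothetical, and the 1D-CNN-LSTM model itself is not shown.

```python
import numpy as np


def fill_missing(channel):
    """Linearly interpolate NaN samples in one sensor channel (edges take the nearest value)."""
    channel = np.asarray(channel, dtype=float)
    idx = np.arange(len(channel))
    ok = ~np.isnan(channel)
    return np.interp(idx, idx[ok], channel[ok])


def balance_with_jitter(windows_by_class, sigma=0.01, seed=0):
    """Oversample minority classes with Gaussian-jittered copies of their own windows
    until every class has as many windows as the largest class."""
    rng = np.random.default_rng(seed)
    target = max(len(w) for w in windows_by_class.values())
    balanced = {}
    for label, windows in windows_by_class.items():
        windows = np.asarray(windows, dtype=float)
        extra = [windows[rng.integers(len(windows))] + rng.normal(0.0, sigma, windows.shape[1:])
                 for _ in range(target - len(windows))]
        extra = np.array(extra).reshape(-1, *windows.shape[1:])
        balanced[label] = np.concatenate([windows, extra])
    return balanced


# Hypothetical windows: 50 samples x 6 channels (3-axis gyroscope + 3-axis accelerometer).
rng = np.random.default_rng(1)
data = {"walk": rng.normal(size=(5, 50, 6)), "sit": rng.normal(size=(2, 50, 6))}
balanced = balance_with_jitter(data)
print({label: w.shape for label, w in balanced.items()})
```

Augmenting only up to the size of the largest class keeps the class counts equal, which is one simple way to avoid the bias toward specific behaviors that the abstract mentions.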

Parallel Speech Recognition on Distributed-Memory Multiprocessors

윤지현;홍성태;정상화;김형순
- Korean Institute of Information Scientists and Engineers (KIISE): Conference Proceedings / Proceedings of the KIISE 1998 Fall Conference, Vol. 25, No. 2 (3) / pp. 747-749 / 1998
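This paper proposes an effective parallel computation model for the integrated processing of speech and natural language. The phoneme model uses context-dependent phonemes based on continuous HMMs, and the language model uses a knowledge-based approach. In addition, a memory-based parsing technique is used to process multiple hypotheses on a hierarchical knowledge base. The parallel speech recognition algorithm was implemented on a distributed-memory MIMD multi-Transputer system. The experiments provide solutions to speech-specific problems that arise during speech recognition and, by parallelizing the speech recognition system, demonstrate the feasibility of real-time speech recognition.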
