Search | Korea Science

The design of Multi-modal system for the realization of DARC system controller (DARC 시스템 제어기 구현을 위한 멀티모달 시스템 설계)

최광국;곽상훈;하얀돌이;김유진;김철;최승호
- Proceedings of the IEEK Conference / 2000.09a / pp.179-182 / 2000
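This paper designs a multimodal system that combines a speech recognizer and a lip recognizer in order to implement a DARC system controller. The 22 words used in the DARC system were built into a database, and the recognizers were designed using HMMs. The recognition probabilities of the two modalities are combined with weights in an 8:2 ratio, under the assumption that the speech recognizer has a higher recognition rate than the lip recognizer, and the probabilities are combined after recognition. For the interface between systems, a communication module was designed and implemented with TCP/IP sockets. Recognition performance was evaluated both with a test DB and through real-time experiments with five speakers.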
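The 8:2 post-recognition combination described in this abstract can be illustrated with a minimal sketch. The abstract does not state whether the probabilities are combined linearly or in the log domain, so the linear weighted sum below is an assumption, and the function name `fuse_scores`, the word list, and the score values are all hypothetical.

```python
def fuse_scores(audio_scores, lip_scores, audio_weight=0.8):
    """Combine per-word scores from two recognizers after recognition.

    Assumption: a linear weighted sum with an 8:2 audio-to-lip ratio.
    """
    lip_weight = 1.0 - audio_weight
    # Only words scored by both recognizers can be fused.
    words = audio_scores.keys() & lip_scores.keys()
    fused = {
        w: audio_weight * audio_scores[w] + lip_weight * lip_scores[w]
        for w in words
    }
    # The recognized word is the one with the highest fused score.
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical scores for three command words (not taken from the paper).
audio = {"on": 0.5, "off": 0.4, "stop": 0.1}
lip = {"on": 0.2, "off": 0.7, "stop": 0.1}
best_word, fused = fuse_scores(audio, lip)
print(best_word, fused)
```

In this made-up case the lip recognizer's strong preference for one word outweighs the speech recognizer's narrow preference for another, even at a weight of only 0.2.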

Model Updating of Head Stack Assembly using Modal Tuning (모달 튜닝을 이용한 하드디스크 구동기의 모델 개선)

Lee, Jin-Koo;Kim, Dong-Woohn;Park, Young-Pil
- Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2000.11a / pp.243-248 / 2000
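To increase the track density of hard disks, it is essential to develop an actuator with sufficient servo bandwidth. In this paper, experimental modal analysis and finite element analysis were performed to identify the main factors, among the dynamic characteristics of the actuator, that limit the servo bandwidth. First, finite element analyses were carried out for the subsystems that make up the actuator, such as the VCM coil, E-block, and suspension, and the results were verified through modal testing. The verified subsystem models were then combined to develop a finite element model of an actuator system with a single suspension, and this model was used to identify the modal parameters related to servo performance.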

Learning and Transferring Deep Neural Network Models for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델 학습과 전이)

Kim, Dong-Ha;Kim, Incheol
- Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.617-620 / 2016
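This paper presents a deep neural network model that is effective for image caption generation and model transfer. The model is a type of multimodal recurrent neural network and consists of five layers in total, including a convolutional neural network layer that extracts visual information from the image, an embedding layer that converts each word into a low-dimensional feature, a recurrent neural network layer that learns the structure of caption sentences, and a multimodal layer that combines the visual and linguistic information. In particular, the recurrent layer is built from LSTM units, which are well suited to sequence pattern learning and model transfer, and the output of the convolutional layer is connected not only to the embedding layer but also to the multimodal layer, so that the visual information of the image is available at every step of caption generation. Comparative experiments on public datasets such as Flickr8k, Flickr30k, and MSCOCO demonstrate the superiority of the proposed multimodal recurrent neural network model in terms of caption accuracy and the effect of model transfer.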
https://doi.org/10.3745/PKIPS.y2016m10a.617

W3C based Interoperable Multimodal Communicator (W3C 기반 상호연동 가능한 멀티모달 커뮤니케이터)

Park, Daemin;Gwon, Daehyeok;Choi, Jinhuyck;Lee, Injae;Choi, Haechul
- Journal of Broadcast Engineering / v.20 no.1 / pp.140-152 / 2015
HCI (Human Computer Interaction) enables interaction between people and computers through a human-familiar interface called a modality. Recently, advanced HCI methods using multiple modalities have been studied intensively in order to provide an optimal interface for various devices and service environments. However, multimodal interfaces are difficult to build because the modalities have different data formats and are hard to coordinate efficiently. To solve this problem, a multimodal communicator is introduced that is based on the W3C (World Wide Web Consortium) standards EMMA (Extensible Multimodal Annotation Markup Language) and MMI (Multimodal Interaction Framework). This standards-based framework, which consists of a modality component, an interaction manager, and a presentation component, makes multiple modalities interoperable and can readily be extended to other modalities. Experimental results show that the multimodal communicator can combine the eye-tracking and gesture-recognition modalities in a map browsing scenario.
https://doi.org/10.5909/JBE.2015.20.1.140

Robust Endpoint Detection for Bimodal System in Noisy Environments (잡음환경에서의 바이모달 시스템을 위한 견실한 끝점검출)

오현화;권홍석;손종목;진성일;배건성
- Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.5 / pp.289-297 / 2003
The performance of a bimodal system is affected by the accuracy of endpoint detection from the input signal as well as by the performance of the speech recognition or lipreading system. In this paper, we propose an endpoint detection method that detects endpoints from the audio and video signals separately and uses the signal-to-noise ratio (SNR) estimated from the input audio signal to select the endpoints that are more reliable under acoustic noise. In other words, the endpoints are taken from the audio signal when the SNR is high and from the video signal when the SNR is low. Experimental results show that the bimodal system using the proposed endpoint detector achieves satisfactory recognition rates, especially when the acoustic environment is very noisy.
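The selection rule in this abstract (audio endpoints at high SNR, video endpoints at low SNR) can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives neither the SNR estimator nor the switching threshold, so the power-ratio estimate, the 10 dB threshold, and the names `estimate_snr_db` and `select_endpoints` are hypothetical.

```python
import math


def estimate_snr_db(signal_power, noise_power):
    """SNR in decibels from average signal and noise power (assumed estimator)."""
    return 10.0 * math.log10(signal_power / noise_power)


def select_endpoints(audio_endpoints, video_endpoints, snr_db, threshold_db=10.0):
    """Return the audio endpoints at high SNR and the video endpoints at low SNR.

    threshold_db is a hypothetical switching point, not a value from the paper.
    """
    if snr_db >= threshold_db:
        return audio_endpoints
    return video_endpoints


# Hypothetical (start, end) frame indices found by each detector.
audio_ep = (12, 87)
video_ep = (10, 90)

clean = select_endpoints(audio_ep, video_ep, estimate_snr_db(100.0, 1.0))
noisy = select_endpoints(audio_ep, video_ep, estimate_snr_db(2.0, 1.0))
print(clean, noisy)
```

A hard threshold is the simplest reading of "high" versus "low" SNR; a real system might instead blend the two endpoint estimates near the boundary.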

Lip Detection using Color Distribution and Support Vector Machine for Visual Feature Extraction of Bimodal Speech Recognition System (바이모달 음성인식기의 시각 특징 추출을 위한 색상 분석자 SVM을 이용한 입술 위치 검출)

정지년;양현승
- Journal of KIISE:Software and Applications / v.31 no.4 / pp.403-410 / 2004
Bimodal speech recognition systems have been proposed to improve the recognition rate of ASR in noisy environments. Visual feature extraction is very important in developing these systems, and extracting visual features requires detecting the exact lip position. This paper proposes a method that detects the lip position using a color similarity model and an SVM. The face and lip color distributions are learned and used to find the initial lip position. The exact lip position is then detected by scanning the neighboring area with the SVM. Experiments show that this method detects the lip position accurately and quickly.

Character-based Subtitle Generation by Learning of Multimodal Concept Hierarchy from Cartoon Videos (멀티모달 개념계층모델을 이용한 만화비디오 컨텐츠 학습을 통한 등장인물 기반 비디오 자막 생성)

Kim, Kyung-Min;Ha, Jung-Woo;Lee, Beom-Jin;Zhang, Byoung-Tak
- Journal of KIISE / v.42 no.4 / pp.451-458 / 2015
Previous multimodal learning methods focus on problem-solving aspects, such as image and video search and tagging, rather than on knowledge acquisition via content modeling. In this paper, we propose the Multimodal Concept Hierarchy (MuCH), a content modeling method that uses a cartoon video dataset, together with a character-based subtitle generation method that uses the learned model. The MuCH model has a multimodal hypernetwork layer, in which the patterns of the words and image patches are represented, and a concept layer, in which each concept variable is represented by a probability distribution of the words and the image patches. The model can learn the characteristics of the characters as concepts from the video subtitles and scene images by using a Bayesian learning method and can also generate character-based subtitles from the learned model if text queries are provided. As an experiment, the MuCH model learned concepts from 'Pororo' cartoon videos with a total length of 268 minutes and generated character-based subtitles. Finally, we compare the results with those of other multimodal learning models. The experimental results indicate that, given the same text query, our model generates more accurate and more character-specific subtitles than other models.
https://doi.org/10.5626/JOK.2015.42.4.451

Identify Modal Parameter by The Output Response of Structure Using Smart Sensor System (스마트 센서 시스템을 이용한 구조물의 모달 인자 추출)

Lee, Woo-Sang;Heo, Gwang-Hee;Park, Ki-Tae;Jeon, Joon-Ryong
- Journal of the Korea Institute for Structural Maintenance and Inspection / v.12 no.4 / pp.149-160 / 2008
This study investigates how to identify modal parameters using only the output response of a structure acquired through a smart sensor system. The objective is to verify the performance and on-site applicability of the smart sensor system, which has been actively researched as an advanced measuring system. The smart sensor system was developed to perform real-time dynamic measurement by means of a MEMS-type acceleration sensor, an 8-bit CPU, and a wireless modem. In the modal parameter identification test, random excitation was applied to a cantilever beam, and the response of the structure was then obtained using both the smart sensor system and a wired measurement system. In analyzing the data, the modal parameters were identified using the NExT and ERA algorithms. Furthermore, the optimal measurement location was selected through the EOT algorithm in order to obtain a good-quality output response. The test results verified the on-site applicability of the smart sensor.

Efficient Emotion Classification Method Based on Multimodal Approach Using Limited Speech and Text Data (적은 양의 음성 및 텍스트 데이터를 활용한 멀티 모달 기반의 효율적인 감정 분류 기법)

Mirr Shin;Youhyun Shin
- The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.174-180 / 2024
In this paper, we explore an emotion classification method through multimodal learning utilizing wav2vec 2.0 and KcELECTRA models. It is known that multimodal learning, which leverages both speech and text data, can significantly enhance emotion classification performance compared to methods that solely rely on speech data. Our study conducts a comparative analysis of BERT and its derivative models, known for their superior performance in the field of natural language processing, to select the optimal model for effective feature extraction from text data for use as the text processing model. The results confirm that the KcELECTRA model exhibits outstanding performance in emotion classification tasks. Furthermore, experiments using datasets made available by AI-Hub demonstrate that the inclusion of text data enables achieving superior performance with less data than when using speech data alone. The experiments show that the use of the KcELECTRA model achieved the highest accuracy of 96.57%. This indicates that multimodal learning can offer meaningful performance improvements in complex natural language processing tasks such as emotion classification.
https://doi.org/10.3745/TKIPS.2024.13.4.174

A research on feedback effect according to different sensory modality for attention recovery (집중력 회복을 위한 감각 모달리티 별 피드백에 대한 연구)

Hyun, Hye-Jung;Whang, Min-Cheol;Park, Jun-Seok;Lee, Yoon-Joung;Kim, Young-Joo;Kim, Jong-Hwa
- Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2007.02a / pp.137-142 / 2007
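Feedback has been reported to be effective as one way of restoring limited attentional resources. However, there has been little research on attention recovery with respect to the specific content of the feedback. This study analyzed the differences in effect among types of emotional feedback within the sensory stimuli that can restore attention. An online experiment system was built to evaluate attention, and visual, auditory, and tactile feedback stimuli were presented in order to analyze the effect of each sensory stimulus on attention recovery. To identify the factors that influence participants' preferences for sensory feedback, a subjective questionnaire was administered after the experiment. About 200 sensory feedback stimuli were presented repeatedly to six graduate students over one week, and 30 results obtained from five or more sessions were analyzed. In terms of performance level, the feedback types were effective in the order auditory, tactile, visual; in terms of response time, the order was tactile, auditory, visual.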

Search Result 161, Processing Time 0.035 seconds
