• Title/Summary/Keyword: Multimodal Network

A Study on User Experience Factors of Display-Type Artificial Intelligence Speakers through Semantic Network Analysis : Focusing on Online Review Analysis of the Amazon Echo (의미연결망 분석을 통한 디스플레이형 인공지능 스피커의 사용자 경험 요인 연구 : 아마존 에코의 온라인 리뷰 분석을 중심으로)

  • Lee, Jeongmyeong;Kim, Hyesun;Choi, Junho
    • The Journal of the Convergence on Culture Technology / v.5 no.3 / pp.9-23 / 2019
  • The artificial intelligence speaker market is entering a new era of display-equipped devices. This study analyzes how the experience of using artificial intelligence speakers differs by usage context, depending on whether a display is present. Semantic network analysis was applied to online review texts of the Amazon Echo Show and Echo Plus to determine how their UX issues differ in content and structure. Ego networks were constructed around the physical and social contexts of the user experience to draw out the major issues. The results show that the presence of a display creates an expectation gap among users, which can lead to negative experiences. The analysis also confirmed that the multimodal interface is used more in the kitchen than in the bedroom and can help stimulate communication among family members. Based on these findings, we propose user experience strategies to be considered for display-type speakers to be launched in Korea.

Design of a Deep Neural Network Model for Image Caption Generation (이미지 캡션 생성을 위한 심층 신경망 모델의 설계)

  • Kim, Dongha;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.6 no.4 / pp.203-210 / 2017
  • In this paper, we propose an effective neural network model for image caption generation and model transfer. The model is a kind of multimodal recurrent neural network. It consists of five distinct layers, including a convolutional neural network layer for extracting visual information from images, an embedding layer for converting each word into a low-dimensional feature, a recurrent neural network layer for learning caption sentence structure, and a multimodal layer for combining visual and language information. The recurrent neural network layer is built from LSTM units, which are well known to be effective for learning and transferring sequence patterns. Moreover, the model has a distinctive structure in which the output of the convolutional neural network layer is linked not only to the initial state of the recurrent neural network layer but also to the input of the multimodal layer, so that visual information extracted from the image is available at each recurrent step when generating the corresponding textual caption. Through comparative experiments on open data sets such as Flickr8k, Flickr30k, and MSCOCO, we demonstrate that the proposed multimodal recurrent neural network model performs well in terms of caption accuracy and model transfer effect.

Global Function Approximations Using Wavelet Neural Networks (웨이블렛 신경망을 이용한 전역근사 메타모델의 성능비교)

  • Shin, Kwang-Ho;Lee, Jong-Soo
    • Transactions of the Korean Society of Mechanical Engineers A / v.33 no.8 / pp.753-759 / 2009
  • Feed-forward neural networks have been widely used as function approximation tools in the context of global approximate optimization. In the present study, a wavelet neural network (WNN), which is based on wavelet transform theory, is suggested as an alternative to a traditional back-propagation neural network (BPN). The basic theory of the wavelet neural network is briefly described, and its approximation performance is tested using a nonlinear multimodal function and a composite rotor blade analysis problem. The Laplacian of Gaussian, Mexican hat, and Morlet functions are considered in constructing the WNN architectures. In addition, approximation results from the WNN are compared with those from the BPN.
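
The abstract above names the wavelet functions but gives no formulas. The following is a minimal sketch of two of them and of a single-input wavelet network output, assuming the common unnormalized textbook forms. The function names, the Morlet frequency `omega0=5.0`, and the network layout (a weighted sum of translated and dilated wavelets) are illustrative assumptions, not details taken from the paper.

```python
import math


def mexican_hat(t):
    """Unnormalized Mexican hat wavelet: (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def morlet(t, omega0=5.0):
    """Real-valued Morlet wavelet: cos(omega0 * t) * exp(-t^2 / 2).

    omega0 is an assumed default; the paper's constant is not stated.
    """
    return math.cos(omega0 * t) * math.exp(-0.5 * t * t)


def wnn_predict(x, weights, translations, dilations, wavelet=mexican_hat, bias=0.0):
    """Hypothetical single-input wavelet network output.

    Computes bias + sum_j w_j * psi((x - b_j) / a_j), where b_j are the
    translations and a_j the (nonzero) dilations of the hidden units.
    """
    return bias + sum(
        w * wavelet((x - b) / a)
        for w, b, a in zip(weights, translations, dilations)
    )
```

In a trained network the weights, translations, and dilations would be fitted to sample data; here they are plain arguments so the approximation formula itself stays visible.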

A study on the implementation of identification system using facial multi-modal (얼굴의 다중특징을 이용한 인증 시스템 구현)

  • 정택준;문용선
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.5 / pp.777-782 / 2002
  • This study proposes multimodal recognition based on multiple facial features, in place of existing monomodal biometrics, to improve recognition accuracy and user convenience. Each biometric feature vector is obtained as follows. For the face, the feature is calculated by principal component analysis with wavelet multiresolution. For the lips, a filter is first used to detect the lip edges; a thinned image and the least squares method are then used to derive the coefficients of an equation describing them. A further feature is obtained from the distance ratios of facial parameters. We ran experiments with a backpropagation neural network using the inputs above. Based on the experimental results, we discuss the advantages and efficiency of the approach.

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

  • Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
    • ETRI Journal / v.46 no.1 / pp.22-34 / 2024
  • Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Degraded-performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three-feature multi-fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.

A Decision Support System for an Optimal Transportation Network Planning in the Third Party Logistics

  • Park, Yong-Sung;Choi, Hyung-Rim;Kim, Hyun-Soo;Park, Nam-Kyu;Cho, Jae-Hyung;Gang, Moo-Hong
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2006.10a / pp.240-257 / 2006
  • To gain competitiveness, many companies have recently been outsourcing their logistics activities to logistics specialists while concentrating on their core and strategic business areas. As a result, third party logistics has come to the fore and its market is expanding rapidly. Under these circumstances, third party logistics companies are making every effort to improve their logistics services and to develop information systems that enhance their competitiveness. A critical part of these efforts is a decision support system for effective transportation network planning. To this end, this study has developed an efficient decision support system for optimal transportation network planning that comprehensively considers transportation mode, routing, assignment, and schedule. The new system enables third party logistics companies to expand their services from single-mode to multimodal transportation and prepares them to plan international transportation networks.

TANFIS Classifier Integrated Efficacious Aassistance System for Heart Disease Prediction using CNN-MDRP

  • Bhaskaru, O.;Sreedevi, M.
    • International Journal of Computer Science & Network Security / v.22 no.10 / pp.171-176 / 2022
  • A dramatic rise in the number of people dying from heart disease has prompted efforts to identify it sooner using efficient approaches. A variety of variables, including hereditary factors, contribute to the condition. Current estimation approaches use an automated diagnostic system that fails to attain a high level of accuracy because it includes irrelevant dataset information. This paper presents an effective neural network with convolutional layers for classifying clinical data that is highly class-imbalanced. Traditional approaches rely on massive amounts of data rather than precise predictions. Data must be picked carefully in order to achieve an earlier prediction process, and analysis is set back if the data obtained is only partially complete. Feature extraction is also a major challenge in classification and prediction, since increased data increases the training time of traditional machine learning classifiers. The work integrates the CNN-MDRP classifier (convolutional neural network (CNN)-based efficient multimodal disease risk prediction) with TANFIS (tuned adaptive neuro-fuzzy inference system) for earlier, accurate prediction. Data cleaning is performed by transforming partial data in the dataset into informative data. The recommended TANFIS tuning parameters are then improved using a Laplace Gaussian mutation-based grasshopper and moth flame optimization approach (LGM2G). The proposed approach yields a prediction accuracy of 98.40 percent in comparisons with current algorithms.

A Link-Based Shortest Path Algorithm for the Urban Intermodal Transportation Network with Time-Schedule Constraints (서비스시간 제약이 존재하는 도시부 복합교통망을 위한 링크기반의 최단경로탐색 알고리즘)

  • 장인성
    • Journal of Korean Society of Transportation / v.18 no.6 / pp.111-124 / 2000
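  • This study addresses the problem of finding a reasonable shortest path between an origin and a destination in an urban intermodal transportation network with service-time (time-schedule) constraints. Service-time constraints represent urban intermodal networks more realistically, but existing algorithms do not take them into account. Under a service-time constraint, the time at which a traveler at a transfer station can depart for another point by transfer vehicle is restricted by the pre-planned vehicle operating schedule. A traveler who arrives at a transfer station can continue by transfer vehicle only at that vehicle's scheduled operating times. Therefore, when service-time constraints are considered, the total travel time includes both travel time and transfer waiting time, and the transfer waiting time varies with the time the traveler arrives at the transfer station and the time at which the transfer vehicle is permitted to depart. This paper develops a link-based shortest path algorithm that solves this problem. Whereas traditional search methods such as Dijkstra's algorithm compute the earliest arrival time at each node and set it as the node's label, the proposed algorithm computes the earliest arrival time at each link and the earliest departure time from each link and sets them as the link's label. The detailed search procedure of the proposed algorithm is illustrated on a simple intermodal network.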
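
The waiting-time idea in the abstract above can be illustrated with a short sketch. Note that this is a conventional node-labeled earliest-arrival search in the style of Dijkstra's algorithm, not the link-based labeling the paper proposes. It only shows how scheduled departures add transfer waiting time to the total travel time. The function name, the `links` data layout, and the use of `None` for unscheduled links (for example, walking) are all assumptions made for illustration.

```python
import heapq
from bisect import bisect_left


def earliest_arrival(links, source, target, start_time):
    """Earliest arrival time at target, or None if it is unreachable.

    links maps a node to a list of (next_node, travel_time, departures),
    where departures is a sorted list of permitted departure times, or
    None when the link can be used immediately on arrival.
    """
    best = {source: start_time}
    heap = [(start_time, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == target:
            return t
        if t > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, travel, departures in links.get(node, []):
            if departures is None:
                depart = t  # no schedule: leave immediately
            else:
                i = bisect_left(departures, t)
                if i == len(departures):
                    continue  # no remaining service on this link
                depart = departures[i]  # wait for the next scheduled run
            arrive = depart + travel
            if arrive < best.get(nxt, float("inf")):
                best[nxt] = arrive
                heapq.heappush(heap, (arrive, nxt))
    return None
```

Because a traveler may always wait for a later departure, arriving earlier at a station never leads to a later arrival downstream, which is why the greedy label-setting order remains valid in this simplified setting.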

Freight Operation System for Rail-Road Intermodal and Multimodal Transportation (철도 연계 및 복합운송을 위한 물류운영시스템)

  • 문대섭;정병현;조혜진
    • Proceedings of the KSR Conference / 2002.10a / pp.307-313 / 2002
  • With high-speed rail about to open and the TSR and TCR to be connected to the Korean railroad network in the near future, efficient operation methods for railroad freight transportation have emerged as a matter of concern. However, the present level of our railway infrastructure and operation systems is still low. This study examines the freight transportation system and introduces new logistics service systems, including network strategies and terminals, that are being developed overseas to increase the efficiency of existing railroad transportation systems. With careful consideration of these new methods, such advanced freight transportation systems are expected to be applicable to our country.

Tumor Segmentation in Multimodal Brain MRI Using Deep Learning Approaches

  • Al Shehri, Waleed;Jannah, Najlaa
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.343-351 / 2022
  • A brain tumor forms when some tissue becomes old or damaged but does not die when it should, preventing new tissue from forming. Manually finding such masses in the brain by analyzing MRI images is challenging and time-consuming for experts. In this study, our main objective is to detect the tumorous part of the brain, allowing rapid diagnosis so that the primary disease can be treated promptly. Using image processing techniques and deep learning prediction algorithms, we build a system capable of finding a tumor in brain MRI images automatically and accurately. Our tumor segmentation applies U-Net deep learning segmentation to the standard MICCAI BRATS 2018 dataset, which contains MRI images of different modalities. The proposed approach was evaluated and achieved Dice coefficients of 0.9795, 0.9855, 0.9793, and 0.9950 across several test datasets. These results show that the proposed system achieves excellent segmentation of tumors in MRIs using deep learning techniques such as the U-Net algorithm.
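
For readers unfamiliar with the metric reported above, the Dice coefficient of a predicted mask A and a ground-truth mask B is 2|A∩B| / (|A| + |B|). The sketch below computes it for flattened binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration, not details from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary masks.

    pred and truth are equal-length sequences of 0/1 (or False/True).
    Returns 1.0 when both masks are empty (an assumed convention).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

A value of 1.0 means the predicted and reference tumor regions coincide exactly, and 0.0 means they do not overlap at all.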