Search | Korea Science

Steganography based Multi-modal Biometrics System

Go, Hyoun-Joo;Chun, Myung-Geun
- International Journal of Fuzzy Logic and Intelligent Systems
- /
- v.7 no.2
- /
- pp.148-153
- /
- 2007
This paper deals with implementing a steganography based multi-modal biometric system. For this purpose, we construct a multi-biometrics system based on the face and iris recognition. Here, the feature vector of iris pattern is hidden in the face image. The recognition system is designed by the fuzzy-based Linear Discriminant Analysis(LDA), which is an expanded approach of the LDA method combined by the theory of fuzzy sets. Furthermore, we present a watermarking method that can embed iris information into face images. Finally, we show the advantages of the proposed watermarking scheme by computing the ROC curves and make some comparisons recognition rates of watermarked face images with those of original ones. From various experiments, we found that our proposed scheme could be used for establishing efficient and secure multi-modal biometric systems.
https://doi.org/10.5391/IJFIS.2007.7.2.148 인용 PDF KSCI

Steganography based Multi-modal Biometrics System

Go, Hyoun-Joo;Moon, Dae-Sung;Moon, Ki-Young;Chun, Myung-Geun
- International Journal of Fuzzy Logic and Intelligent Systems
- /
- v.7 no.1
- /
- pp.71-76
- /
- 2007
This paper deals with implementing a steganography based multi-modal biometric system. For this purpose, we construct a multi-biometrics system based on the face and iris recognition. Here, the feature vector of iris pattern is hidden in the face image. The recognition system is designed by the fuzzy-based Linear Discriminant Analysis(LDA), which is an expanded approach of the LDA method combined by the theory of fuzzy sets. Furthermore, we present a watermarking method that can embed iris information into face images. Finally, we show the advantages of the proposed watermarking scheme by computing the ROC curves and make some comparisons recognition rates of watermarked face images with those of original ones. From various experiments, we found that our proposed scheme could be used for establishing efficient and secure multi-modal biometric systems.
https://doi.org/10.5391/IJFIS.2007.7.1.071 인용 PDF KSCI

FakedBits- Detecting Fake Information on Social Platforms using Multi-Modal Features

Dilip Kumar, Sharma;Bhuvanesh, Singh;Saurabh, Agarwal;Hyunsung, Kim;Raj, Sharma
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.17 no.1
- /
- pp.51-73
- /
- 2023
Social media play a significant role in communicating information across the globe, connecting with loved ones, getting the news, communicating ideas, etc. However, a group of people uses social media to spread fake information, which has a bad impact on society. Therefore, minimizing fake news and its detection are the two primary challenges that need to be addressed. This paper presents a multi-modal deep learning technique to address the above challenges. The proposed modal can use and process visual and textual features. Therefore, it has the ability to detect fake information from visual and textual data. We used EfficientNetB0 and a sentence transformer, respectively, for detecting counterfeit images and for textural learning. Feature embedding is performed at individual channels, whilst fusion is done at the last classification layer. The late fusion is applied intentionally to mitigate the noisy data that are generated by multi-modalities. Extensive experiments are conducted, and performance is evaluated against state-of-the-art methods. Three real-world benchmark datasets, such as MediaEval (Twitter), Weibo, and Fakeddit, are used for experimentation. Result reveals that the proposed modal outperformed the state-of-the-art methods and achieved an accuracy of 86.48%, 82.50%, and 88.80%, respectively, for MediaEval (Twitter), Weibo, and Fakeddit datasets.
https://doi.org/10.3837/tiis.2023.01.004 인용 PDF HTML

Generating Radiology Reports via Multi-feature Optimization Transformer

Rui Wang;Rong Hua
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.17 no.10
- /
- pp.2768-2787
- /
- 2023
As an important research direction of the application of computer science in the medical field, the automatic generation technology of radiology report has attracted wide attention in the academic community. Because the proportion of normal regions in radiology images is much larger than that of abnormal regions, words describing diseases are often masked by other words, resulting in significant feature loss during the calculation process, which affects the quality of generated reports. In addition, the huge difference between visual features and semantic features causes traditional multi-modal fusion method to fail to generate long narrative structures consisting of multiple sentences, which are required for medical reports. To address these challenges, we propose a multi-feature optimization Transformer (MFOT) for generating radiology reports. In detail, a multi-dimensional mapping attention (MDMA) module is designed to encode the visual grid features from different dimensions to reduce the loss of primary features in the encoding process; a feature pre-fusion (FP) module is constructed to enhance the interaction ability between multi-modal features, so as to generate a reasonably structured radiology report; a detail enhanced attention (DEA) module is proposed to enhance the extraction and utilization of key features and reduce the loss of key features. In conclusion, we evaluate the performance of our proposed model against prevailing mainstream models by utilizing widely-recognized radiology report datasets, namely IU X-Ray and MIMIC-CXR. The experimental outcomes demonstrate that our model achieves SOTA performance on both datasets, compared with the base model, the average improvement of six key indicators is 19.9% and 18.0% respectively. These findings substantiate the efficacy of our model in the domain of automated radiology report generation.
https://doi.org/10.3837/tiis.2023.10.010 인용 PDF HTML

Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval

Liu, Zhi;Cai, Jincen;Zhang, Mengmeng
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.16 no.7
- /
- pp.2407-2424
- /
- 2022
Recently, Transformer has made great progress in video retrieval tasks due to its high representation capability. For the structure of a Transformer, the cascaded self-attention modules are capable of capturing long-distance feature dependencies. However, the local feature details are likely to have deteriorated. In addition, increasing the depth of the structure is likely to produce learning bias in the learned features. In this paper, an improved Transformer structure named TransDCS (Transformer with Dynamic Convolution and Shortcut) is proposed. A Multi-head Conv-Self-Attention module is introduced to model the local dependencies and improve the efficiency of local features extraction. Meanwhile, the augmented shortcuts module based on a dual identity matrix is applied to enhance the conduction of input features, and mitigate the learning bias. The proposed model is tested on MSRVTT, LSMDC and Activity-Net benchmarks, and it surpasses all previous solutions for the video-text retrieval task. For example, on the LSMDC benchmark, a gain of about 2.3% MdR and 6.1% MnR is obtained over recently proposed multimodal-based methods.
https://doi.org/10.3837/tiis.2022.07.016 인용 PDF KSCI HTML

Emotion Recognition Algorithm Based on Minimum Classification Error incorporating Multi-modal System (최소 분류 오차 기법과 멀티 모달 시스템을 이용한 감정 인식 알고리즘)

Lee, Kye-Hwan;Chang, Joon-Hyuk
- Journal of the Institute of Electronics Engineers of Korea SP
- /
- v.46 no.4
- /
- pp.76-81
- /
- 2009
We propose an effective emotion recognition algorithm based on the minimum classification error (MCE) incorporating multi-modal system The emotion recognition is performed based on a Gaussian mixture model (GMM) based on MCE method employing on log-likelihood. In particular, the reposed technique is based on the fusion of feature vectors based on voice signal and galvanic skin response (GSR) from the body sensor. The experimental results indicate that performance of the proposal approach based on MCE incorporating the multi-modal system outperforms the conventional approach.
PDF KSCI

Enhancing Recommender Systems by Fusing Diverse Information Sources through Data Transformation and Feature Selection

Thi-Linh Ho;Anh-Cuong Le;Dinh-Hong Vu
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.17 no.5
- /
- pp.1413-1432
- /
- 2023
Recommender systems aim to recommend items to users by taking into account their probable interests. This study focuses on creating a model that utilizes multiple sources of information about users and items by employing a multimodality approach. The study addresses the task of how to gather information from different sources (modalities) and transform them into a uniform format, resulting in a multi-modal feature description for users and items. This work also aims to transform and represent the features extracted from different modalities so that the information is in a compatible format for integration and contains important, useful information for the prediction model. To achieve this goal, we propose a novel multi-modal recommendation model, which involves extracting latent features of users and items from a utility matrix using matrix factorization techniques. Various transformation techniques are utilized to extract features from other sources of information such as user reviews, item descriptions, and item categories. We also proposed the use of Principal Component Analysis (PCA) and Feature Selection techniques to reduce the data dimension and extract important features as well as remove noisy features to increase the accuracy of the model. We conducted several different experimental models based on different subsets of modalities on the MovieLens and Amazon sub-category datasets. According to the experimental results, the proposed model significantly enhances the accuracy of recommendations when compared to SVD, which is acknowledged as one of the most effective models for recommender systems. Specifically, the proposed model reduces the RMSE by a range of 4.8% to 21.43% and increases the Precision by a range of 2.07% to 26.49% for the Amazon datasets. Similarly, for the MovieLens dataset, the proposed model reduces the RMSE by 45.61% and increases the Precision by 14.06%. Additionally, the experimental results on both datasets demonstrate that combining information from multiple modalities in the proposed model leads to superior outcomes compared to relying on a single type of information.
https://doi.org/10.3837/tiis.2023.05.006 인용 PDF HTML

Effective Multi-Modal Feature Fusion for 3D Semantic Segmentation with Multi-View Images (멀티-뷰 영상들을 활용하는 3차원 의미적 분할을 위한 효과적인 멀티-모달 특징 융합)

Hye-Lim Bae;Incheol Kim
- KIPS Transactions on Software and Data Engineering
- /
- v.12 no.12
- /
- pp.505-518
- /
- 2023
3D point cloud semantic segmentation is a computer vision task that involves dividing the point cloud into different objects and regions by predicting the class label of each point. Existing 3D semantic segmentation models have some limitations in performing sufficient fusion of multi-modal features while ensuring both characteristics of 2D visual features extracted from RGB images and 3D geometric features extracted from point cloud. Therefore, in this paper, we propose MMCA-Net, a novel 3D semantic segmentation model using 2D-3D multi-modal features. The proposed model effectively fuses two heterogeneous 2D visual features and 3D geometric features by using an intermediate fusion strategy and a multi-modal cross attention-based fusion operation. Also, the proposed model extracts context-rich 3D geometric features from input point cloud consisting of irregularly distributed points by adopting PTv2 as 3D geometric encoder. In this paper, we conducted both quantitative and qualitative experiments with the benchmark dataset, ScanNetv2 in order to analyze the performance of the proposed model. In terms of the metric mIoU, the proposed model showed a 9.2% performance improvement over the PTv2 model using only 3D geometric features, and a 12.12% performance improvement over the MVPNet model using 2D-3D multi-modal features. As a result, we proved the effectiveness and usefulness of the proposed model.
https://doi.org/10.3745/KTSDE.2023.12.12.505 인용 PDF

Development of Context Awareness and Service Reasoning Technique for Handicapped People (멀티 모달 감정인식 시스템 기반 상황인식 서비스 추론 기술 개발)

Ko, Kwang-Eun;Sim, Kwee-Bo
- Journal of the Korean Institute of Intelligent Systems
- /
- v.19 no.1
- /
- pp.34-39
- /
- 2009
As a subjective recognition effect, human's emotion has impulsive characteristic and it expresses intentions and needs unconsciously. These are pregnant with information of the context about the ubiquitous computing environment or intelligent robot systems users. Such indicators which can aware the user's emotion are facial image, voice signal, biological signal spectrum and so on. In this paper, we generate the each result of facial and voice emotion recognition by using facial image and voice for the increasing convenience and efficiency of the emotion recognition. Also, we extract the feature which is the best fit information based on image and sound to upgrade emotion recognition rate and implement Multi-Modal Emotion recognition system based on feature fusion. Eventually, we propose the possibility of the ubiquitous computing service reasoning method based on Bayesian Network and ubiquitous context scenario in the ubiquitous computing environment by using result of emotion recognition.
https://doi.org/10.5391/JKIIS.2009.19.1.034 인용 PDF KSCI

Multi-modal Detection of Anchor Shot in News Video (다중모드 특징을 사용한 뉴스 동영상의 앵커 장면 검출 기법)

Yoo, Sung-Yul;Kang, Dong-Wook;Kim, Ki-Doo;Jung, Kyeong-Hoon
- Journal of Broadcast Engineering
- /
- v.12 no.4
- /
- pp.311-320
- /
- 2007
In this paper, an efficient detection algorithm of an anchor shot in news video is presented. We observed the audio visual characteristics of news video and proposed several low level features which are appropriate for detecting an anchor shot in news video. The overall structure of the proposed algorithm is composed of 3 stages: the pause detection, the audio cluster classification, and the matching with motion activity stage. We used the audio features as well as the motion feature in order to improve the indexing accuracy and the simulation results show that the performance of the proposed algorithm is quite satisfactory.
https://doi.org/10.5909/JBE.2007.12.4.311 인용 PDF KSCI

Search Result 30, Processing Time 0.022 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)