• Title/Summary/Keyword: multimodal information transform


A multisource image fusion method for multimodal pig-body feature detection

  • Zhong, Zhen;Wang, Minjuan;Gao, Wanlin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.11 / pp.4395-4412 / 2020
  • Multisource image fusion has become an active topic in recent years owing to its higher segmentation rate. To enhance the accuracy of multimodal pig-body feature segmentation, a multisource image fusion method was employed. However, conventional multisource image fusion methods cannot extract superior contrast and abundant detail in the fused image. To better segment the shape feature and detect the temperature feature, a new multisource image fusion method, entitled NSST-GF-IPCNN, is presented. Firstly, the multisource images are decomposed into a range of multiscale and multidirectional subbands by the Nonsubsampled Shearlet Transform (NSST). Then, to better describe fine-scale texture and edge information, an even-symmetrical Gabor filter and an Improved Pulse Coupled Neural Network (IPCNN) are used to fuse the low- and high-frequency subbands, respectively. Next, the fused coefficients are reconstructed into a fusion image using the inverse NSST. Finally, the shape feature is extracted using an automatic threshold algorithm and optimized using morphological operations. Moreover, the highest pig-body temperature is obtained in view of the segmentation results. Experiments revealed that the presented fusion algorithm achieves a 2.102-4.066% higher average accuracy rate than traditional algorithms, with enhanced efficiency as well.
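The subband-fusion idea in this abstract (fuse low-frequency bands smoothly, keep the strongest high-frequency detail) can be illustrated with a minimal two-band sketch. This is not the paper's NSST-GF-IPCNN method; it substitutes a simple box-blur decomposition, a mean rule for the base band, and a max-absolute rule for the detail band. All function names are illustrative.

```python
import numpy as np

def box_blur(img, k=5):
    # Simple separable box blur used here as a stand-in low-pass filter.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def fuse_two_band(img_a, img_b):
    # Split each source into a low-frequency base and a high-frequency detail
    # band, fuse bases by averaging and details by the max-absolute rule,
    # then recombine into a single fused image.
    low_a, low_b = box_blur(img_a), box_blur(img_b)
    high_a, high_b = img_a - low_a, img_b - low_b
    fused_low = 0.5 * (low_a + low_b)
    fused_high = np.where(np.abs(high_a) >= np.abs(high_b), high_a, high_b)
    return fused_low + fused_high
```

The max-absolute rule preserves whichever source has the stronger edge response at each pixel, which is the same intuition behind fusing NSST high-frequency subbands with an IPCNN.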

Multimodal System by Data Fusion and Synergetic Neural Network

  • Son, Byung-Jun;Lee, Yill-Byung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.2 / pp.157-163 / 2005
  • In this paper, we present a multimodal system based on the fusion of two user-friendly biometric modalities: iris and face. To achieve robust identification and verification, we combine the two different biometric features. Specifically, we apply the 2-D discrete wavelet transform to extract low-dimensional feature sets from iris and face images. To obtain a Reduced Joint Feature Vector (RJFV) from these feature sets, Direct Linear Discriminant Analysis (DLDA) is used in our multimodal system. In addition, a Synergetic Neural Network (SNN) is used to obtain the matching score of the preprocessed data. This system can operate in two modes: to identify a particular person or to verify a person's claimed identity. Our results for both cases show that the proposed method leads to a reliable person authentication system.
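The first stage described above, extracting a low-dimensional feature set with a 2-D discrete wavelet transform, can be sketched with a single-level Haar decomposition; the LL band is the compact approximation typically kept as features. This is a generic illustration, not the paper's exact implementation.

```python
import numpy as np

def haar_dwt2(img):
    # One level of the 2-D Haar wavelet transform on an image with even
    # height and width. Returns the approximation (LL) and the three
    # detail subbands (LH, HL, HH), each at half resolution.
    a = (img[0::2] + img[1::2]) / 2.0   # row-pair averages
    d = (img[0::2] - img[1::2]) / 2.0   # row-pair differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh
```

Applying the transform repeatedly to the LL band yields progressively smaller feature sets, which can then be concatenated across modalities and reduced further with DLDA.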

A Calibration Method for Multimodal dual Camera Environment (멀티모달 다중 카메라의 영상 보정방법)

  • Lim, Su-Chang;Kim, Do-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.9 / pp.2138-2144 / 2015
  • A multimodal dual camera system has a stereo-like configuration equipped with an infrared thermal camera and an optical camera. This paper presents stereo calibration methods for a multimodal dual camera system using a target board that can be recognized by both the thermal and the optical camera. Whereas a typical stereo calibration method is usually performed with extracted intrinsic and extrinsic camera parameters, the following consecutive image processing steps were applied in this paper. Firstly, corner points were detected in the two images, and the pixel error rate, the size difference, and the rotation angle between the two images were calculated using the pixel coordinates of the detected corner points. Secondly, calibration was performed with the calculated values via an affine transform. Lastly, the result image was reconstructed by mapping regions onto the calibrated image.
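The core step above, aligning the two modalities with an affine transform estimated from matched corner points, can be sketched as a least-squares fit. This is a generic formulation under the assumption of N ≥ 3 matched points; the function names are illustrative, not from the paper.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    # Solve for the 2x3 affine matrix A that maps src points to dst points
    # in the least-squares sense, given N x 2 arrays of matched corners.
    n = src_pts.shape[0]
    X = np.hstack([src_pts, np.ones((n, 1))])        # N x 3 homogeneous
    A, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)  # 3 x 2 solution
    return A.T                                       # 2 x 3 affine matrix

def apply_affine(A, pts):
    # Map N x 2 points through the 2x3 affine matrix A.
    pts_h = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return pts_h @ A.T
```

An affine fit absorbs exactly the quantities the paper measures between the two images: translation (pixel offset), scale (size difference), and rotation.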

Enhancement of Mobile Authentication System Performance based on Multimodal Biometrics (다중 생체인식 기반의 모바일 인증 시스템 성능 개선)

  • Jeong, Kanghun;Kim, Sanghoon;Moon, Hyeonjoon
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.342-345 / 2013
  • This paper proposes a personal authentication system based on multimodal biometrics in a mobile environment. Face recognition and speaker recognition were selected as the biometric modalities, and the recognition scenario of the system is as follows. For face recognition, face-region preprocessing is performed through Modified Census Transform (MCT)-based face detection and eye detection based on the k-means cluster analysis algorithm, and a principal component analysis (PCA)-based face verification system is implemented. For speaker recognition, speech endpoints are detected and Mel-frequency cepstral coefficient (MFCC) features are extracted, and a dynamic time warping (DTW)-based speaker verification system is implemented. Finally, the two biometric modalities are fused based on the method proposed in this paper to improve the recognition rate.
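The DTW matching step in the speaker-verification pipeline above can be sketched as the classic dynamic-programming recurrence over two MFCC frame sequences. This is the textbook algorithm, not the paper's specific implementation; the function name is illustrative.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    # Dynamic time warping over two feature sequences (frames x dims).
    # Returns the accumulated alignment cost, usable as a matching score:
    # lower cost means the utterances are more similar.
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]
```

Because DTW warps the time axis, it tolerates the speaking-rate variation that makes frame-by-frame Euclidean comparison of utterances unreliable.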

Vision-based Walking Guidance System Using Top-view Transform and Beam-ray Model (탑-뷰 변환과 빔-레이 모델을 이용한 영상기반 보행 안내 시스템)

  • Lin, Qing;Han, Young-Joon;Hahn, Hern-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.12 / pp.93-102 / 2011
  • This paper presents a walking guidance system for blind pedestrians in an outdoor environment using just a single camera. Unlike many existing travel-aid systems that rely on stereo vision, the proposed system aims to obtain the necessary information about the road environment using a single camera fixed at the belly of the user. To achieve this goal, a top-view image of the road is used, on which obstacles are detected by first extracting local extreme points and then verifying them with a polar edge histogram. Meanwhile, user motion is estimated using optical flow in an area close to the user. Based on this information extracted from the image domain, an audio message generation scheme is proposed to deliver guidance instructions via synthetic voice to the blind user. Experiments with several sidewalk video clips show that the proposed walking guidance system is able to provide useful guidance instructions in certain sidewalk environments.
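The top-view transform used above is a planar homography: pixels on the road plane are reprojected into bird's-eye coordinates, where distances become metrically comparable. A minimal point-mapping sketch, assuming a precomputed 3x3 homography H (obtaining H requires a camera-to-ground calibration not shown here):

```python
import numpy as np

def to_top_view(points, H):
    # Project N x 2 image-plane points through the homography H (3x3)
    # into road-plane (top-view) coordinates, dividing by the
    # homogeneous coordinate.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In the top view, an obstacle's footprint position directly gives its lateral offset and distance along the sidewalk, which is what the audio guidance messages need.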

Environmental IoT-Enabled Multimodal Mashup Service for Smart Forest Fires Monitoring

  • Elmisery, Ahmed M.;Sertovic, Mirela
    • Journal of Multimedia Information System / v.4 no.4 / pp.163-170 / 2017
  • The Internet of Things (IoT) is a new paradigm for collecting, processing, and analyzing various contents in order to detect anomalies and to monitor particular patterns in a specific environment. The collected data can be used to discover new patterns and to offer new insights. IoT-enabled data mashup is a new technology that combines various types of information from multiple sources into a single web service. Mashup services create a new horizon for different applications. Environmental monitoring is an essential tool for state and private organizations located in regions with environmental hazards that seek insights to detect hazards and locate them clearly. These organizations may utilize an IoT-enabled data mashup service to merge different types of datasets from different IoT sensor networks in order to improve their data analytics performance and the accuracy of their predictions. This paper presents an IoT-enabled data mashup service in which multimedia data is collected from various IoT platforms and then fed into an environmental cognition service that executes different image processing techniques, such as noise removal, segmentation, and feature extraction, in order to detect interesting patterns in hazardous areas. The noise present in the captured images is eliminated with the help of noise removal and background subtraction processes. A Markov-based approach was utilized to segment the possible regions of interest. The viable features within each region were extracted using a multiresolution wavelet transform and then fed into a discriminative classifier to extract various patterns. Experimental results have shown accurate detection performance and adequate processing time for the proposed approach. We also provide a data mashup scenario for an IoT-enabled environmental hazard detection service along with experimentation results.
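The background-subtraction step in the cognition pipeline above can be sketched as simple thresholded frame differencing against a background model. This is a generic baseline, not the paper's specific method; the threshold value is an illustrative assumption.

```python
import numpy as np

def background_subtract(frame, background, threshold=25):
    # Mark pixels whose absolute intensity difference from the background
    # model exceeds the threshold as foreground (1), i.e. candidate
    # regions of interest such as smoke or flame; all others are 0.
    diff = np.abs(frame.astype(int) - background.astype(int))
    return (diff > threshold).astype(np.uint8)
```

The resulting binary mask restricts the more expensive segmentation and wavelet feature extraction stages to pixels that actually changed.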

Analysis of Research Trends in Deep Learning-Based Video Captioning (딥러닝 기반 비디오 캡셔닝의 연구동향 분석)

  • Lyu Zhi;Eunju Lee;Youngsoo Kim
    • KIPS Transactions on Software and Data Engineering / v.13 no.1 / pp.35-49 / 2024
  • Video captioning technology, as a significant outcome of the integration of computer vision and natural language processing, has emerged as a key research direction in the field of artificial intelligence. This technology aims to achieve automatic understanding and linguistic expression of video content, enabling computers to transform the visual information in videos into textual form. This paper provides an initial analysis of research trends in deep learning-based video captioning and categorizes them into four main groups: CNN-RNN-based, RNN-RNN-based, multimodal-based, and Transformer-based models, explaining the concept of each video captioning model and discussing its features, pros, and cons. The paper also lists commonly used datasets and performance evaluation methods in the video captioning field. The datasets encompass diverse domains and scenarios, offering extensive resources for the training and validation of video captioning models. The discussion of model performance evaluation covers the major evaluation indicators and provides practical references for researchers to evaluate model performance from various angles. Finally, as future research tasks for video captioning, the paper identifies major challenges that need continued improvement, such as maintaining temporal consistency and accurately describing dynamic scenes, which increase complexity in real-world applications, and presents new tasks that need to be studied, such as temporal relationship modeling and multimodal data integration.

A Robust Watermarking Algorithm using Wavelet for Biometric Information (웨이블렛을 이용한 생체정보의 강인한 워터마킹 알고리즘)

  • Lee, Wook-Jae;Lee, Dae-Jong;Moon, Ki-Young;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.5 / pp.632-639 / 2007
  • This paper presents a wavelet-based watermarking algorithm to securely hide biometric features such as face and fingerprint and to effectively extract them with little distortion of the concealed data. To hide the biometric features, we propose a method for determining the insertion location based on the wavelet transform, together with an adaptive weighting method according to the image characteristics. The hidden features are effectively extracted by applying the inverse wavelet transform to the watermarked image. To show the effectiveness, we analyze various performance measures, such as the PSNR and the correlation of the watermark features before and after watermarking. We also evaluate the effect of the watermarking algorithm on the biometric system, in particular the recognition rate, which reaches 98.67% for a multimodal biometric system consisting of face and fingerprint. From these results, we confirm that the proposed method makes it possible to effectively hide and extract the biometric features without lowering the recognition rate.
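The embed/extract symmetry described above can be illustrated with a minimal additive scheme on a single wavelet subband. This sketch fixes a global strength `alpha` and assumes the original band is available at extraction time (a non-blind scheme); the paper's adaptive, location-selecting method is more sophisticated.

```python
import numpy as np

def embed_watermark(band, watermark, alpha=0.05):
    # Additively embed the watermark (e.g. a biometric feature array)
    # into a wavelet subband with embedding strength alpha.
    return band + alpha * watermark

def extract_watermark(marked_band, original_band, alpha=0.05):
    # Invert the embedding, given the original (unmarked) subband.
    return (marked_band - original_band) / alpha
```

Embedding in a wavelet subband rather than raw pixels lets the strength be tuned per frequency band, which is what keeps the PSNR high while the correlation of the recovered features stays near 1.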

Development of the Multi-Parametric Mapping Software Based on Functional Maps to Determine the Clinical Target Volumes (임상표적체적 결정을 위한 기능 영상 기반 생물학적 인자 맵핑 소프트웨어 개발)

  • Park, Ji-Yeon;Jung, Won-Gyun;Lee, Jeong-Woo;Lee, Kyoung-Nam;Ahn, Kook-Jin;Hong, Se-Mie;Juh, Ra-Hyeong;Choe, Bo-Young;Suh, Tae-Suk
    • Progress in Medical Physics / v.21 no.2 / pp.153-164 / 2010
  • To determine clinical target volumes considering the vascularity and cellularity of tumors, software was developed for mapping the analyzed biological clinical target volumes onto anatomical images using regional cerebral blood volume (rCBV) maps and apparent diffusion coefficient (ADC) maps. The program provides functions for integrated registration using mutual information, affine transform, and non-rigid registration. Registration accuracy is evaluated by calculating the overlap ratio of segmented bone regions and the average distance difference of contours between the reference and registered images. The performance of the developed software was tested using multimodal images of a patient with residual high-grade glioma. A registration accuracy of about 74% and an average distance difference of 2.3 mm were obtained with the bone segmentation and contour extraction evaluation methods; the registration accuracy can be improved by as much as 4% using the manual adjustment functions. Advanced MR images are analyzed using color maps for the rCBV maps and quantitative, region of interest (ROI)-based calculation for the ADC maps. The parameters at the same voxels are then plotted on a plane to constitute multi-functional parametric maps whose x and y axes represent rCBV and ADC values. According to the distributions of the functional parameters, tumor regions showing higher vascularity and cellularity are categorized according to criteria corresponding to malignant gliomas. The determined volumes, reflecting the pathological and physiological characteristics of the tumors, are marked on the anatomical images. By applying these multi-functional images, errors arising from using a single type of image can be reduced, and local regions with a higher probability of containing tumor cells can be determined for the radiation treatment plan. Biological tumor characteristics can thus be expressed using image registration and multi-functional parametric maps in the developed software, which can be used to delineate clinical target volumes from advanced MR images together with anatomical images.
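The voxelwise categorization step above, flagging regions with both high vascularity (rCBV) and high cellularity (low ADC, i.e. restricted diffusion), can be sketched as a joint threshold on the two co-registered parameter maps. The thresholds here are purely illustrative placeholders, not clinical criteria from the paper.

```python
import numpy as np

def classify_voxels(rcbv, adc, rcbv_thresh, adc_thresh):
    # Flag voxels with rCBV above a vascularity threshold AND ADC below a
    # cellularity threshold; both maps must already be registered to the
    # same voxel grid. Returns a boolean mask of candidate tumor regions.
    return (rcbv > rcbv_thresh) & (adc < adc_thresh)
```

The returned mask corresponds to one quadrant of the rCBV-ADC parameter plane described in the abstract, and can be overlaid on the anatomical image to mark the biological target volume.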