• Title/Summary/Keyword: temporal feature

Robust Speech Recognition Using Weighted Auto-Regressive Moving Average Filter (가중 ARMA 필터를 이용한 강인한 음성인식)

  • Ban, Sung-Min;Kim, Hyung-Soon
    • Phonetics and Speech Sciences / v.2 no.4 / pp.145-151 / 2010
  • This paper proposes a robust feature compensation method for improving speech recognition performance. The method builds on auto-regressive moving average (ARMA) based feature compensation: the ARMA filter weights vary with the degree of speech activity, and the normalized cepstral sequence is passed through the weighted ARMA filter. In addition, when the cepstral sequences are normalized in training, the cepstral means and variances are estimated from all training utterances. Experimental results show that the proposed method significantly improves speech recognition performance in noisy and reverberant environments.
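The following is a minimal sketch of the pipeline described in this abstract, not the authors' implementation. It assumes the commonly used ARMA smoothing form (average of the previous `order` outputs and the current plus next `order` inputs). The per-frame `weights` blend between filtered and unfiltered values is a hypothetical stand-in for the speech-activity weighting, which the abstract does not specify. The names `normalize`, `weighted_arma`, `order`, and `weights` are illustrative.

```python
def normalize(seq, mean, std):
    """Mean and variance normalization of one cepstral coefficient track.

    mean and std are passed in so they can come from the whole training set.
    """
    return [(x - mean) / std for x in seq]


def weighted_arma(seq, weights, order=2):
    """Smooth a normalized cepstral track with an ARMA filter.

    The AR part sums the previous `order` outputs; the MA part sums the
    current and next `order` inputs. weights[t] in [0, 1] blends the
    smoothed value with the unfiltered input (hypothetical scheme).
    """
    out = list(seq)  # edge frames are left unfiltered
    for t in range(order, len(seq) - order):
        past = sum(out[t - order:t])
        ahead = sum(seq[t:t + order + 1])
        smoothed = (past + ahead) / (2 * order + 1)
        out[t] = weights[t] * smoothed + (1 - weights[t]) * seq[t]
    return out


track = normalize([1.0, 3.0, 2.0, 6.0, 2.0, 1.0, 3.0], mean=2.5, std=1.5)
smoothed = weighted_arma(track, weights=[1.0] * len(track), order=1)
```

With all weights set to 0 the filter returns its input unchanged, and with all weights set to 1 it reduces to the plain ARMA smoother.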

Feature Extraction System for Land Cover Changes Based on Segmentation

  • Jung, Myung-Hee;Yun, Eui-Jung
    • Korean Journal of Remote Sensing / v.20 no.3 / pp.207-214 / 2004
  • This study provides a methodology for using temporal information obtained from remotely sensed data to monitor a wide variety of targets on the earth's surface. In general, a methodology for understanding global change consists of mapping, quantifying, and monitoring changes in the physical characteristics of land cover. The selected processing and analysis technique affects the quality of the obtained information. In this research, a feature extraction methodology based on segmentation is proposed. It requires a series of processing steps on multitemporal images: preprocessing with geometric and radiometric correction, image subtraction/thresholding, and segmentation/thresholding. The result is a map of the change-detected areas. Appropriate methods are studied for each step; in particular, for the segmentation step, a method for delineating the exact boundaries of features is investigated in a multiresolution framework to reduce the computational complexity for large multitemporal images.
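The image subtraction/thresholding step can be illustrated with a small sketch. It assumes the two images are already co-registered and radiometrically corrected, are given as equally sized lists of rows, and that a single global `threshold` is adequate; the function name `change_mask` and the threshold value are hypothetical, and the segmentation step is not shown.

```python
def change_mask(before, after, threshold):
    """Mark pixels whose absolute difference between two dates exceeds threshold."""
    return [
        [1 if abs(a - b) > threshold else 0 for b, a in zip(row_b, row_a)]
        for row_b, row_a in zip(before, after)
    ]


before = [[10, 10, 10], [10, 10, 10]]
after = [[12, 40, 10], [10, 9, 90]]
mask = change_mask(before, after, threshold=20)
```

Here `mask` flags only the two pixels whose values changed by more than 20.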

Feature Parameter Extraction and Analysis in the Wavelet Domain for Discrimination of Music and Speech (음악과 음성 판별을 위한 웨이브렛 영역에서의 특징 파라미터)

  • Kim, Jung-Min;Bae, Keun-Sung
    • MALSORI / no.61 / pp.63-74 / 2007
  • Discriminating music from speech in a multimedia signal is an important task in audio coding and broadcast monitoring systems. This paper deals with the problem of feature parameter extraction for discrimination of music and speech. The wavelet transform is a multi-resolution analysis method that is useful for analyzing the temporal and spectral properties of non-stationary signals such as speech and audio. We propose new feature parameters extracted from the wavelet-transformed signal. First, wavelet coefficients are obtained on a frame-by-frame basis, with the analysis frame size set to 20 ms. A parameter $E_{sum}$ is then defined by adding the differences of magnitude between adjacent wavelet coefficients in each scale. The maximum and minimum values of $E_{sum}$ over a period of 2 seconds, which corresponds to the discrimination duration, are used as the feature parameters. To evaluate the proposed feature parameters, discrimination accuracy is measured for various types of music and speech signals. In the experiment, each 2-second segment is classified as music or speech, and about 93% of music and speech segments are correctly detected.
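A rough sketch of how $E_{sum}$ and the two segment features could be computed is given below. The abstract does not name the wavelet, the number of scales, or the exact form of the "difference of magnitude", so this sketch assumes an unnormalized Haar transform, three scales by default, and the absolute difference of coefficient magnitudes. The function names `haar_details`, `e_sum`, and `segment_features` are illustrative.

```python
def haar_details(signal, levels):
    """Return one list of Haar detail coefficients per scale (unnormalized)."""
    details, approx = [], list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])
        approx = [(a + b) / 2 for a, b in pairs]
    return details


def e_sum(frame, levels=3):
    """Sum, over all scales, of magnitude differences between adjacent coefficients."""
    total = 0.0
    for coeffs in haar_details(frame, levels):
        total += sum(abs(abs(c1) - abs(c0)) for c0, c1 in zip(coeffs, coeffs[1:]))
    return total


def segment_features(frames, levels=3):
    """Maximum and minimum E_sum over the frames of one discrimination segment."""
    values = [e_sum(frame, levels) for frame in frames]
    return max(values), min(values)


frames = [[1, -1, 1, -1, 3, -3, 3, -3], [0] * 8]
features = segment_features(frames, levels=1)
```

With 20 ms frames, one 2-second segment would contain 100 frames; the two-frame list above is only a toy input.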

Human Activity Recognition Based on 3D Residual Dense Network

  • Park, Jin-Ho;Lee, Eung-Joo
    • Journal of Korea Multimedia Society / v.23 no.12 / pp.1540-1551 / 2020
  • To address the problem that existing human behavior recognition algorithms cannot fully utilize the multi-level spatio-temporal information of the network, a human behavior recognition algorithm based on a dense three-dimensional residual network is proposed. First, the proposed algorithm uses a three-dimensional residual dense block as the basic module of the network; the module extracts hierarchical features of human behavior through densely connected convolutional layers. Second, a local feature aggregation adaptive method is used to learn the local dense features of human behavior. Third, a residual connection module is applied to promote the flow of feature information and reduce the difficulty of training. Finally, multi-layer local feature extraction is realized by cascading multiple three-dimensional residual dense blocks, and a global feature aggregation adaptive method is used to learn the features of all network layers to realize human behavior recognition. Extensive experiments on the KTH benchmark dataset show that the recognition rate (top-1 accuracy) of the proposed algorithm reaches 93.52%, an improvement of 3.93 percentage points over the three-dimensional convolutional neural network (C3D) algorithm. The proposed framework has good robustness and transfer learning ability, and can effectively handle a variety of video behavior recognition tasks.

Spatio-Temporal Query Processing System based on GML for The Mobile Environment (모바일 환경을 위한 GML 기반 시공간 질의 처리 시스템)

  • Kim, Joung-Joon;Shin, In-Su;Won, Seung-Ho;Lee, Ki-Young;Han, Ki-Joon
    • Spatial Information Research / v.20 no.3 / pp.95-106 / 2012
  • With the recent growth and development of wireless access networks, u-GIS services are supported in various fields. In particular, spatio-temporal data is used in the mobile environment for the u-GIS service. However, because there is no standard for the spatio-temporal data used in different spaces, spatio-temporal data processing technology is needed to enable interoperability among mobile u-GIS services. It is also necessary to develop a system for gathering, storing, and managing spatio-temporal data that takes into account the small capacity and low performance of mobile devices. Therefore, in this paper, we designed and implemented a spatio-temporal query processing system based on GML to manage spatio-temporal data efficiently in the mobile environment. The system offers a structured storage method, which maps a GML schema to a storage table, and a binary XML storage method, which uses the Fast Infoset technique, so as to support interoperability, an important feature of GML, and to increase storage efficiency. It also provides spatio-temporal operators for rapid query processing of spatio-temporal data in GML documents. In addition, by implementing a virtual scenario, we showed that this system can be used for the u-GIS service.

Video Signature using Spatio-Temporal Information for Video Copy Detection (동영상 복사본 검출을 위한 시공간 정보를 이용한 동영상 서명 - 동심원 구획 기반 서술자를 이용한 동영상 복사본 검출 기술)

  • Cho, Ik-Hwan;Oh, Weon-Geun;Jeong, Dong-Seok
    • 한국HCI학회:학술대회논문집 / 2008.02a / pp.607-611 / 2008
  • This paper proposes a new video signature that uses spatio-temporal information for copy detection. The proposed video copy detection method is based on concentric circle partitioning of each key frame. First, key frames are extracted periodically from the whole video using temporal bilinear interpolation, and each frame is partitioned into concentric circles. For the partitioned sub-regions, four feature distributions (average intensity, its difference, symmetric difference, and circular difference) are obtained from the relations between the sub-regions. Finally, these feature distributions are converted into binary signatures by a simple hash function and merged together. The similarity distance between signatures is a simple Hamming distance, so matching is very fast. In experiments, the proposed method shows a high average detection success rate of 97.4% across various modifications. The proposed method is therefore expected to be widely applicable to video copy detection.
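The idea of a ring-based binary signature matched by Hamming distance can be sketched as follows. This is a simplified illustration rather than the paper's descriptor: it uses only the average intensity per ring, and the binarization rule (1 where a ring is brighter than the next ring outward) is an assumed stand-in for the paper's unspecified hash function. The names `ring_means`, `signature`, and `hamming`, and the `rings` parameter, are hypothetical.

```python
import math


def ring_means(frame, rings=4):
    """Average intensity in each concentric ring around the frame center."""
    h, w = len(frame), len(frame[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    max_r = math.hypot(cy, cx) or 1.0
    sums, counts = [0.0] * rings, [0] * rings
    for y in range(h):
        for x in range(w):
            r = math.hypot(y - cy, x - cx) / max_r
            k = min(int(r * rings), rings - 1)  # ring index of this pixel
            sums[k] += frame[y][x]
            counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def signature(frame, rings=4):
    """Binary signature: 1 where a ring is brighter than the next ring outward."""
    means = ring_means(frame, rings)
    return [1 if a > b else 0 for a, b in zip(means, means[1:])]


def hamming(sig_a, sig_b):
    """Number of positions at which two equal-length signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


# Toy 5x5 key frame that is brightest at the center.
frame = [[9 - 2 * (abs(y - 2) + abs(x - 2)) for x in range(5)] for y in range(5)]
brighter = [[v + 50 for v in row] for row in frame]
inverted = [[10 - v for v in row] for row in frame]
```

Because the signature compares rings with each other, a uniform brightness shift leaves it unchanged in this toy case, while the inverted frame differs in every bit.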

A Recognition Framework for Facial Expression by Expression HMM and Posterior Probability (표정 HMM과 사후 확률을 이용한 얼굴 표정 인식 프레임워크)

  • Kim, Jin-Ok
    • Journal of KIISE:Computing Practices and Letters / v.11 no.3 / pp.284-291 / 2005
  • I propose a framework for detecting, recognizing, and classifying facial features based on learned expression patterns. The framework recognizes facial expressions by using PCA and an expression HMM (EHMM), a Hidden Markov Model (HMM) approach that represents the spatial information and the temporal dynamics of time-varying visual expression patterns. Because the low-level spatial feature extraction is fused with the temporal analysis, a unified spatio-temporal HMM approach to the common detection, tracking, and classification problems is effective. The proposed recognition framework works by applying the posterior probability between current visual observations and previous visual evidence. As a result, the framework gives accurate and robust recognition results on simple expressions as well as on the six basic facial feature patterns. The method makes it possible to perform a set of important tasks such as facial expression recognition, HCI, and key-frame extraction.

Compression Method for MPEG CDVA Global Feature Descriptors (MPEG CDVA 전역 특징 서술자 압축 방법)

  • Kim, Joonsoo;Jo, Won;Lim, Guentaek;Yun, Joungil;Kwak, Sangwoon;Jung, Soon-heung;Cheong, Won-Sik;Choo, Hyon-Gon;Seo, Jeongil;Choi, Yukyung
    • Journal of Broadcast Engineering / v.27 no.3 / pp.295-307 / 2022
  • In this paper, we propose a novel compression method for scalable Fisher vectors (SCFV), which are used as the global visual feature description of individual video frames in the MPEG CDVA standard. The CDVA standard has adopted a temporal descriptor redundancy removal technique that takes advantage of the correlation between global feature descriptors of adjacent keyframes. However, due to the variable-length property of SCFV, the temporal redundancy removal scheme often results in poor compression efficiency, even worse than when the SCFVs are not compressed at all. To enhance the compression efficiency, we propose an asymmetric SCFV difference computation method and an SCFV reconstruction method. Experiments on the FIVR dataset show that the proposed method significantly improves the compression efficiency compared to the original CDVA Experimental Model implementation.

Study of Emotion Recognition based on Facial Image for Emotional Rehabilitation Biofeedback (정서재활 바이오피드백을 위한 얼굴 영상 기반 정서인식 연구)

  • Ko, Kwang-Eun;Sim, Kwee-Bo
    • Journal of Institute of Control, Robotics and Systems / v.16 no.10 / pp.957-962 / 2010
  • To recognize human emotion from a facial image, we first need to extract the emotional features from the image with a feature extraction algorithm, and then classify the emotional state with a pattern classification method. The AAM (Active Appearance Model) is a well-known method that can represent a non-rigid object such as a face or a facial expression. The Bayesian network is a probability-based classifier that can represent the probabilistic relationships among a set of facial features. In this paper, our approach to facial feature extraction combines AAM with FACS (Facial Action Coding System) to automatically model and extract the facial emotional features. To recognize the facial emotion, we use DBNs (Dynamic Bayesian Networks) to model and understand the temporal phases of facial expressions in image sequences. The emotion recognition result can be used in biofeedback-based rehabilitation for people with emotional disabilities.

Simulation Study for Feature Identification of Dynamic Medical Image Reconstruction Technique Based on Singular Value Decomposition (특이값분해 기반 동적의료영상 재구성기법의 특징 파악을 위한 시뮬레이션 연구)

  • Kim, Do-Hui;Jung, YoungJin
    • Journal of radiological science and technology / v.42 no.2 / pp.119-130 / 2019
  • Positron emission tomography (PET) is a widely used imaging modality for effective and accurate functional testing and medical diagnosis using radioactive isotopes. However, it is difficult to acquire high-quality PET images because of constraints such as the amount of radioactive isotope injected into the patient, the detection time, the characteristics of the detector, and the patient's motion. To overcome this problem, we previously succeeded in improving image quality with a dynamic image reconstruction method based on singular value decomposition (SVD). However, some questions remain about the characteristics of the proposed technique. In this study, the characteristics of the SVD-based reconstruction method were evaluated through computational simulation. As a result, we confirmed that the SVD-based reconstruction technique distinguishes the images well when the signal-to-noise ratio of the input image is more than 20 decibels and the feature vector angle is more than 60 degrees. In addition, the proposed method for evaluating the characteristics of a reconstruction technique can be applied to other dynamic image reconstruction techniques based on spatio-temporal features. The conclusions of this study can serve as a useful guideline for applying medical images to SVD-based dynamic image reconstruction to improve the accuracy of medical diagnosis.
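The two quantities behind the reported operating range can be illustrated with a short sketch. The abstract does not define how the signal-to-noise ratio or the feature vector angle were computed, so this assumes the usual power-ratio definition of SNR in decibels and the angle given by the normalized dot product. The function names `snr_db`, `angle_deg`, and `within_reported_range` are hypothetical, and the SVD reconstruction itself is not shown.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from mean signal and noise power."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_signal / p_noise)


def angle_deg(u, v):
    """Angle in degrees between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))


def within_reported_range(snr, angle):
    """Check the thresholds quoted in the abstract (more than 20 dB and 60 degrees)."""
    return snr > 20 and angle > 60


snr = snr_db([10.0, -10.0, 10.0], [0.5, -0.5, 0.5])
angle = angle_deg([1.0, 0.0], [0.0, 1.0])
ok = within_reported_range(snr, angle)
```

In this toy case the signal power is 400 times the noise power (about 26 dB) and the vectors are orthogonal, so both conditions hold.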