• Title/Summary/Keyword: feature mismatch

ARMA Filtering of Speech Features Using Energy Based Weights (에너지 기반 가중치를 이용한 음성 특징의 자동회귀 이동평균 필터링)

  • Ban, Sung-Min;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.31 no.2 / pp.87-92 / 2012
  • This paper proposes a robust feature compensation method for handling environmental mismatch. The method applies energy-based weights, set according to the degree of speech presence, to Mean subtraction, Variance normalization, and ARMA filtering (MVA) processing. The weights are further smoothed by moving-average and maximum filters. The algorithm is evaluated on the AURORA 2 task and in a distant-talking experiment using a robot platform. Compared with MVA processing, it reduces the error rate by 14.4% on the AURORA 2 task and by 44.9% in the distant-talking experiment.
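
    A minimal sketch of the plain MVA baseline named in this abstract, applied to a single feature track. It does not include the paper's energy-based weighting, whose details the abstract does not give. The ARMA form used here (each output averages the previous `order` outputs with the current and next `order` inputs), the default `order=2`, and the choice to leave edge frames unfiltered are assumptions for illustration.

```python
from statistics import fmean, pstdev


def mva(track, order=2):
    """Mean subtraction, variance normalization, then ARMA smoothing of one feature track."""
    mean = fmean(track)
    std = pstdev(track) or 1.0  # assumption: treat a constant track as unit variance
    x = [(v - mean) / std for v in track]

    y = list(x)  # assumption: the first and last `order` frames stay unfiltered
    for t in range(order, len(x) - order):
        past_outputs = sum(y[t - order:t])
        current_and_future_inputs = sum(x[t:t + order + 1])
        y[t] = (past_outputs + current_and_future_inputs) / (2 * order + 1)
    return y


smoothed = mva([0.0, 1.0] * 5)
```

    The recursion on past outputs is what makes the filter autoregressive; the look-ahead over future inputs means it needs `order` frames of delay in a streaming setting.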

Robust Histogram Equalization Using Compensated Probability Distribution

  • Kim, Sung-Tak;Kim, Hoi-Rin
    • MALSORI / v.55 / pp.131-142 / 2005
  • A mismatch between training and test conditions often causes a drastic drop in the performance of speech recognition systems. This paper studies non-linear transformation techniques based on histogram equalization (HEQ) in the acoustic feature space for reducing this mismatch. The purpose of HEQ is to convert the probability distribution of test speech into that of training speech. Conventional HEQ methods consider only the probability distribution of the test speech, but for noise-corrupted test speech that distribution is itself distorted. A transformation function obtained from the distorted distribution may mis-transform feature vectors, which degrades the performance of HEQ. This paper therefore proposes a new method that computes a noise-removed probability distribution, under the assumption that the CDF of noisy speech feature vectors consists of a speech component and a noise component, and uses this compensated distribution in the HEQ process. In the AURORA-2 framework, the proposed method reduced the error rate by over 44% in the clean training condition compared to the baseline system. For multi-condition training, the proposed methods also outperform the baseline system.
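
    The conventional HEQ step described in this abstract can be sketched as quantile matching: each test value is sent through the empirical CDF of the test data and then through the inverse empirical CDF of the training data. This is an illustrative sketch of the conventional method only, not the compensated-distribution method the paper proposes; the function name and the mid-rank CDF estimate are assumptions.

```python
from bisect import bisect_right


def histogram_equalize(test_values, train_values):
    """Map each test value onto the training distribution by matching empirical CDFs."""
    reference = sorted(train_values)
    ordered_test = sorted(test_values)
    n, m = len(ordered_test), len(reference)

    equalized = []
    for value in test_values:
        # Empirical CDF of the test data, kept strictly inside (0, 1).
        cdf = (bisect_right(ordered_test, value) - 0.5) / n
        # Inverse empirical CDF of the training data.
        index = min(m - 1, int(cdf * m))
        equalized.append(reference[index])
    return equalized


print(histogram_equalize([10.0, 30.0, 20.0], [1.0, 2.0, 3.0]))
```

    Because the mapping depends only on rank, any monotonic distortion of the test feature is undone, which is why a noise-distorted test CDF (the problem the abstract raises) directly corrupts the result.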

Semi-fragile Watermarking Scheme for H.264/AVC Video Content Authentication Based on Manifold Feature

  • Ling, Chen;Ur-Rehman, Obaid;Zhang, Wenjun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.12 / pp.4568-4587 / 2014
  • Content-based authentication of videos and images is becoming an important problem in information security. Unfortunately, previous studies do not take Kerckhoffs's principle into account (i.e., a cryptosystem should be secure even if everything about the system, except the key, is public knowledge). In this paper, a solution to the problem of finding a relationship between a frame's index and its content is proposed, based on a robust manifold feature. The proposed solution is a novel semi-fragile watermarking scheme for H.264/AVC video content authentication. First, the input I-frame is partitioned for feature extraction and watermark embedding. This is followed by temporal feature extraction using the Isometric Mapping algorithm. The frame index is included in the feature to produce the temporal watermark. To improve security, the spatial watermark is encrypted together with the temporal watermark. Finally, the resulting watermark is embedded into the Discrete Cosine Transform coefficients in the diagonal positions. At the receiver side, after watermark extraction and decryption, temporal tampering is detected through a mismatch between the frame index extracted from the temporal watermark and the observed frame index. Next, the feature is regenerated through temporal feature regeneration and compared with the extracted feature. This comparison determines whether the extracted temporal watermark is similar to that of the original watermarked video. Additionally, for spatial authentication, the tampered areas are located by comparing the extracted and regenerated spatial features. Experimental results show that the proposed method is sensitive to intentional malicious attacks and modifications, whereas it is robust to legitimate manipulations such as a certain level of lossy compression, channel noise, Gaussian filtering and brightness adjustment. Temporal tampering is identified by comparing the extracted frame index with the current frame index. With the proposed scheme, a solution to the Kerckhoffs's principle problem is specified.
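
    The temporal check in this abstract reduces to comparing the frame index recovered from each watermark with the position at which the frame is actually observed. The sketch below assumes extraction and decryption have already produced a list of indices, and that frames are numbered from zero; both the function name and that numbering are hypothetical.

```python
def find_temporal_tampering(extracted_indices):
    """Return observed positions whose extracted frame index does not match."""
    return [
        position
        for position, extracted in enumerate(extracted_indices)
        if extracted != position
    ]


# A stream in which frame 2 was dropped: every later frame arrives one slot early.
suspect_positions = find_temporal_tampering([0, 1, 3, 4, 5])
```

    A dropped or inserted frame shifts every subsequent position, so a run of mismatches starting at one point suggests where the edit occurred, while an isolated pair suggests a swap.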

Robust Speech Recognition using Noise Compensation Method Based on Eigen-Environment (Eigen-Environment 잡음 보상 방법을 이용한 강인한 음성인식)

  • Song Hwa Jeon;Kim Hyung Soon
    • MALSORI / no.52 / pp.145-160 / 2004
  • In this paper, a new noise compensation method based on the eigenvoice framework in feature space is proposed to reduce the mismatch between training and testing environments. The difference between clean and noisy environments is represented by a linear combination of K eigenvectors that capture the variation among environments. In the proposed method, the performance improvement of the speech recognition system depends largely on how the noisy models and the bias vector set are constructed. Two methods are proposed for constructing the noisy models: one based on MAP adaptation and the other using a stereo DB. In experiments using the Aurora 2 DB, we obtained a 44.86% relative improvement with the eigen-environment method in comparison with the baseline system. In particular, in clean-condition training mode, the proposed method yielded a 66.74% relative improvement, which is better than several methods previously proposed in the Aurora project.

Energy Feature Normalization for Robust Speech Recognition in Noisy Environments

  • Lee, Yoon-Jae;Ko, Han-Seok
    • Speech Sciences / v.13 no.1 / pp.129-139 / 2006
  • In this paper, we propose two effective energy feature normalization methods for robust speech recognition in noisy environments. In the first method, we estimate the noise energy and remove it from the noisy speech energy. In the second method, we propose a modified algorithm for the Log-energy Dynamic Range Normalization (ERN) method. In the ERN method, the log energy of the training data in a clean environment is transformed into the log energy in noisy environments. If the minimum log energy of the test data is outside of a pre-defined range, the log energy of the test data is also transformed. Since the ERN method has several weaknesses, we propose a modified transform scheme designed to reduce the residual mismatch that it produces. In the evaluation conducted on the Aurora2.0 database, we obtained a significant performance improvement.

Robust Feature Normalization Scheme Using Separated Eigenspace in Noisy Environments (분리된 고유공간을 이용한 잡음환경에 강인한 특징 정규화 기법)

  • Lee Yoonjae;Ko Hanseok
    • The Journal of the Acoustical Society of Korea / v.24 no.4 / pp.210-216 / 2005
  • We propose a new feature normalization scheme based on eigenspace for achieving robust speech recognition. In general, mean and variance normalization (MVN) is performed in the cepstral domain. However, another MVN approach using eigenspace was recently introduced, in which the normalization procedure is performed in a single eigenspace. This procedure consists of a linear PCA matrix feature transformation followed by mean and variance normalization of the transformed cepstral feature. In this method, the 39-dimensional feature distribution is represented using only a single eigenspace. However, it is observed to be insufficient to represent the whole data distribution using only a single eigenvector. For a more specific representation, in this paper we apply unique and independent eigenspaces to the cepstra, delta and delta-delta cepstra, respectively. We also normalize the training data in eigenspace and obtain the model from the normalized training data. Finally, a feature space rotation procedure is introduced to reduce the mismatch between the training and test data distributions in noisy conditions. As a result, we obtained a substantial recognition improvement over the basic eigenspace normalization.

Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network (심층신경망을 이용한 짧은 발화 음성인식에서 극점 필터링 기반의 특징 정규화 적용)

  • Han, Jaemin;Kim, Min Sik;Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.64-68 / 2020
  • In a conventional speech recognition system using Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), the cepstral feature normalization method based on pole filtering was effective in improving the performance of recognition of short utterances in noisy environments. In this paper, the usefulness of this method for the state-of-the-art speech recognition system using Deep Neural Network (DNN) is examined. Experimental results on AURORA 2 DB show that the cepstral mean and variance normalization based on pole filtering improves the recognition performance of very short utterances compared to that without pole filtering, especially when there is a large mismatch between the training and test conditions.

Implementation of a Robust Speech Recognizer in Noisy Car Environment Using a DSP (DSP를 이용한 자동차 소음에 강인한 음성인식기 구현)

  • Chung, Ik-Joo
    • Speech Sciences / v.15 no.2 / pp.67-77 / 2008
  • In this paper, we implemented a robust speech recognizer using the TMS320VC33 DSP. For this implementation, we built a speech and noise database suited to a recognizer that uses the spectral subtraction method for noise removal. The recognizer has an explicit structure in that the speech signal is enhanced through spectral subtraction before endpoint detection and feature extraction. This keeps the operation of the recognizer clear and helps build HMM models with minimum model mismatch. Since the recognizer was developed for controlling car facilities and for voice dialing, it has two recognition engines: a speaker-independent one for controlling car facilities and a speaker-dependent one for voice dialing. We adopted a conventional DTW algorithm for the latter and a continuous HMM for the former. Through various off-line recognition tests, we selected optimal values of several recognition parameters for a resource-limited embedded recognizer, which led to HMM models with three mixtures per state. The speech database with added car noise is enhanced using spectral subtraction before HMM parameter estimation, in order to reduce the model mismatch caused by the nonlinear distortion that spectral subtraction introduces. The hardware module includes a microcontroller for the host interface, which processes the protocol between the DSP and a host.
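
    A minimal sketch of magnitude-domain spectral subtraction for one frame, the enhancement step this abstract applies before endpoint detection and feature extraction. The abstract does not state which variant or parameters the recognizer uses, so the over-subtraction factor, the spectral floor, and the function name are assumptions; the noise magnitude is assumed to come from an estimate over non-speech frames.

```python
def spectral_subtract(noisy_magnitude, noise_magnitude, over_subtraction=1.0, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from one noisy frame, bin by bin."""
    enhanced = []
    for noisy, noise in zip(noisy_magnitude, noise_magnitude):
        difference = noisy - over_subtraction * noise
        # Flooring keeps bins from going negative when the noise estimate is too large.
        enhanced.append(max(difference, floor * noisy))
    return enhanced


frame = spectral_subtract([1.0, 0.5, 0.2], [0.2, 0.2, 0.2])
```

    The `max` with a floor is the nonlinearity the abstract refers to: it distorts the speech spectrum, which is why the training data is passed through the same enhancement before HMM parameter estimation.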

New Blind Steganalysis Framework Combining Image Retrieval and Outlier Detection

  • Wu, Yunda;Zhang, Tao;Hou, Xiaodan;Xu, Chen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.12 / pp.5643-5656 / 2016
  • The detection accuracy of steganalysis depends on many factors, including the embedding algorithm, the payload size, the steganalysis feature space and the properties of the cover source. In practice, the cover source mismatch (CSM) problem has been recognized as the single most important factor negatively affecting performance. To address this problem, we propose a new framework for blind, universal steganalysis which uses traditional steganalysis features. First, cover images with the same statistical properties as the test image are retrieved from a reference image database as aided samples. The test image and its aided samples together form the test set. Then, assuming that most of the aided samples are innocent, we conduct outlier detection on the test set to judge the test image as cover or stego. In this way, the framework removes the need for training and hence does not suffer from cover source mismatch. Because it performs anomaly detection rather than classification, the method is fully unsupervised. The results of our study show that this framework outperforms both a one-class support vector machine and the outlier detector used without the image retrieval process.

Deformation estimation of truss bridges using two-stage optimization from cameras

  • Jau-Yu Chou;Chia-Ming Chang
    • Smart Structures and Systems / v.31 no.4 / pp.409-419 / 2023
  • Structural integrity can be assessed from the dynamic deformations of structures, and these deformations can be acquired from non-contact sensors such as video cameras. The Kanade-Lucas-Tomasi (KLT) algorithm is one of the commonly used methods for motion tracking. However, averaging over the extracted features can bias the measurement. In addition, pixel-wise measurements can be converted to physical units through the camera intrinsics, but depth information is unavailable without prior knowledge of the spatial layout. The assigned homogeneous coordinates would then mismatch the manually selected feature points, resulting in measurement errors during coordinate transformation. In this study, a two-stage optimization method for video-based measurements is proposed. The manually selected feature points are first optimized by minimizing their errors with respect to the homogeneous coordinates. The optimized points are then used by the KLT algorithm to extract displacements through inverse projection. Two additional criteria are employed to eliminate outliers from KLT, resulting in more reliable displacement responses. The second-stage optimization subsequently fine-tunes the geometry of the selected coordinates. The optimization process also considers the number of interpolation points at different depths of an image to reduce the effect of out-of-plane motions. The proposed method is numerically investigated using a truss bridge as a physics-based graphic model (PBGM) to extract high-accuracy displacements from recorded videos under various capturing angles and structural conditions.