• Title/Summary/Keyword: noisy data


Issues and Empirical Results for Improving Text Classification

  • Ko, Young-Joong; Seo, Jung-Yun
    • Journal of Computing Science and Engineering, v.5 no.2, pp.150-160, 2011
  • Automatic text classification has a long history, and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Even though much technical progress has been made, there is still room for improvement. In this paper, we discuss three remaining issues in improving text classification: automatic training data generation, noisy data treatment, and term weighting and indexing, and we introduce four studies and their empirical results for these issues. First, a semi-supervised learning technique is applied to text classification to efficiently create training data. For effective noisy data treatment, a noisy data reduction method and a text classifier robust to noisy data are developed. Finally, the term weighting and indexing technique is revised by reflecting the importance of sentences in the term weight calculation using summarization techniques.
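
A minimal sketch of the semi-supervised training-data generation idea mentioned above, written as a simple self-training loop: unlabeled documents that the current classifier labels with high confidence are pseudo-labeled and added to the training set. The tiny corpus, TF-IDF features, naive Bayes classifier, and 0.6 confidence threshold are illustrative assumptions, not the configuration used in the paper.

```python
# Self-training sketch: pseudo-label confident unlabeled documents and retrain.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

labeled_docs = ["stock market rises", "team wins the final"]
labels = np.array([0, 1])                                   # 0 = economy, 1 = sports
unlabeled_docs = ["shares fall sharply", "striker scores twice", "bond yields climb"]

vec = TfidfVectorizer().fit(labeled_docs + unlabeled_docs)  # shared vocabulary
X_lab, X_unl = vec.transform(labeled_docs), vec.transform(unlabeled_docs)

clf = MultinomialNB().fit(X_lab, labels)
proba = clf.predict_proba(X_unl)
confident = proba.max(axis=1) > 0.6                         # keep confident predictions only
pseudo = proba.argmax(axis=1)[confident]

# Retrain on the enlarged training set (seed documents + pseudo-labeled documents).
clf = MultinomialNB().fit(sp.vstack([X_lab, X_unl[confident]]),
                          np.concatenate([labels, pseudo]))
print(clf.predict(vec.transform(["goalkeeper saves a late penalty"])))
```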

Energy Feature Normalization for Robust Speech Recognition in Noisy Environments

  • Lee, Yoon-Jae; Ko, Han-Seok
    • Speech Sciences, v.13 no.1, pp.129-139, 2006
  • In this paper, we propose two effective energy feature normalization methods for robust speech recognition in noisy environments. In the first method, we estimate the noise energy and remove it from the noisy speech energy. In the second method, we propose a modified algorithm for the Log-energy Dynamic Range Normalization (ERN) method. In the ERN method, the log energy of the training data in a clean environment is transformed into the log energy in noisy environments. If the minimum log energy of the test data is outside of a pre-defined range, the log energy of the test data is also transformed. Since the ERN method has several weaknesses, we propose a modified transform scheme designed to reduce the residual mismatch that it produces. In the evaluation conducted on the Aurora2.0 database, we obtained a significant performance improvement.
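
A minimal sketch of a log-energy dynamic range normalization step in the spirit of the ERN idea summarized above: the low end of a log-energy contour (noise-dominated frames) is raised toward a target floor inside the dynamic range. The linear mapping and the floor_ratio parameter are illustrative assumptions, not the transform proposed in the paper.

```python
# Raise the minimum of a log-energy contour toward a target floor (ERN-style sketch).
import numpy as np

def normalize_log_energy(log_e, floor_ratio=0.2):
    """Map the contour so its minimum is lifted to a floor inside the dynamic range."""
    e_max, e_min = log_e.max(), log_e.min()
    target_min = e_max - (1.0 - floor_ratio) * (e_max - e_min)  # assumed target floor
    if e_min >= target_min:
        return log_e                                            # already within range
    # Linearly map [e_min, e_max] onto [target_min, e_max].
    return target_min + (log_e - e_min) * (e_max - target_min) / (e_max - e_min)

# Example: silence frames whose noise energy is far below the speech frames.
log_energy = np.log(np.array([0.02, 0.03, 1.5, 2.0, 0.9, 0.04]))
print(normalize_log_energy(log_energy))
```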


Denoising of a Positive Signal with White Gaussian Noise by Using Wavelet Transform

  • Koo, Ja-Yong
    • The Journal of the Acoustical Society of Korea, v.17 no.1E, pp.30-35, 1998
  • Given a noisy signal sampled at equispaced points with white noise, we consider problems where the signal to be recovered is known to be positive; for example, images, chemical spectra, or other measurements of intensities. Shrinking noisy wavelet coefficients via thresholding offers a very attractive alternative to existing methods of recovering signals from noisy data. In this paper, we propose a method of recovering the original signal from a corrupted noisy signal while guaranteeing that the recovered signal is positive. We first obtain wavelet coefficients by thresholding, and then use nonlinear optimization to find a denoised signal that is positive. Numerical examples are used to illustrate the performance of the proposed algorithm.
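
A minimal sketch of wavelet-shrinkage denoising for a positive signal, in the spirit of the method above: the detail coefficients of the noisy signal are soft-thresholded and the signal is reconstructed. Positivity is enforced here by simple clipping, which only stands in for the nonlinear optimization step described in the abstract; the wavelet family, universal threshold, and test signal are illustrative assumptions (requires the PyWavelets package).

```python
# Wavelet soft-thresholding of a noisy positive signal, then reconstruction.
import numpy as np
import pywt

rng = np.random.default_rng(0)
n = 256
t = np.linspace(0, 1, n)
clean = np.exp(-((t - 0.5) ** 2) / 0.01)            # a positive "spectral peak"
noisy = clean + 0.1 * rng.standard_normal(n)

coeffs = pywt.wavedec(noisy, "db4", level=4)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # robust noise-level estimate
thresh = sigma * np.sqrt(2 * np.log(n))             # universal threshold
coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]

denoised = pywt.waverec(coeffs, "db4")[:n]
denoised = np.clip(denoised, 0, None)               # crude stand-in for the positivity step
print(float(np.mean((denoised - clean) ** 2)))      # reconstruction error
```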


Frequency analysis of GPS data for structural health monitoring observations

  • Pehlivan, Huseyin
    • Structural Engineering and Mechanics, v.66 no.2, pp.185-193, 2018
  • In this study, low- and high-frequency structure behaviors were identified and a systematic analysis procedure was proposed using noisy GPS data from a 165-m-high tower in İstanbul, Turkey. The raw GPS data contained long- and short-period position changes and noisy signals at different frequencies. To extract significant results from this complex dataset, the general structure and components of the GPS signal were modeled and analyzed in the time and frequency domains. Uncontrolled jumps and deviations in the time-domain signal were pre-filtered. Then, the signal was converted to the frequency domain after applying low- and high-pass filters, and the frequency and periodic component values were calculated. The spectrum of the tower motion obtained from the filtered GPS data had dominant peaks at a low frequency of 1.15572 × 10⁻⁴ Hz and a high frequency of 0.16624 Hz, consistent with two equivalent GPS datasets. The signal was then reconstructed using the inverse Fourier transform with the dominant low-frequency values to obtain filtered and interpretable clean signals. With the proposed sequence, processing noisy data collected from GPS receivers mounted very close to the structure is effective in revealing the basic behaviors and features of buildings.
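
A minimal sketch of the spectral workflow summarized above: transform a noisy displacement series to the frequency domain, locate the dominant peak, keep only the low-frequency band, and reconstruct a clean signal with the inverse FFT. The sampling rate, cut-off frequency, and synthetic signal are illustrative assumptions, not the tower data analyzed in the paper.

```python
# FFT-based filtering and reconstruction of a noisy displacement series.
import numpy as np

fs = 10.0                                     # assumed GPS sampling rate [Hz]
t = np.arange(0, 600, 1 / fs)                 # 10 minutes of data
rng = np.random.default_rng(1)
# Synthetic "tower motion": a slow quasi-static component plus a dynamic component.
signal = 5.0 * np.sin(2 * np.pi * 0.005 * t) + 0.5 * np.sin(2 * np.pi * 0.166 * t)
noisy = signal + rng.standard_normal(t.size)

spec = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
dominant = freqs[np.argmax(np.abs(spec[1:])) + 1]   # skip the DC bin
print(f"dominant frequency: {dominant:.4f} Hz")

low_pass = spec.copy()
low_pass[freqs > 0.01] = 0                    # keep only the low-frequency band
reconstructed = np.fft.irfft(low_pass, n=t.size)    # filtered, interpretable signal
```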

A Study on the Noisy Speech Recognition Based on the Data-Driven Model Parameter Compensation

  • Chung, Yong-Joo
    • Speech Sciences, v.11 no.2, pp.247-257, 2004
  • There have been many research efforts to overcome the problems of speech recognition in noisy conditions. Among them, model-based compensation methods such as parallel model combination (PMC) and vector Taylor series (VTS) have been found to perform efficiently compared with previous speech enhancement methods or feature-based approaches. In this paper, a data-driven model compensation approach that adapts the HMM (hidden Markov model) parameters for noisy speech recognition is proposed. Instead of assuming statistical approximations as in conventional model-based methods such as PMC, the statistics necessary for HMM parameter adaptation are directly estimated using the Baum-Welch algorithm. The proposed method has shown improved results compared with PMC for noisy speech recognition.
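
A minimal sketch of the data-driven compensation idea summarized above: instead of analytically combining a clean model with a noise model (as in PMC), the Gaussian parameters of a clean-trained HMM are re-estimated directly on noisy adaptation data with a few Baum-Welch iterations. The random stand-in features, model size, and use of the hmmlearn package are illustrative assumptions, not the paper's recognizer.

```python
# Re-estimate HMM output distributions on noisy adaptation data (Baum-Welch).
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
clean_feats = rng.normal(0.0, 1.0, size=(500, 13))        # stand-in MFCC-like features
noisy_feats = clean_feats + rng.normal(0.5, 0.3, size=clean_feats.shape)

# Train a baseline model on the clean features.
hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=10, random_state=0)
hmm.fit(clean_feats)

# Adapt means and covariances on noisy data; keep the transition structure fixed.
hmm.params = "mc"            # re-estimate means (m) and covariances (c) only
hmm.init_params = ""         # start from the clean-trained parameters
hmm.n_iter = 5
hmm.fit(noisy_feats)
print(hmm.means_[0][:3])     # adapted state means
```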


Genetic Programming Approach to Curve Fitting of Noisy Data and Its Application in Ship Design

  • Lee, K.H.; Yeun, Y.S.
    • Korean Journal of Computational Design and Engineering, v.9 no.3, pp.183-191, 2004
  • This paper deals with smooth curve fitting of data corrupted by noise. Most research efforts have concentrated on employing a smoothness penalty function and estimating its optimal parameter in order to avoid the 'overfitting and underfitting' dilemma in noisy data fitting problems. Our approach, called DBSF (Differentiation-Based Smooth Fitting), is different from the above-mentioned methods. The main idea is that optimal functions approximately estimating the derivative of the noisy curve data are first generated using genetic programming, and then their integral values are evaluated and used to recover the original curve form. To show the effectiveness of this approach, DBSF is demonstrated with two illustrative examples and an application to estimating the principal dimensions of bulk cargo ships in the conceptual design stage.
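
A minimal numerical sketch of the differentiate-then-integrate structure of DBSF described above: a smooth function approximating the derivative of the noisy data is fitted and then integrated to recover the curve. The paper evolves the derivative function with genetic programming; here a low-order polynomial fit stands in for the evolved function, so only the overall structure is illustrated.

```python
# Differentiate-then-integrate curve recovery (polynomial stands in for the GP model).
import numpy as np
from scipy.integrate import cumulative_trapezoid

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 100)
y_true = np.sin(x)
y_noisy = y_true + 0.1 * rng.standard_normal(x.size)

dy = np.gradient(y_noisy, x)                        # noisy finite-difference derivative
deriv_fit = np.poly1d(np.polyfit(x, dy, deg=5))     # smooth surrogate for the derivative

# Recover the curve by cumulative integration of the smoothed derivative.
y_recovered = y_noisy[0] + cumulative_trapezoid(deriv_fit(x), x, initial=0)
print(float(np.mean((y_recovered - y_true) ** 2)))  # mean squared reconstruction error
```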

Noisy Data Aggregation with Independent Sensors: Insights and Open Problems

  • Murayama, Tatsuto; Davis, Peter
    • Journal of Multimedia Information System, v.3 no.2, pp.21-26, 2016
  • Our networked world has been growing exponentially fast. The explosion in the volume of machine-to-machine (M2M) transactions threatens to exceed the transport capacity of the networks that link them. Therefore, it is essential to reconsider the tradeoff between using many data sets versus using good data sets. We focus on this tradeoff in the context of the quality of information aggregated from many sensors in a noisy environment. We start with a basic theoretical model considered in the famous "CEO problem" in the field of information theory. From the point of view of large deviations, we find a simple statement of the optimal strategies under a limited network capacity condition. Moreover, we propose an open problem for a sensor network scenario and report a numerical result.
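
An illustrative simulation of the "many noisy sensors versus a few good ones" tradeoff discussed above: each sensor observes a binary source through a symmetric error channel and the aggregator takes a majority vote, with sensor count and per-sensor error rate traded off against each other. The budget split and error rates are assumptions for illustration only, not the CEO-problem analysis in the paper.

```python
# Majority-vote aggregation error for different sensor-count / sensor-quality tradeoffs.
import numpy as np

rng = np.random.default_rng(0)

def majority_vote_error(n_sensors, flip_prob, trials=20000):
    """Empirical error rate of majority vote over n noisy binary sensors."""
    source = rng.integers(0, 2, size=trials)
    flips = rng.random((trials, n_sensors)) < flip_prob
    observations = source[:, None] ^ flips            # each sensor sees a flipped copy
    estimate = (observations.sum(axis=1) * 2 > n_sensors).astype(int)
    return float(np.mean(estimate != source))

# Same assumed budget: 3 good sensors vs. 15 cheap (noisier) ones.
print("3 sensors,  5% error each:", majority_vote_error(3, 0.05))
print("15 sensors, 20% error each:", majority_vote_error(15, 0.20))
```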

Robust Multidimensional Scaling for Multi-robot Localization

  • Je, Hong-Mo; Kim, Dai-Jin
    • The Journal of Korea Robotics Society, v.3 no.2, pp.117-122, 2008
  • This paper presents a multi-robot localization method based on multidimensional scaling (MDS) that works in spite of incomplete and noisy data. While traditional MDS algorithms work on a full-rank distance matrix, in the real world many entries may be missing due to occlusions. Moreover, traditional MDS gives no consideration to the uncertainty due to noisy observations. We propose a robust MDS that handles both incomplete and noisy data, and we apply it to the multi-robot localization problem. To deal with incomplete data, we use the Nyström approximation, which approximates the full distance matrix. To deal with uncertainty, we formulate a Bayesian framework for MDS that finds the posterior of the object coordinates by means of statistical inference. We not only verify the performance of MDS-based multi-robot localization by computer simulations, but also implement real-world localization with a multi-robot team. Using extensive empirical results, we show that the accuracy of the proposed method is comparable to that of Monte Carlo Localization (MCL).
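
For context, a minimal classical-MDS sketch: 2-D coordinates are recovered from a full pairwise distance matrix by double centering and eigendecomposition. The paper's actual contributions, completing missing entries with the Nyström approximation and handling noise with Bayesian inference, are not reproduced here; the robot positions are illustrative assumptions.

```python
# Classical MDS: recover coordinates (up to rotation/translation) from distances.
import numpy as np

positions = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.5], [0.5, 2.0]])  # 4 robots
D = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
B = -0.5 * J @ (D ** 2) @ J                    # double-centered squared distances
eigvals, eigvecs = np.linalg.eigh(B)
idx = np.argsort(eigvals)[::-1][:2]            # two largest eigenpairs -> 2-D layout
coords = eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

print(coords)                                  # matches the originals up to rigid motion
```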


A Nonparametric Approach for Noisy Point Data Preprocessing

  • Xi, Yongjian; Duan, Ye; Zhao, Hongkai
    • International Journal of CAD/CAM, v.9 no.1, pp.31-36, 2010
  • 3D point data acquired from laser scans or stereo vision can be quite noisy. A preprocessing step is often needed before a surface reconstruction algorithm can be applied. In this paper, we propose a nonparametric approach for noisy point data preprocessing. In particular, we propose an anisotropic kernel-based nonparametric density estimation method for outlier removal, and a hill-climbing line search approach for projecting data points onto the real surface boundary. Our approach is simple, robust, and efficient. We demonstrate our method on both real and synthetic point datasets.
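
A minimal sketch of density-based outlier removal for a noisy point set, in the spirit of the preprocessing step above. scikit-learn's KernelDensity uses an isotropic Gaussian kernel, whereas the paper proposes an anisotropic kernel; the synthetic point cloud, bandwidth, and 10% rejection threshold are illustrative assumptions.

```python
# Remove low-density points (likely outliers) from a noisy 2-D point set.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 500)
surface = np.c_[np.cos(theta), np.sin(theta)] + 0.02 * rng.standard_normal((500, 2))
outliers = rng.uniform(-2, 2, size=(30, 2))
points = np.vstack([surface, outliers])

kde = KernelDensity(kernel="gaussian", bandwidth=0.15).fit(points)
log_density = kde.score_samples(points)
keep = log_density > np.percentile(log_density, 10)   # drop the lowest-density 10%
cleaned = points[keep]
print(points.shape, "->", cleaned.shape)
```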

Detection and Correction of Noisy Pixels Embedded in NDVI Time Series Based on the Spatio-temporal Continuity

  • Park, Ju-Hee; Cho, A-Ra; Kang, Jeon-Ho; Suh, Myoung-Seok
    • Atmosphere, v.21 no.4, pp.337-347, 2011
  • In this paper, we developed a method for detecting and correcting noisy pixels embedded in time series of normalized difference vegetation index (NDVI) data based on the spatio-temporal continuity of vegetation conditions. To apply the method, a 25-year (1982-2006) GIMMS (Global Inventory Modeling and Mapping Study) NDVI dataset over the Korean peninsula was used. The spatial resolution and temporal frequency of this dataset are 8 × 8 km² and 15 days, respectively. A land cover map over East Asia is also used. Noisy pixels are detected by a temporal continuity check against reference values and dynamic threshold values that depend on season and location. In general, the number of noisy pixels is especially large during summer compared with other seasons. The detected noisy pixels are then corrected iteratively until all of them are corrected. First, a noisy pixel is replaced by the weighted mean of the two temporally adjacent NDVI values when both are normal. After that, the remaining noisy pixels are corrected by the distance-weighted average of NDVI values from pixels of the same land cover. After correction, the NDVI values increase by about 5% and their variances decrease by about 50%. Compared with another correction method, the proposed method shows better results, especially when noisy pixels occur more than twice in a row and the temporal change rate of NDVI is very high. This means that the correction method developed in this study is superior for reconstructing the maximum NDVI and the NDVI at the start and end of the growing season.
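
A minimal sketch of the temporal-continuity detection and correction described above for a single pixel's NDVI series: a time step is flagged as noisy when it falls well below both temporal neighbours, and a flagged value is replaced by the mean of the two adjacent normal values. The drop threshold and series are illustrative assumptions; the paper additionally uses season- and location-dependent thresholds and a spatial same-land-cover fallback that are not reproduced here.

```python
# Flag NDVI drops that break temporal continuity, then fill from adjacent normal values.
import numpy as np

ndvi = np.array([0.52, 0.55, 0.21, 0.58, 0.60, 0.25, 0.62, 0.59])  # one pixel's series
threshold = 0.2                                    # assumed drop threshold

noisy = np.zeros_like(ndvi, dtype=bool)
for t in range(1, len(ndvi) - 1):
    # Flag values that fall well below both temporal neighbours.
    if ndvi[t] < ndvi[t - 1] - threshold and ndvi[t] < ndvi[t + 1] - threshold:
        noisy[t] = True

corrected = ndvi.copy()
for t in np.where(noisy)[0]:
    if not noisy[t - 1] and not noisy[t + 1]:      # both neighbours are normal
        corrected[t] = 0.5 * (corrected[t - 1] + corrected[t + 1])

print(corrected)
```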