• Title/Abstract/Keywords: Incomplete Dataset

19 search results (processing time: 0.022 s)

Fuzzy Classification Method for Processing Incomplete Dataset

  • Woo, Young-Woon;Lee, Kwang-Eui;Han, Soo-Whan
    • Journal of information and communication convergence engineering / Vol. 8, No. 4 / pp. 383-386 / 2010
  • Pattern classification is one of the most important topics in machine learning research. However, incomplete data appear frequently in real-world problems and lower the learning rate of classification models. Many studies have addressed the handling of such incomplete data, but most focus on the training stage. In this paper, we propose two classification methods for incomplete data using triangular-shaped fuzzy membership functions. In the proposed methods, missing data in incomplete feature vectors are inferred, learned, and applied to the proposed classifier using triangular-shaped fuzzy membership functions. In the experiments, we verified that the proposed methods achieve a higher classification rate than a conventional method.
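
As a rough illustration of the triangular-shaped fuzzy membership functions named in the abstract above, the sketch below evaluates a standard triangular membership function and scores a feature vector against per-class triangles while skipping missing features. The function names, the example triangles, and the rule of averaging membership grades over observed features are assumptions made for illustration; the abstract does not say how the paper infers or learns missing values.

```python
from typing import Dict, Optional, Sequence, Tuple

Triangle = Tuple[float, float, float]  # (left foot a, peak b, right foot c)


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership grade of x in a triangle with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def classify(sample: Sequence[Optional[float]],
             class_triangles: Dict[str, Sequence[Triangle]]):
    """Return the best label and all scores; None marks a missing feature.

    Assumption: a class score is the mean membership over observed features.
    """
    scores = {}
    for label, triangles in class_triangles.items():
        grades = [triangular(x, *tri)
                  for x, tri in zip(sample, triangles) if x is not None]
        scores[label] = sum(grades) / len(grades) if grades else 0.0
    return max(scores, key=scores.get), scores


# Hypothetical two-class, two-feature example with the second feature missing.
triangles = {"A": [(0.0, 1.0, 2.0), (0.0, 1.0, 2.0)],
             "B": [(1.0, 2.0, 3.0), (1.0, 2.0, 3.0)]}
label, scores = classify([1.2, None], triangles)
```

Here the single observed feature lies closer to the peak of class A's triangle, so `label` is `"A"`.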

Compressive sensing-based two-dimensional scattering-center extraction for incomplete RCS data

  • Bae, Ji-Hoon;Kim, Kyung-Tae
    • ETRI Journal / Vol. 42, No. 6 / pp. 815-826 / 2020
  • We propose a two-dimensional (2D) scattering-center-extraction (SCE) method using sparse recovery based on compressive-sensing theory, which works even when data are missing from the received radar cross-section (RCS) dataset. First, using the proposed method, we generate a 2D grid via adaptive discretization that is considerably smaller than a fully sampled fine grid. Subsequently, the coarse estimation of 2D scattering centers is performed using both the iteratively reweighted least squares method and a general peak-finding algorithm. Finally, the fine estimation of 2D scattering centers is performed using the orthogonal matching pursuit (OMP) procedure with an adaptively sampled Fourier dictionary. Measured RCS data, as well as simulation data using the point-scatterer model, are used to evaluate the 2D SCE accuracy of the proposed method. The results indicate that, for an incomplete RCS dataset with missing data, the proposed method achieves higher SCE accuracy than the conventional OMP, basis pursuit, smoothed L0, and existing discrete spectral estimation techniques.
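
The orthogonal matching pursuit (OMP) step mentioned in the abstract above can be sketched generically as follows. This is a minimal real-valued version that tracks the selected atoms and the residual through Gram-Schmidt orthogonalization. It does not use the paper's adaptively sampled Fourier dictionary, and the function name `omp_support` and the toy dictionary are assumptions for illustration.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def omp_support(atoms: Sequence[Sequence[float]], y: Sequence[float],
                sparsity: int, tol: float = 1e-10) -> Tuple[List[int], List[float]]:
    """Greedy OMP: return indices of the selected atoms and the final residual."""
    residual = list(y)
    basis: List[List[float]] = []  # orthonormal basis of the selected atoms' span
    support: List[int] = []
    for _ in range(sparsity):
        # Score each unused atom by its normalized correlation with the residual.
        scores = [0.0 if k in support else abs(dot(a, residual)) / sqrt(dot(a, a))
                  for k, a in enumerate(atoms)]
        k = max(range(len(atoms)), key=scores.__getitem__)
        if scores[k] <= tol:
            break
        # Orthogonalize the chosen atom against the atoms already selected.
        q = list(atoms[k])
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = sqrt(dot(q, q))
        if norm <= tol:
            break
        q = [qi / norm for qi in q]
        basis.append(q)
        support.append(k)
        # Remove the new direction so the residual stays orthogonal to the span.
        c = dot(residual, q)
        residual = [r - c * qi for r, qi in zip(residual, q)]
    return support, residual


# Toy dictionary (assumption): three unit vectors; y uses atoms 0 and 2 only.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
support, residual = omp_support(atoms, [3.0, 0.0, 1.0], sparsity=2)
```

Keeping an orthonormal basis avoids solving a least-squares problem at every iteration. If the coefficients are needed, they can be recovered afterwards by a least-squares fit restricted to the selected atoms.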

Metropolis-Hastings Expectation Maximization Algorithm for Incomplete Data

  • 전수영;이희찬
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 25, No. 1 / pp. 183-196 / 2012
  • Inference problems under incomplete data, such as missing data, truncated distributions, and censored data, arise frequently in statistics. Such problems can be addressed with the Expectation Maximization, Monte Carlo Expectation Maximization, and Stochastic Expectation Maximization algorithms, but these methods have the drawback of requiring the assumption of a standard distributional form. In this study, we propose the Metropolis-Hastings Expectation Maximization (MHEM) algorithm, which can be used when no standard distributional form is assumed. The efficiency of the MHEM algorithm was demonstrated through a simulation study using censored data and an empirical analysis of KOSPI 200 returns.

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-ju;Kwak, Min-jung;Han, In-goo
    • Korea Academia-Industrial Cooperation Society: Conference Proceedings (한국산학기술학회:학술대회논문집) / Korea Academia-Industrial Cooperation Society 2003 Proceedings / pp. 105-110 / 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences for products. There can be many missing values in such a preference data matrix. This incomplete data is one of the factors that degrade the accuracy of a recommendation system. The multiple imputation method imputes m values for each missing value. It overcomes the flaws of single imputation approaches by considering the uncertainty of missing values. The objective of this paper is to suggest a multiple imputation-based collaborative filtering approach for recommendation systems to improve prediction accuracy. The experiments show that the proposed approach performs better than the traditional collaborative filtering approach, especially when the dataset used for the recommendation system contains many missing values.

Handling Incomplete Data Problem in Collaborative Filtering System

  • Noh, Hyun-Ju;Kwak, Min-Jung;Han, In-Goo
    • Journal of Intelligence and Information Systems (지능정보연구) / Vol. 9, No. 2 / pp. 51-63 / 2003
  • Collaborative filtering is one of the most widely used methodologies for recommendation systems. It is based on a data matrix of each customer's preferences for products. There can be many missing values in such a preference data matrix. This incomplete data is one of the factors that degrade the accuracy of a recommendation system. Several treatments exist for the incomplete data problem, such as case deletion and single imputation. These approaches are simple and easy to implement, but they may produce biased results. The multiple imputation method imputes m values for each missing value. It overcomes the flaws of single imputation approaches by considering the uncertainty of missing values. The objective of this paper is to suggest a multiple imputation-based collaborative filtering approach for recommendation systems to improve prediction accuracy. The experiments show that the proposed approach performs better than the traditional collaborative filtering approach, especially when the dataset used for the recommendation system contains many missing values.
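
The idea of imputing m values per missing entry, described in the abstract above, can be sketched as follows. Each missing rating is filled m times and the per-copy estimates are pooled. The imputation step here is a simple random draw from the observed ratings (hot-deck style), which is an assumption for illustration and not the imputation model used in the paper. The function name and the rating list are also hypothetical.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple


def multiple_imputation_mean(ratings: List[Optional[float]], m: int = 5,
                             seed: int = 0) -> Tuple[float, float]:
    """Pool the mean rating over m imputed copies of an incomplete rating list.

    Returns the pooled estimate and the between-imputation variance, which
    reflects the uncertainty introduced by the missing values.
    """
    rng = random.Random(seed)
    observed = [r for r in ratings if r is not None]
    if not observed:
        raise ValueError("at least one observed rating is required")
    estimates = []
    for _ in range(m):
        # Fill every missing entry with a random draw from the observed ratings.
        completed = [r if r is not None else rng.choice(observed) for r in ratings]
        estimates.append(mean(completed))
    pooled = mean(estimates)
    between_var = (sum((e - pooled) ** 2 for e in estimates) / (m - 1)
                   if m > 1 else 0.0)
    return pooled, between_var


# Hypothetical ratings of one product by five customers; None marks a missing rating.
pooled, between_var = multiple_imputation_mean([5, None, 3, 4, None], m=5)
```

A single imputation would report only one filled-in copy. Keeping m copies makes the spread between them visible, and that spread is the uncertainty which single imputation ignores.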

Recovering Incomplete Data using Tucker Model for Tensor with Low-n-rank

  • Thieu, Thao Nguyen;Yang, Hyung-Jeong;Vu, Tien Duong;Kim, Sun-Hee
    • International Journal of Contents / Vol. 12, No. 3 / pp. 22-28 / 2016
  • Tensors with missing or incomplete values are a ubiquitous problem in various fields such as biomedical signal processing, image processing, and social network analysis. In this paper, we consider how to reconstruct a dataset with missing values in tensor form, a process called tensor completion. We applied Tucker factorization to solve the tensor completion problem, which was formulated as an optimization problem. We formulated the objective function using the components of the Tucker model after decomposition. The weighted least-squares metric contained only the known values of the tensor, which has low rank in its modes. A first-order optimization method, namely the nonlinear conjugate gradient method, was applied to solve the optimization problem. We demonstrated the effectiveness of the proposed method on EEG signals with about 70% missing entries, in comparison with other algorithms. The relative error was used to measure the difference between the original tensor and the process output.
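
The abstract above does not give formulas for the weighted least-squares objective or the relative error. The sketch below uses common definitions as assumptions: a 0/1 mask that keeps only the known entries, and a relative error equal to the Frobenius norm of the difference divided by the Frobenius norm of the original. The tensors are plain nested lists, and all function names are hypothetical.

```python
from math import sqrt
from typing import Iterator


def flatten(t) -> Iterator[float]:
    """Yield the entries of a nested-list tensor of any order."""
    if isinstance(t, (list, tuple)):
        for item in t:
            yield from flatten(item)
    else:
        yield t


def masked_squared_error(original, reconstructed, mask) -> float:
    """Squared error over known entries only (mask is 1 if known, 0 if missing)."""
    return sum(w * (a - b) ** 2
               for a, b, w in zip(flatten(original), flatten(reconstructed),
                                  flatten(mask)))


def relative_error(original, reconstructed) -> float:
    """Assumed definition: ||X - X_hat||_F / ||X||_F."""
    x = list(flatten(original))
    y = list(flatten(reconstructed))
    if len(x) != len(y):
        raise ValueError("tensors must have the same number of entries")
    num = sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    den = sqrt(sum(a * a for a in x))
    return num / den


# Tiny 1 x 1 x 2 example: the second entry is treated as missing during fitting.
X = [[[3.0, 4.0]]]
X_hat = [[[3.0, 0.0]]]
M = [[[1, 0]]]
fit_loss = masked_squared_error(X, X_hat, M)  # 0.0: the known entry is matched
err = relative_error(X, X_hat)                # 0.8: the missing entry is not recovered
```

The example shows why both quantities are needed: the fitting loss sees only known entries, while the relative error evaluates the whole reconstruction, including the entries that were withheld.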

Incomplete Cholesky Decomposition based Kernel Cross Modal Factor Analysis for Audiovisual Continuous Dimensional Emotion Recognition

  • Li, Xia;Lu, Guanming;Yan, Jingjie;Li, Haibo;Zhang, Zhengyan;Sun, Ning;Xie, Shipeng
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 810-831 / 2019
  • Recently, continuous dimensional emotion recognition from audiovisual cues has attracted increasing attention in both theory and practice. The large amount of data involved in recognition decreases the efficiency of most bimodal information fusion algorithms. In this paper, a novel algorithm, namely incomplete Cholesky decomposition based kernel cross factor analysis (ICDKCFA), is presented and employed for continuous dimensional audiovisual emotion recognition. After the ICDKCFA feature transformation, two basic fusion strategies, namely feature-level fusion and decision-level fusion, are explored to combine the transformed visual and audio features for emotion recognition. Finally, extensive experiments are conducted to evaluate the ICDKCFA approach on the AVEC 2016 Multimodal Affect Recognition Sub-Challenge dataset. The experimental results show that the ICDKCFA method is faster than the original kernel cross factor analysis while achieving comparable performance. Moreover, the ICDKCFA method performs better than other common information fusion methods, such as canonical correlation analysis, kernel canonical correlation analysis, and cross-modal factor analysis based fusion methods.
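
The abstract above gives no algorithmic detail on the incomplete Cholesky step, so the sketch below shows the standard pivoted incomplete Cholesky decomposition of a symmetric positive semidefinite kernel (Gram) matrix K. It builds a low-rank factor G with K approximately equal to G^T G, and stops once the largest remaining diagonal residual falls below a tolerance. The function name, defaults, and example matrix are assumptions for illustration.

```python
from math import sqrt
from typing import List, Optional, Sequence


def incomplete_cholesky(K: Sequence[Sequence[float]], tol: float = 1e-10,
                        max_rank: Optional[int] = None) -> List[List[float]]:
    """Return rows g_1..g_r of G such that K[i][j] ~= sum_k G[k][i] * G[k][j]."""
    n = len(K)
    if max_rank is None:
        max_rank = n
    d = [K[i][i] for i in range(n)]  # diagonal of the current residual matrix
    G: List[List[float]] = []
    while len(G) < max_rank:
        # Pivot on the largest remaining diagonal residual.
        i = max(range(n), key=d.__getitem__)
        if d[i] <= tol:
            break
        pivot = sqrt(d[i])
        row = [(K[j][i] - sum(g[j] * g[i] for g in G)) / pivot for j in range(n)]
        G.append(row)
        for j in range(n):
            d[j] -= row[j] ** 2
    return G


# A rank-1 Gram matrix built from the vector (2, 1): one factor row suffices.
G = incomplete_cholesky([[4.0, 2.0], [2.0, 1.0]])
```

Because the loop stops early for matrices that are numerically low rank, later computations can work with an r x n factor instead of the full n x n kernel matrix, which is the usual source of the speed-up.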

EVALUATION OF AN ENHANCED WEATHER GENERATION TOOL FOR SAN ANTONIO CLIMATE STATION IN TEXAS

  • Lee, Ju-Young
    • Water Engineering Research / Vol. 5, No. 1 / pp. 47-54 / 2004
  • Several computer programs have been developed to stochastically generate weather data from observed daily data, but they require a complete dataset to run WGEN. Meteorological data, however, frequently contain sporadic missing values as well as entirely missing data. The modified WGEN includes a data-filling algorithm for incomplete meteorological datasets; no other WGEN model has a data-filling function. The data-filling algorithm in the modified WGEN is based on the Matalas equation for a first-order autoregressive process on a multidimensional state with known cross-correlations and autocorrelations among the state variables. The parameters of the Matalas equation are derived from the existing dataset, and the derived parameters are used to fill in data. WGEN (Richardson and Wright, 1984) is one of the most widely used weather generators, but it needs to be modified and extended. It uses an exponential distribution to generate precipitation amounts. An exponential distribution is a simple way to describe the distribution of precipitation amounts, but it does not represent precipitation data well. In this paper, precipitation data generated by WGEN and by the modified WGEN are compared with the corresponding measured data in terms of statistical parameters. The modified WGEN adopts a formula from CLIGEN for WEPP (Water Erosion Prediction Project) of the USDA in 1985. Results for parameters other than precipitation are not presented in this paper; they will be presented in a follow-up verification and review study.
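
For reference, the first-order multivariate autoregressive model of Matalas mentioned in the abstract above is commonly written as shown below. The symbols are assumptions, since the abstract does not define them: \chi_t is the vector of standardized residuals of the state variables on day t, \varepsilon_t is a vector of independent standard normal deviates, and M_0 and M_1 are the lag-0 and lag-1 cross-correlation matrices estimated from the existing data.

```latex
\chi_t = A\,\chi_{t-1} + B\,\varepsilon_t,
\qquad
A = M_1 M_0^{-1},
\qquad
B B^{\mathsf{T}} = M_0 - M_1 M_0^{-1} M_1^{\mathsf{T}}
```

Under these definitions, a missing day could be filled by propagating the previous day's residual vector through A and adding the noise term, then rescaling by the means and standard deviations of each variable. Whether the modified WGEN fills gaps in exactly this way is not stated in the abstract.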

Statistical Analysis of Bivariate Current Status Data with Informative Censoring Using Frailty Effects

  • Kim, Yang-Jin
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 25, No. 1 / pp. 115-123 / 2012
  • In animal tumorigenicity data, tumor onsets occur at several sites and onset times cannot be observed exactly. Instead, the existence of tumors is examined only at the death time or sacrifice time of the animal. Such an incomplete data structure makes it difficult to investigate the effect of treatment on tumor onset times. In addition, when censoring due to death is related to tumor onset, this dependence should be taken into account. A bivariate frailty effect is incorporated to model bivariate tumor onsets and to connect death with tumor onset. For parameter inference, the EM algorithm is applied, and a real NTP (National Toxicology Program) dataset is analyzed as an illustrative example.

Improve object recognition using UWB SAR imaging with compressed sensing

  • Pham, The Hien;Hong, Ic-Pyo
    • Journal of IKEEE (전기전자학회논문지) / Vol. 25, No. 1 / pp. 76-82 / 2021
  • In this paper, the compressed sensing basis pursuit denoising algorithm applied to synthetic aperture radar imaging is investigated to improve object recognition. For incomplete data sets, the compressed sensing algorithm was used to recover the data before the conventional back-projection algorithm was applied to obtain the synthetic aperture radar images. This method can reduce the number of measurements required while scanning objects. An ultra-wideband radar scheme using a stripmap synthetic aperture radar algorithm was used to detect objects hidden behind the box. An ultra-wideband radar system with a 3.1-4.8 GHz band and a UWB antenna was implemented to transmit and receive signal data for two conductive cylinders located inside a paper box. The results confirmed that the images can be reconstructed from a randomly selected 30% of the dataset without noticeable distortion compared with the images generated from the full data using the conventional back-projection algorithm.