Search | Korea Science

Cluster Analysis of Incomplete Microarray Data with Fuzzy Clustering

Kim, Dae-Won
- Journal of the Korean Institute of Intelligent Systems / v.17 no.3 / pp.397-402 / 2007
In this paper, we present a method for clustering incomplete microarray data using alternating optimization, in which no prior imputation step is required. To reduce the influence of imputation during preprocessing, we take an alternating optimization approach that finds better estimates during the iterative clustering process. This method improves the estimates of missing values by exploiting cluster information, such as cluster centroids, together with all available non-missing values in each iteration. The clustering results of the proposed method are significantly more relevant to the biological gene annotations than those of other methods, indicating its effectiveness and potential for clustering incomplete gene expression data.
https://doi.org/10.5391/JKIIS.2007.17.3.397
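The abstract above describes re-estimating missing values inside the clustering loop rather than imputing them beforehand. The following is a minimal sketch of that general idea, not the paper's algorithm: it assumes standard fuzzy c-means updates and fills each missing cell with the membership-weighted centroid value in every iteration. The function name `fuzzy_cmeans_incomplete` and all parameters are hypothetical.

```python
import numpy as np

def fuzzy_cmeans_incomplete(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means on data with NaNs (illustrative sketch, not the paper's method).

    Missing cells are re-estimated in every iteration from the current
    centroids and memberships instead of being imputed once up front.
    """
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess for missing cells: column mean of the observed values.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    # Random initial membership matrix; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Step 1: centroids are membership-weighted means of the completed data.
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 2: memberships from squared distances to each centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Step 3: refresh missing cells with the membership-weighted centroid value.
        W = U ** m
        estimates = (W @ centroids) / W.sum(axis=1, keepdims=True)
        X[missing] = estimates[missing]
    return U, centroids, X
```

Observed values are never modified; only the cells that were originally `NaN` change between iterations, so the clustering and the estimates are refined together.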

Pairwise fusion approach to cluster analysis with applications to movie data (영화 데이터를 위한 쌍별 규합 접근방식의 군집화 기법)

Kim, Hui Jin;Park, Seyoung
- The Korean Journal of Applied Statistics / v.35 no.2 / pp.265-283 / 2022
MovieLens data consists of recorded movie ratings and has often been used to measure evaluation scores in recommender system research. In this paper, we provide additional information obtained by clustering user-specific genre preferences from movie rating data and movie genre data. Because the number of movie ratings per user is very low compared to the total number of movies, the missing rate in this data is very high, which limits the applicability of existing clustering methods. We propose a convex clustering-based method using a pairwise fused penalty, motivated by the analysis of MovieLens data. In particular, the proposed clustering method performs missing-value imputation and, at the same time, uses the movie ratings and genre weights for each movie to cluster the genre preferences of each individual. We solve the proposed optimization problem using the alternating direction method of multipliers (ADMM) algorithm. Simulations and an application to MovieLens data show that the proposed clustering method is less sensitive to noise and outliers than the existing method.
https://doi.org/10.5351/KJAS.2022.35.2.265

Estimation of Survival Function and Median Survival Time in Interval-Censored Data (구간중도절단자료에서 생존함수와 중간생존시간에 대한 추정)

Yun, Eun-Young;Kim, Choong-Rak
- The Korean Journal of Applied Statistics / v.23 no.3 / pp.521-531 / 2010
Interval-censored observations are common in medical and epidemiologic studies; however, few studies exist because of the complexity and special structure of interval censoring. This paper introduces the imputation method and the self-consistency method for interval-censored data. We propose a new method of generating random numbers under an interval-censoring set-up. Through simulation studies, we compare the two methods under various simulation schemes in terms of the mean squared error for estimating the median survival time and the mean integrated squared error for estimating the survival function. Under a moderate censoring percentage, the mean imputation method performed better than the self-consistency method in estimating the median survival time and the survival function.
https://doi.org/10.5351/KJAS.2010.23.3.521

Missing values imputation for time course gene expression data using the pattern consistency index adaptive nearest neighbors (시간경로 유전자 발현자료에서 패턴일치지수와 적응 최근접 이웃을 활용한 결측값 대치법)

Shin, Heyseo;Kim, Dongjae
- The Korean Journal of Applied Statistics / v.33 no.3 / pp.269-280 / 2020
Time course gene expression data are large-scale data observed over time in microarray experiments. Such data make it possible to identify gene expression levels simultaneously. However, the experimental process is complex, and missing values occur frequently for various reasons. In this paper, we propose pattern consistency index adaptive nearest neighbors (PANN) as a method for missing value imputation. This method combines the adaptive nearest neighbors (ANN) method, which reflects local characteristics, with the pattern consistency index, which considers the degree of consistency in gene expression between observations over time points. We conducted a Monte Carlo simulation study to evaluate the usefulness of the proposed PANN method on two yeast time course data sets.
https://doi.org/10.5351/KJAS.2020.33.3.269
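For orientation, nearest-neighbor imputation in its plainest form can be sketched as below. This is ordinary fixed-k KNN imputation with a root-mean-square distance over shared observed columns; it does not implement the adaptive neighbor selection or the pattern consistency index of the paper, and the function name `knn_impute` is hypothetical.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs in each row with the mean of its k nearest donor rows.

    Plain fixed-k KNN imputation (illustrative only). A donor row must
    have observed values in every column the target row is missing.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i in range(X.shape[0]):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        candidates = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            # Columns observed in both the target row and the candidate row.
            shared = ~miss & ~np.isnan(X[j])
            if not shared.any() or np.isnan(X[j][miss]).any():
                continue
            dist = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            candidates.append((dist, j))
        candidates.sort()
        neighbors = [j for _, j in candidates[:k]]
        if neighbors:
            # Average the donors' values in the target's missing columns.
            filled[i, miss] = X[neighbors][:, miss].mean(axis=0)
    return filled
```

Rows with no usable donor are left with their `NaN` values, so the caller can detect cells that could not be imputed.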

A Concordance Study of the Preprocessing Orders in Microarray Data (마이크로어레이 자료의 사전 처리 순서에 따른 검색의 일치도 분석)

Kim, Sang-Cheol;Lee, Jae-Hwi;Kim, Byung-Soo
- The Korean Journal of Applied Statistics / v.22 no.3 / pp.585-594 / 2009
Researchers in microarray experiments convert processed images of raw data into data suitable for statistical analysis; this step is called preprocessing. Microarray preprocessing consists of image filtering, imputation, and normalization. Several methods of normalization and imputation have been studied, but the order in which these procedures should be applied, that is, whether normalization or imputation should come first, has not been studied further. This study examines how the order of the preprocessing steps affects the identification of differentially expressed genes (DEG), using two-dye cDNA microarray data from colon cancer and gastric cancer. That is, we compare which combinations of imputation and normalization steps can detect the DEG. We used two imputation methods (K-nearest neighbor and Bayesian principal component analysis) and three normalization methods (global, within-print tip group, and variance stabilization). Combined with the two possible orders of imputation and normalization, this yields 12 preprocessing procedures. We computed a concordance measure of the DEG across the data sets to which the 12 different preprocessing orders were applied. When variance stabilization was used as the normalization method, there was a small amount of variation in the sensitivity of DEG detection.
https://doi.org/10.5351/KJAS.2009.22.3.585
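The count of 12 preprocessing procedures in the abstract above is consistent with crossing two imputation methods, three normalization methods, and two step orders. The short sketch below enumerates that grid; the decomposition into 2 x 3 x 2 is an assumption inferred from the abstract, and the labels are illustrative rather than taken from the paper.

```python
from itertools import product

# Labels are illustrative; the 2 x 3 x 2 decomposition is an assumption.
imputations = ["KNN", "BPCA"]
normalizations = ["global", "within-print-tip group", "variance stabilization"]
orders = ["impute then normalize", "normalize then impute"]

# One entry per (imputation, normalization, order) combination.
pipelines = list(product(imputations, normalizations, orders))

for imputation, normalization, order in pipelines:
    print(f"{order}: {imputation} + {normalization}")
```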

A modified estimating equation for a binary time varying covariate with an interval censored changing time

Kim, Yang-Jin
- Communications for Statistical Applications and Methods / v.23 no.4 / pp.335-341 / 2016
Interval censored failure time data often occur in observational studies where subjects are followed periodically. Instead of an exact failure time, two inspection times that bracket it are available. Several methods have been suggested to analyze interval censored failure time data (Sun, 2006). In this article, we are concerned with a binary time-varying covariate whose changing time is interval censored. A modified estimating equation is proposed by extending the approach suggested for the case of a missing covariate. Based on simulation results, the proposed method performs better than other simple imputation methods. The ACTG 181 dataset was analyzed as a real example.
https://doi.org/10.5351/CSAM.2016.23.4.335

A Generation and Accuracy Evaluation of Common Metadata Prediction Model Using Public Bicycle Data and Imputation Method

Kim, Jong-Chan;Jung, Se-Hoon
- Journal of Korea Multimedia Society / v.25 no.2 / pp.287-296 / 2022
Today, air pollution is becoming a severe issue worldwide, and various policies are being implemented to address environmental pollution. In major cities, public bicycles are installed and operated to reduce pollution and solve transportation problems, and operational information is collected in real time. However, research using public bicycle operation data has not been actively pursued. This study uses the daily weather data of the Korea Meteorological Agency and the real-time air pollution data of the Korea Environment Corporation to predict the number of daily bicycle rentals. Cross-validation, principal component analysis, and multiple regression analysis were used to determine the independent variables of the predictive model. The study then selected the variables that satisfy the significance level, constructed a model, predicted the number of daily bicycle rentals, and measured the accuracy.
https://doi.org/10.9717/kmms.2022.25.2.287

An EM Algorithm-Based Approach for Imputation of Pixel Values in Color Image (색조영상에서 랜덤결측화소값 대체를 위한 EM 알고리즘 기반 기법)

Kim, Seung-Gu
- The Korean Journal of Applied Statistics / v.23 no.2 / pp.305-315 / 2010
In this paper, a frequentist approach to imputing the values of the R, G, and B components of randomly missing pixels in a color image is provided. Under the assumption that the given image is a realization of a Gaussian Markov random field, the model is designed such that the neighboring pixel values of a given pixel (independently) follow a normal distribution with a covariance matrix scaled by an estimate of the similarity between the two pixel values, so that the imputation is not affected by neighbors with different colors. An approximate EM-based algorithm that maximizes the underlying likelihood is implemented to estimate the parameters and to impute the missing pixel values. Experiments are presented to show its effectiveness through a performance comparison with a popular interpolation method.
https://doi.org/10.5351/KJAS.2010.23.2.305

Sparse Web Data Analysis Using MCMC Missing Value Imputation and PCA Plot-based SOM (MCMC 결측치 대체와 주성분 산점도 기반의 SOM을 이용한 희소한 웹 데이터 분석)

Jun, Sung-Hae;Oh, Kyung-Whan
- The KIPS Transactions:PartD / v.10D no.2 / pp.277-282 / 2003
Knowledge discovery from the web has been studied extensively. There are difficulties in using web logs as training data for efficient predictive models. In this paper, we study a method to eliminate sparseness from web log data and to perform web user clustering. The sparseness of the web data is removed using missing value imputation by Bayesian inference with MCMC. Web user clustering is then performed using self-organizing maps based on a 3-D plot of principal components. Finally, using KDD Cup data, our experimental results show the problem-solving process and the performance evaluation.
https://doi.org/10.3745/KIPSTD.2003.10D.2.277

A Novel on Auto Imputation and Analysis Prediction Model of Data Missing Scope based on Machine Learning (머신러닝기반의 데이터 결측 구간의 자동 보정 및 분석 예측 모델에 대한 연구)

Jung, Se-Hoon;Lee, Han-Sung;Kim, Jun-Yeong;Sim, Chun-Bo
- Journal of Korea Multimedia Society / v.25 no.2 / pp.257-268 / 2022
When raw data contain missing values, ignoring them and proceeding with the analysis reduces accuracy because the sample size decreases. Compared with simply removing missing values, imputation and the analysis of patterns and significant values can compensate for the lower analysis quality and accuracy caused by bias. In this study, we propose to study irregular data patterns and methods for handling missing data using machine learning techniques, with the aim of correcting missing values. We propose replacing missing values with data from a similar past point in time, identified from the situation at the time the missing data occurred. Unlike previous studies, the data correction technique presents new algorithms using DNN and KNN-MLE techniques. In the performance evaluation, the ANAE measurement value improved by about 0.041 to 0.321 compared to the existing missing-section correction algorithm.
https://doi.org/10.9717/kmms.2022.25.2.257
