• Title/Summary/Keyword: a mixed data set

Detecting differentially expressed genes from a mixed data set

  • Lee, Sun-Ho;Kim, In-Young;Kim, Sang-Cheol;Rha, Sun-Young;Chung, Hyun-Chel;Kim, Byung-Soo
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.173-177 / 2003
  • When we have both a paired data set and two independent data sets, neither a paired t-test nor a two-sample t-test can be used to detect differences between the two samples. A new test statistic is proposed to identify differentially expressed genes in such a mixed data set.
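The abstract does not give the form of the proposed statistic. As a minimal sketch of the general idea only, assuming each gene's expression values are stored as `(normal, tumor)` pairs plus two independent lists, one can compute a paired t statistic on the complete pairs and a Welch two-sample t statistic on the unpaired data, then combine them with weights whose squares sum to one. The function names and the square-root-of-sample-size weights are hypothetical choices, not the authors' statistic.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def paired_t(pairs: List[Tuple[float, float]]) -> float:
    """Paired t statistic: mean of (tumor - normal) over its standard error."""
    diffs = [tumor - normal for normal, tumor in pairs]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(normal: List[float], tumor: List[float]) -> float:
    """Welch two-sample t statistic for independent normal and tumor samples."""
    se = math.sqrt(stdev(normal) ** 2 / len(normal) + stdev(tumor) ** 2 / len(tumor))
    return (mean(tumor) - mean(normal)) / se


def combined_statistic(pairs: List[Tuple[float, float]],
                       normal_only: List[float],
                       tumor_only: List[float]) -> float:
    """Hypothetical combination of the paired and unpaired statistics.

    The complete pairs and the unpaired samples come from different subjects, so the
    two statistics are independent. With weights satisfying w1**2 + w2**2 == 1, the
    combination is roughly standard normal under H0 when each part is (large samples).
    """
    n1 = len(pairs)
    n_ind = len(normal_only) + len(tumor_only)
    w1 = math.sqrt(n1 / (n1 + n_ind))     # hypothetical weight for the paired part
    w2 = math.sqrt(n_ind / (n1 + n_ind))  # hypothetical weight for the unpaired part
    return w1 * paired_t(pairs) + w2 * welch_t(normal_only, tumor_only)
```

A positive value points to higher expression in tumor tissue. Each part needs at least two observations (two pairs, two normal-only and two tumor-only values) for the standard deviations to exist.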

Statistical Method of Ranking Candidate Genes for the Biomarker

  • Kim, Byung-Soo;Kim, In-Young;Lee, Sun-Ho;Rha, Sun-Young
    • Communications for Statistical Applications and Methods / v.14 no.1 / pp.169-182 / 2007
  • The receiver operating characteristic (ROC) approach can be employed to rank candidate genes from a microarray experiment, in particular for biomarker development aimed at population screening for cancer. In a cancer microarray experiment based on n patients, the researcher often wants to compare the tumor tissue with the normal tissue within the same individual using a common reference RNA. Ideally, this experiment produces n pairs of microarray data. However, there are often missing values in either the normal or the tumor tissue data. In practice, we have $n_1$ pairs of complete observations, $n_2$ "normal only" and $n_3$ "tumor only" microarray data. We refer to this data set as a mixed data set. We develop an ROC approach for the mixed data set to rank candidate genes for biomarker development for colorectal cancer screening. It turns out that, for the top 50 genes in the ROC ranking, the correlation between the ROC-based and t-statistic-based ranks is less than 0.6. This result indicates that choosing the right approach for ranking candidate genes in biomarker development is important for the allocation of resources.
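The ranking step can be illustrated with the empirical AUC, the area under the empirical ROC curve, which equals the Mann-Whitney estimate of P(tumor value > normal value). The sketch below assumes two independent samples per gene, stored in dictionaries keyed by hypothetical gene names, and ranks genes by how much higher their expression is in tumor tissue; it does not reproduce the mixed-data extension described in the paper.

```python
from typing import Dict, List, Sequence


def empirical_auc(normal: Sequence[float], tumor: Sequence[float]) -> float:
    """Empirical AUC: share of (normal, tumor) pairs with tumor > normal; ties count 1/2."""
    score = 0.0
    for x in normal:
        for y in tumor:
            if y > x:
                score += 1.0
            elif y == x:
                score += 0.5
    return score / (len(normal) * len(tumor))


def rank_genes_by_auc(normal: Dict[str, List[float]],
                      tumor: Dict[str, List[float]]) -> List[str]:
    """Return gene names sorted from largest to smallest empirical AUC."""
    aucs = {gene: empirical_auc(normal[gene], tumor[gene]) for gene in normal}
    return sorted(aucs, key=lambda gene: aucs[gene], reverse=True)


normal = {"gene_a": [1.0, 2.0], "gene_b": [1.0, 2.0], "gene_c": [3.0, 4.0]}
tumor = {"gene_a": [3.0, 4.0], "gene_b": [2.0, 3.0], "gene_c": [1.0, 2.0]}
print(rank_genes_by_auc(normal, tumor))  # ['gene_a', 'gene_b', 'gene_c']
```

If down-regulated genes are also of interest, one option is to rank by `max(auc, 1 - auc)` instead; which direction the paper targets is not stated in the abstract.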

Ranking Candidate Genes for the Biomarker Development in a Cancer Diagnostics

  • Kim, In-Young;Lee, Sun-Ho;Rha, Sun-Young;Kim, Byung-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.272-278 / 2004
  • Recently, Pepe et al. (2003) employed the receiver operating characteristic (ROC) approach to rank candidate genes from a microarray experiment for biomarker development, with the ultimate purpose of population screening for cancer. In a cancer microarray experiment based on n patients, the researcher often wants to compare the tumor tissue with the normal tissue within the same individual using a common reference RNA. This design is referred to as a reference design or an indirect design. Ideally, this experiment produces n pairs of microarray data, where each pair consists of two sets of microarray data resulting from reference-versus-normal-tissue and reference-versus-tumor-tissue hybridizations. However, for certain individuals either the normal or the tumor tissue sample is too small for the experimenter to extract enough RNA for the microarray experiment, so there are missing values in either the normal or the tumor tissue data. In practice, we have $n_1$ pairs of complete observations, $n_2$ 'normal only' and $n_3$ 'tumor only' microarray data from an experiment with n patients, where $n = n_1 + n_2 + n_3$. We refer to this data set as a mixed data set, as it contains a mix of fully observed and partially observed pairs. Such a mixed data set was actually observed in a microarray experiment based on human tissues obtained during surgical operations on cancer patients. Pepe et al. (2003) provide the rationale for using an ROC approach based on two independent samples, instead of t or Mann-Whitney statistics, to rank candidate genes. We first adapt the ROC approach for ranking genes to a paired data set and then extend it to a mixed data set by taking a weighted average of two ROC values, one obtained from the paired data set and one from the two independent data sets.
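The final step, a weighted average of an ROC summary from the complete pairs and one from the unpaired data, might be sketched as follows. Several details are assumptions not stated in the abstract: the ROC summary is taken to be the empirical AUC; for the complete pairs it is computed by comparing every normal value with every tumor value; and the weight is the share of the n patients who contribute complete pairs, $n_1/n$. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def empirical_auc(normal: List[float], tumor: List[float]) -> float:
    """Mann-Whitney form of the empirical AUC; ties count one half."""
    wins = sum(1.0 if y > x else 0.5 if y == x else 0.0 for x in normal for y in tumor)
    return wins / (len(normal) * len(tumor))


def mixed_auc(pairs: List[Tuple[float, float]],
              normal_only: List[float],
              tumor_only: List[float]) -> Optional[float]:
    """Hypothetical mixed-data AUC: weighted average of paired and unpaired AUCs."""
    n1, n2, n3 = len(pairs), len(normal_only), len(tumor_only)
    auc_paired = (empirical_auc([n for n, _ in pairs], [t for _, t in pairs])
                  if n1 > 0 else None)
    auc_indep = empirical_auc(normal_only, tumor_only) if n2 > 0 and n3 > 0 else None
    if auc_indep is None:
        return auc_paired  # only the complete pairs are usable (None if there are none)
    if auc_paired is None:
        return auc_indep   # only the unpaired samples are usable
    w = n1 / (n1 + n2 + n3)  # assumed weight n1 / n
    return w * auc_paired + (1.0 - w) * auc_indep
```

Weighting by the number of patients is only one plausible choice; weights based on the variances of the two AUC estimates would be another.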

A General Mixed Linear Model with Left-Censored Data

  • Ha, Il-Do
    • Communications for Statistical Applications and Methods / v.15 no.6 / pp.969-976 / 2008
  • Mixed linear models have been widely used for various types of correlated data, including multivariate survival data. In this paper we extend the hierarchical-likelihood (h-likelihood) approach for mixed linear models with right-censored data to left-censored data. We also allow a general random-effect structure and propose an estimation procedure. The proposed method is illustrated using a numerical data set and is compared with the marginal likelihood method.
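For a random-intercept special case, the h-likelihood is the joint log-density of the responses given the random effects plus the log-density of the random effects; with left censoring, a censored response contributes the log of the normal CDF at its detection limit instead of the log density. The sketch below only evaluates this quantity for given parameter values, assuming normal errors and normal random intercepts. The data layout and function names are hypothetical, and the paper's general random-effect structure and estimation procedure are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

# One record: (cluster index i, covariate vector x, value y, left-censored flag).
# For a left-censored record, y is the detection limit and the true response is <= y.
Record = Tuple[int, Sequence[float], float, bool]


def log_norm_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_norm_cdf(x: float, mean: float, sd: float) -> float:
    # Phi(z) = erfc(-z / sqrt(2)) / 2; avoids the cancellation in 1 + erf(z / sqrt(2)).
    z = (x - mean) / sd
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def h_loglik(data: List[Record], beta: Sequence[float], v: Sequence[float],
             sigma_e: float, sigma_v: float) -> float:
    """h-likelihood of a random-intercept linear model with left-censored responses."""
    h = 0.0
    for i, x, y, censored in data:
        mu = sum(b * xk for b, xk in zip(beta, x)) + v[i]  # conditional mean given v_i
        h += log_norm_cdf(y, mu, sigma_e) if censored else log_norm_pdf(y, mu, sigma_e)
    h += sum(log_norm_pdf(vi, 0.0, sigma_v) for vi in v)  # log density of random effects
    return h
```

Maximizing this function over `beta` and `v` for fixed variance components is the usual first step in h-likelihood fitting; the adjusted profile likelihood needed for the variance components is omitted here.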

A Mixed Model for Ordered Response Categories

  • Choi, Jae-Sung
    • Journal of the Korean Data and Information Science Society / v.15 no.2 / pp.339-345 / 2004
  • This paper deals with a mixed logit model for ordered polytomous data. Two types of factors affect the response variable in this paper: a fixed factor with finite quantitative levels, and a random factor arising from an experimental structure such as a randomized complete block design. We discuss how to set up the model for analyzing ordered polytomous data and illustrate how to estimate the parameters of the given model.
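One common way to write such a model is a cumulative logit with a random block effect, $\text{logit}\,P(Y \le k \mid x, b_j) = \theta_k - (\beta x + b_j)$, where the cut-points $\theta_k$ are increasing. The sketch below turns this into category probabilities for a single observation. The parameterization and sign convention are assumptions, since the abstract does not state the exact form, and parameter estimation is not shown.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(thresholds: Sequence[float], beta: float, x: float,
                   block_effect: float) -> List[float]:
    """P(Y = k) for k = 1..K under a cumulative logit model with a random block effect.

    thresholds: increasing cut-points theta_1 < ... < theta_{K-1}.
    """
    eta = beta * x + block_effect  # fixed factor level plus random block effect
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    return [cumulative[0]] + [cumulative[k] - cumulative[k - 1]
                              for k in range(1, len(cumulative))]
```

Under this convention, a larger $\beta x + b_j$ shifts probability toward the higher categories, so a block with a positive random effect tends to produce higher ordered responses.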

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.719-732 / 2006
  • A set of clustering algorithms is presented that places a proper weight on the formulation of a distance extended to mixed numeric and multiple binary values. Simple matching and Jaccard coefficients are used to measure similarity between objects on multiple binary attributes, and these similarities are converted into dissimilarities between the i-th and j-th objects. The performance of the clustering algorithms with balancing weight is demonstrated for different similarity measures. Our experiments show that clustering algorithms with a proper weight give a competitive recovery level when a data set with mixed numeric and multiple binary attributes is clustered.
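The two similarity coefficients, their conversion to dissimilarities, and one possible way of balancing them against a numeric distance can be sketched as follows. The convex combination controlled by a single `weight` parameter, the use of Euclidean distance for the numeric attributes (assumed to be standardized already), and the convention for two all-zero binary vectors are illustrative assumptions, not the authors' formulation.

```python
import math
from typing import Callable, Sequence


def simple_matching(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of binary attributes on which the two objects agree (1-1 or 0-0)."""
    return sum(1 for u, v in zip(a, b) if u == v) / len(a)


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """1-1 matches divided by the attributes where at least one object has a 1."""
    both = sum(1 for u, v in zip(a, b) if u == 1 and v == 1)
    either = sum(1 for u, v in zip(a, b) if u == 1 or v == 1)
    return both / either if either else 1.0  # assumed convention for two all-zero vectors


def mixed_dissimilarity(num_i: Sequence[float], num_j: Sequence[float],
                        bin_i: Sequence[int], bin_j: Sequence[int], weight: float,
                        similarity: Callable[[Sequence[int], Sequence[int]], float] = jaccard
                        ) -> float:
    """Hypothetical balanced dissimilarity between the i-th and j-th objects."""
    d_numeric = math.dist(num_i, num_j)        # Euclidean distance on numeric attributes
    d_binary = 1.0 - similarity(bin_i, bin_j)  # similarity converted to dissimilarity
    return weight * d_numeric + (1.0 - weight) * d_binary
```

The difference between the two coefficients is whether shared absences (0-0) count as agreement: simple matching counts them, Jaccard ignores them, which matters when most binary attributes are rarely present.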

AN APPROACH TO THE TRAINING OF A SUPPORT VECTOR MACHINE (SVM) CLASSIFIER USING SMALL MIXED PIXELS

  • Yu, Byeong-Hyeok;Chi, Kwang-Hoon
    • Proceedings of the KSRS Conference / 2008.10a / pp.386-389 / 2008
  • It is important that the training stage of a supervised classification be designed to provide the necessary spectral information. The design of the training stage typically calls for a large sample of randomly selected pure pixels to characterize the classes. Such guidance is generally given without regard to the specific nature of the application at hand, including the classifier to be used. An approach to training a support vector machine (SVM) classifier that is the opposite of what is generally promoted for training set design is suggested. This approach uses a small sample of mixed spectral responses drawn from purposefully selected locations (geographical boundaries) for training. Such a sample should also be easier and cheaper to acquire than the one suggested by traditional approaches. In this research, we evaluated the approach against traditional approaches using high-resolution satellite data. The results showed that a small sample of mixed pixels can be used to derive a classification with accuracy similar to that obtained with a large number of pure pixels. The approach can also substantially reduce the cost of training data acquisition, because the sampling locations used are commonly easy to observe.

Cointegration Analysis with Mixed-Frequency Data of Quarterly GDP and Monthly Coincident Indicators

  • Seong, Byeongchan
    • The Korean Journal of Applied Statistics / v.25 no.6 / pp.925-932 / 2012
  • The article introduces a method to estimate a cointegrated vector autoregressive model from mixed-frequency data, using a state-space representation of the model's vector error correction model (VECM) form. The method directly estimates the parameters of the model in the state-space form of its VECM representation, using the available data in their mixed-frequency form. It then allows one to compute in-sample smoothed estimates and out-of-sample forecasts at the high-frequency interval using the estimated model. The method is applied to a mixed-frequency data set consisting of quarterly real gross domestic product and three monthly coincident indicators. The results show that the method produces more accurate smoothed and forecast estimates than a method based on single-frequency data.
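For reference, a cointegrated VAR of order p is conventionally written in VECM form as in the first equation below, where $y_t$ collects the (log) series, $\beta$ holds the cointegrating vectors, $\alpha$ the adjustment coefficients, and $\Gamma_i$ the short-run dynamics. The second equation is only an assumed illustration of how a quarterly series can enter a monthly state-space model: each quarterly observation is tied to three latent monthly values, and the remaining months are treated as missing. The article's exact aggregation scheme is not given in the abstract.

```latex
\[
\Delta y_t = \alpha \beta^{\top} y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \, \Delta y_{t-i} + \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \Sigma)
\]
\[
x^{Q}_{t} = \tfrac{1}{3}\left(x_t + x_{t-1} + x_{t-2}\right)
\quad \text{observed for } t = 3, 6, 9, \ldots \text{ (missing otherwise)}
\]
```

Treating non-quarter-end months as missing lets a Kalman filter and smoother run at the monthly frequency, which is what makes monthly smoothed estimates and forecasts of the quarterly series possible.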

Dam Sensor Outlier Detection using Mixed Prediction Model and Supervised Learning

  • Park, Chang-Mok
    • International Journal of Advanced Smart Convergence / v.7 no.1 / pp.24-32 / 2018
  • An outlier detection method using a mixed prediction model is described in this paper. The mixed prediction model consists of a time-series model and a regression model. The parameters of the prediction model are estimated by supervised learning, with a genetic algorithm adopted as the learning method. Experiments were performed on artificial and real data sets. The prediction performance is compared with that of existing prediction methods using artificial data. Outlier detection is conducted using real sensor measurements from a dam. The experiments demonstrated the validity of the proposed method.
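A stripped-down version of the idea: predict each sensor reading from an autoregressive (time-series) term plus a linear regression on other measurements, then flag readings whose residual exceeds a threshold. The AR order, the fixed coefficients, and the absolute residual `threshold` are hypothetical; in the paper the parameters are learned with a genetic algorithm, which is not reproduced here.

```python
from typing import List, Sequence


def mixed_prediction(series: Sequence[float], covariates: Sequence[Sequence[float]],
                     ar_coefs: Sequence[float], reg_coefs: Sequence[float],
                     intercept: float) -> List[float]:
    """One-step predictions for t = p, ..., len(series) - 1, where p = len(ar_coefs)."""
    p = len(ar_coefs)
    preds = []
    for t in range(p, len(series)):
        ar_part = sum(a * series[t - k - 1] for k, a in enumerate(ar_coefs))  # time-series part
        reg_part = sum(c * x for c, x in zip(reg_coefs, covariates[t]))     # regression part
        preds.append(intercept + ar_part + reg_part)
    return preds


def flag_outliers(series: Sequence[float], preds: Sequence[float], p: int,
                  threshold: float) -> List[int]:
    """Indices t whose absolute prediction residual exceeds the threshold."""
    return [t for t in range(p, len(series)) if abs(series[t] - preds[t - p]) > threshold]


series = [0.0, 2.0, 3.0, 9.5, 3.75]  # the reading at t = 3 is corrupted
covariates = [[0.0], [1.0], [1.0], [1.0], [1.0]]
preds = mixed_prediction(series, covariates, ar_coefs=[0.5], reg_coefs=[2.0], intercept=0.0)
print(flag_outliers(series, preds, p=1, threshold=4.0))  # [3]
```

Because the corrupted value also feeds the AR term, the next residual (here -3.0) is inflated as well; a lower threshold such as 2.0 would flag both t = 3 and t = 4, so the threshold has to be chosen with this spill-over in mind.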

Performance analysis of mixed-flow fans considering the low flow characteristics (저유량 특성을 고려한 사류 송풍기의 성능 해석)

  • Oh, Hyoung Woo;Kim, Kwang-Yong
    • Conference Proceedings of the Fluid Machinery Industry Society (유체기계공업학회 학술대회논문집) / 2000.12a / pp.110-115 / 2000
  • In the present study, a mean streamline analysis using empirical loss correlations has been developed for predicting the performance of industrial mixed-flow fan impellers. New simple but effective models are proposed for the additional Euler input work characteristic and for the internal recirculation loss caused by internal flow reversal under low-flow-rate conditions. Overall performance predictions are compared with six sets of test data for mixed-flow fans to demonstrate the accuracy of the proposed models. Performance curves predicted by the present set of loss models agree fairly well with experimental data for a variety of mixed-flow fan impellers over the entire range of operating conditions. The prediction method presented herein can be used efficiently in the conceptual design phase of mixed-flow fan impellers.
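The bookkeeping behind a mean-line analysis can be sketched with the Euler turbomachinery equation, $w_E = U_2 C_{\theta 2} - U_1 C_{\theta 1}$: internal losses reduce the useful work and hence the total pressure rise, while parasitic terms (for example, work absorbed by recirculating flow) are often treated as extra input work that raises power consumption without raising pressure. The loss values, the incompressible relation $\Delta p_t = \rho\,(w_E - \sum \text{losses})$, and the function names are illustrative assumptions; the paper's empirical loss correlations and its additional-work model are not reproduced.

```python
from typing import Sequence, Tuple


def euler_work(u1: float, c_theta1: float, u2: float, c_theta2: float) -> float:
    """Euler turbomachinery equation: specific work in J/kg."""
    return u2 * c_theta2 - u1 * c_theta1


def mean_line_point(u1: float, c_theta1: float, u2: float, c_theta2: float,
                    internal_losses: Sequence[float], parasitic_work: float,
                    rho: float = 1.2) -> Tuple[float, float]:
    """Return (total pressure rise in Pa, efficiency) at one operating point.

    internal_losses: specific losses in J/kg that reduce the useful work.
    parasitic_work: extra specific work in J/kg absorbed without raising pressure.
    rho: air density in kg/m^3 (incompressible approximation).
    """
    w_e = euler_work(u1, c_theta1, u2, c_theta2)
    useful = w_e - sum(internal_losses)
    dp_total = rho * useful
    efficiency = useful / (w_e + parasitic_work)
    return dp_total, efficiency
```

At low flow rates the parasitic term typically grows, so efficiency drops even when the Euler work itself stays high; this is the regime the abstract's additional-work and recirculation models target.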