• Title/Summary/Keyword: Multivariate Statistical Analysis

Search Results: 632

Improving data reliability on oligonucleotide microarray

  • Yoon, Yeo-In;Lee, Young-Hak;Park, Jin-Hyun
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.107-116 / 2004
  • The advent of microarray technologies makes it possible to monitor the expression of tens of thousands of genes simultaneously. Such microarray data can be degraded by experimental errors and image artifacts, which produce non-negligible outliers estimated to make up 15% of typical microarray data. It is therefore important to detect and correct these faulty probes before high-level data analysis such as classification or clustering. In this paper, we propose a systematic procedure, based on multivariate statistical approaches, for detecting faulty probes in GeneChip arrays and correcting them. Principal component analysis (PCA) is one of the most widely used multivariate statistical approaches. It is applied to construct a statistical correlation model from the 20 probe pairs of each gene. Faulty probes are then identified by inspecting the squared prediction error (SPE) of each probe with respect to the PCA model. The outlying probes are reconstructed by an iterative optimization that minimizes the SPE. We used public data from a GeneChip project on human fibroblast cells. In this application study, the proposed approach corrected probes well without removing the faulty ones. This is desirable from the viewpoint of making maximum use of the available data.
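The detection and reconstruction steps described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the authors' code, and it makes three assumptions:

- it fits a one-component PCA model by power iteration on mean-centred data, whereas the paper's model uses 20 probe pairs per gene and may retain more components;
- it reconstructs a single suspect variable by iterating it toward its model prediction;
- the names `fit_pca1`, `project`, `spe` and `reconstruct` are our own.

```python
import math

def fit_pca1(rows, iters=200):
    """Fit a one-component PCA model; return (mean vector, unit loading vector)."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - mean[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration converges to the eigenvector of the largest eigenvalue
    # (assumes the start vector is not orthogonal to it).
    p = [1.0] * m
    for _ in range(iters):
        q = [sum(cov[i][j] * p[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in q))
        p = [v / norm for v in q]
    return mean, p

def project(x, mean, p):
    """Model prediction x_hat = mean + (p . (x - mean)) p."""
    t = sum((xi - mi) * pi for xi, mi, pi in zip(x, mean, p))
    return [mi + t * pi for mi, pi in zip(mean, p)]

def spe(x, mean, p):
    """Squared prediction error: squared distance from x to the PCA subspace."""
    x_hat = project(x, mean, p)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def reconstruct(x, j, mean, p, iters=100):
    """Iterate x[j] <- x_hat[j]; the fixed point minimises SPE over x[j] alone."""
    x = list(x)
    for _ in range(iters):
        x[j] = project(x, mean, p)[j]
    return x
```

Only the suspected variable is changed, so the other measurements are kept intact. This mirrors the abstract's preference for correcting faulty probes rather than discarding them.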

Comparison of Parameter Estimation Methods in the Analysis of Multivariate Categorical Data with Logit Models

  • Song, Hae-Hiang
    • Journal of the Korean Statistical Society / v.12 no.1 / pp.24-35 / 1983
  • In fitting models to data, the central issues are selecting the most desirable estimation method and determining the adequacy of the fitted model. This paper compares maximum likelihood estimators and minimum logit chi-square estimators when logit models are fitted to infant mortality data; both estimators are best asymptotically normal. The chi-square goodness-of-fit test and the likelihood ratio test are also compared. The analysis of the infant mortality data shows that outlying observations do not necessarily have the same impact on the different goodness-of-fit measures.
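To make the comparison concrete, here is a minimal sketch for a grouped binary logit model, logit(p_i) = a + b*x_i. It is an illustration under our own assumptions, not the paper's infant mortality analysis. The function names are hypothetical.

- The maximum likelihood fit uses Newton-Raphson.
- The minimum logit chi-square fit uses Berkson's weighted least squares on the empirical logits, which requires 0 < y_i < n_i in every group.
- Pearson's X^2 and the likelihood ratio G^2 are both computed for a fitted model.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_ml(x, y, n, iters=50):
    """Maximum likelihood for logit(p) = a + b*x by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        p = [expit(a + b * xi) for xi in x]
        # Score vector of the binomial log-likelihood.
        g0 = sum(yi - ni * pi for yi, ni, pi in zip(y, n, p))
        g1 = sum(xi * (yi - ni * pi) for xi, yi, ni, pi in zip(x, y, n, p))
        # Information matrix (negative Hessian) entries.
        w = [ni * pi * (1 - pi) for ni, pi in zip(n, p)]
        h00, h01 = sum(w), sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def fit_min_logit_chisq(x, y, n):
    """Berkson's minimum logit chi-square: weighted least squares on empirical logits."""
    z = [math.log(yi / (ni - yi)) for yi, ni in zip(y, n)]
    w = [yi * (ni - yi) / ni for yi, ni in zip(y, n)]  # n * p_hat * (1 - p_hat)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    zbar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxz = sum(wi * (xi - xbar) * (zi - zbar) for wi, xi, zi in zip(w, x, z))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxz / sxx
    return zbar - b * xbar, b

def goodness_of_fit(x, y, n, a, b):
    """Pearson X^2 and likelihood ratio G^2 over success and failure cells."""
    x2 = g2 = 0.0
    for xi, yi, ni in zip(x, y, n):
        p = expit(a + b * xi)
        for observed, expected in ((yi, ni * p), (ni - yi, ni * (1 - p))):
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:
                g2 += 2 * observed * math.log(observed / expected)
    return x2, g2
```

If the empirical logits lie exactly on a straight line, both estimators recover that line. On real data the two estimators generally differ somewhat, even though both are best asymptotically normal.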

The Evaluation of Water Quality Using a Multivariate Analysis in Changnyeong-Haman weir section (다변량 통계분석을 이용한 낙동강 창녕함안보 구간의 수질 특성 평가)

  • Gwak, Bo-ra;Kim, Il-kyu
    • Journal of Korean Society of Water and Wastewater / v.29 no.6 / pp.625-632 / 2015
  • A multivariate analysis of the water environment system in the Changnyeong-Haman weir section was conducted. The purpose of this study is to better understand the water quality of the Changnyeong-Haman weir section and thereby provide useful information. The data consisted of water quality and algae data, including WT (water temperature), pH, DO, EC, COD, SS, T-N, $NH_3$-N, T-P, $PO_4$-P, Chl-a, TOC, d-silica, t-silica, Cyanobacteria, Diatoms, and Green algae. The statistical analyses used in this study were correlation analysis, principal component analysis, and factor analysis. Correlation analysis showed a correlation coefficient of 0.843 between COD and TOC. In addition, diatoms and d-silica were negatively correlated. Principal component analysis of the overall water quality data yielded four main factors with a cumulative contribution rate of 81.071%.
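Two statistics are reported above: a Pearson correlation coefficient and the cumulative contribution rate of the leading principal components. A minimal sketch of both follows. It assumes the contribution rate is each component's share of the sum of the eigenvalues of the correlation matrix, which is the usual convention for standardized water quality variables. The function names are hypothetical, and the eigenvalues in the test are illustrative, not the study's results.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def cumulative_contribution(eigenvalues, k):
    """Percentage of total variance explained by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return 100.0 * sum(ordered[:k]) / sum(ordered)
```

For a correlation matrix of p standardized variables, the eigenvalues sum to p. The contribution rate of k components is therefore their eigenvalue total divided by p.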

Classification of Groundwater Level Fluctuation Characteristics Using Statistical Analysis (통계분석을 이용한 지하수위 변동 특성 분류)

  • 문상기;우남칠
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2001.09a / pp.155-159 / 2001
  • This study applied multivariate statistical classification to groundwater hydrographs. It used the large dataset of the national groundwater monitoring network (78 alluvial sites). Six factors were selected to classify groundwater level changes. Factor analysis proved to be a useful tool for classifying large hydrogeological datasets.

Classification of Forest Cover Types in the Baekdudaegan, South Korea

  • Chung, Sang Hoon;Lee, Sang Tae
    • Journal of Forest and Environmental Science / v.37 no.4 / pp.269-279 / 2021
  • This study was carried out to describe the forest cover types of the Baekdudaegan, which is inhabited by a number of native tree species. To understand the vegetation distribution characteristics of the Baekdudaegan, a vegetation survey was conducted on its 20 major mountains. The vegetation data were collected from 3,959 sample points by the point-centered quarter method. The vegetation of each mountain was classified into 4-7 forests using various multivariate statistical methods such as cluster analysis, indicator species analysis, multiple discriminant analysis, and species composition analysis. The forests were classified mainly according to the relative abundance of Quercus mongolica. A total of 111 forests were classified. These forests were integrated into nine forest cover types using the percentage similarity index and clustering by vegetation type: 1) Mongolian oak, 2) Mongolian oak and other deciduous, 3) Oaks (Mixed Quercus spp.), 4) Korean red pine, 5) Korean red pine and oaks, 6) ash, 7) mixed mesophytic, 8) subalpine zone coniferous, and 9) miscellaneous forest. The subalpine zone coniferous type groups forests characterized by similar environmental conditions. The miscellaneous type groups forests that did not fit any other category.
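The percentage similarity index used above to merge forests into cover types compares the relative species abundances of two stands. A minimal sketch follows. It assumes the common form PS = sum over species of min(p_a, p_b), with abundances expressed as percentages of each stand's total. The study may instead have used importance values, and the function names and species counts in the test are illustrative only.

```python
def relative_abundance(counts):
    """Convert raw species counts to percentages of the stand total."""
    total = sum(counts.values())
    return {sp: 100.0 * c / total for sp, c in counts.items()}

def percentage_similarity(counts_a, counts_b):
    """PS = sum of min(percent in A, percent in B); 0 = no overlap, 100 = identical."""
    pa, pb = relative_abundance(counts_a), relative_abundance(counts_b)
    return sum(min(pa.get(sp, 0.0), pb.get(sp, 0.0)) for sp in set(pa) | set(pb))

stand_a = {"Quercus mongolica": 60, "Pinus densiflora": 40}
stand_b = {"Quercus mongolica": 30, "Fraxinus rhynchophylla": 70}
print(percentage_similarity(stand_a, stand_b))  # 30.0
```

The resulting pairwise PS matrix can then be converted into dissimilarities (100 - PS) for clustering.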

A Study on High Breakdown Discriminant Analysis : A Monte Carlo Simulation

  • Moon Sup;Young Joo;Youngjo
    • Communications for Statistical Applications and Methods / v.7 no.1 / pp.225-232 / 2000
  • The linear and quadratic discriminant functions based on normal theory are widely used to classify an observation into one of several predefined groups. However, these discriminant functions are sensitive to outliers. One high-breakdown procedure for estimating the location and scatter of multivariate data is the minimum volume ellipsoid (MVE) estimator. To obtain high-breakdown classifiers, outliers in multivariate data are detected using the robust Mahalanobis distance based on MVE estimators. The resulting weighted estimators are then inserted into the classification functions. A small-sample Monte Carlo study shows that the high-breakdown robust procedures perform better than the classical classifiers.
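The outlier-screening step can be sketched for two-dimensional data. This is a simplified illustration under stated assumptions, not the authors' procedure, and the function names are hypothetical. It works as follows:

1. Approximate the MVE with the standard resampling scheme. Draw random 3-point subsets (p + 1 with p = 2), inflate each subset's ellipse to cover h = (n + p + 1) // 2 points, and keep the smallest-area ellipse.
2. Rescale the covariance by the chi-square(2) median.
3. Give weight 0 to points whose squared robust distance exceeds the 97.5% chi-square(2) quantile, and re-estimate location and scatter from the remaining points.

```python
import math
import random

def mean_cov(pts):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis2(pt, centre, cov):
    """Squared Mahalanobis distance of pt from centre under a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = pt[0] - centre[0], pt[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def mve(pts, trials=500, seed=0):
    """Approximate the minimum volume ellipsoid by resampling 3-point subsets."""
    rng = random.Random(seed)
    h = (len(pts) + 3) // 2  # (n + p + 1) // 2 with p = 2
    best = None
    for _ in range(trials):
        centre, cov = mean_cov(rng.sample(pts, 3))
        det = cov[0] * cov[2] - cov[1] ** 2
        if det <= 1e-9 * cov[0] * cov[2]:
            continue  # (nearly) collinear subset: no proper ellipse
        # Inflate the subset's ellipse until it covers h points.
        m2 = sorted(mahalanobis2(p, centre, cov) for p in pts)[h - 1]
        area = m2 * math.sqrt(det)  # ellipse area, up to the constant pi
        if best is None or area < best[0]:
            best = (area, centre, cov, m2)
    if best is None:
        raise ValueError("every sampled subset was collinear")
    _, centre, cov, m2 = best
    chi2_median = 2 * math.log(2)  # median of the chi-square(2) distribution
    return centre, tuple(c * m2 / chi2_median for c in cov)

def reweighted_estimates(pts, cutoff_prob=0.975):
    """Weight 0 for points beyond the chi-square(2) cutoff, then re-estimate."""
    centre, cov = mve(pts)
    cutoff = -2 * math.log(1 - cutoff_prob)  # chi-square(2) quantile
    weights = [0 if mahalanobis2(p, centre, cov) > cutoff else 1 for p in pts]
    kept = [p for p, w in zip(pts, weights) if w]
    return weights, mean_cov(kept)
```

In a classifier, these reweighted group means and covariances would replace the classical ones in the linear or quadratic discriminant functions.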

Detecting cell cycle-regulated genes using Self-Organizing Maps with statistical Phase Synchronization (SOMPS) algorithm

  • Kim, Chang Sik;Tcha, Hong Joon;Bae, Cheol-Soo;Kim, Moon-Hwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.1 no.2 / pp.39-50 / 2008
  • Developing computational methods for identifying cell cycle-regulated genes has been one of the important topics in systems biology. Most previous methods rely on the periodic characteristics of expression signals to identify cell cycle-regulated genes. In contrast, we assume that cell cycle-regulated genes are relatively active, having relatively many interactions with each other in the underlying cellular network. This motivates us to apply the theory of multivariate phase synchronization to cell cycle expression analysis. In this study, we apply the method known as "Self-Organizing Maps with statistical Phase Synchronization (SOMPS)". SOMPS combines a self-organizing map with multivariate phase synchronization and produces several subsets of genes whose members are expected to interact with each other (Kim, 2008). Our evaluation experiments show that the SOMPS algorithm detects cell cycle-regulated genes as well as a recently reported method that outperforms most existing methods.
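SOMPS relies on a multivariate phase synchronization statistic (Kim, 2008). The sketch below is a much simpler, hedged illustration of the underlying idea, not the SOMPS statistic itself. It computes the bivariate phase locking value, |mean of exp(i(phi_a - phi_b))|, for two phase series. It assumes the instantaneous phases of the expression profiles have already been extracted, for example via a Hilbert transform. The function name is hypothetical.

```python
import cmath
import math

def phase_locking_value(phases_a, phases_b):
    """Bivariate phase locking value: 1 = constant phase difference, near 0 = no locking."""
    diffs = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(diffs) / len(diffs))

t = [2 * math.pi * k / 24 for k in range(24)]           # one cycle, 24 samples
locked = phase_locking_value(t, [x - 0.5 for x in t])  # constant 0.5 rad lag
drifting = phase_locking_value(t, [-x for x in t])     # difference winds around twice
```

Here `locked` evaluates to 1 (up to rounding), and `drifting` to essentially 0. Genes whose phases stay locked across the cycle are candidates for interacting with each other.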

Applications of Cluster Analysis in Biplots (행렬도에서 군집분석의 활용)

  • Choi, Yong-Seok;Kim, Hyoung-Young
    • Communications for Statistical Applications and Methods / v.15 no.1 / pp.65-76 / 2008
  • Biplots are the multivariate analogue of scatter plots. They approximate the multivariate distribution of a sample in a few dimensions, typically two. On this display they superimpose representations of the variables on which the samples are measured (Gower and Hand, 1996, Chapter 1). As a result, the relationships between the observations and the variables can be seen easily, which makes biplots useful for giving a graphical description of the data. However, when the number of observations is large, the biplot does not give a concise interpretation of the relationships between variables and observations. In this study, we therefore propose interpreting the biplot by applying K-means cluster analysis. The relationships between the clusters and the variables can then be interpreted easily. This makes the approach more useful for the graphical description of the data than a biplot of the raw observations.
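The clustering step can be illustrated with Lloyd's K-means algorithm applied to the observations' two-dimensional biplot coordinates. This is a minimal sketch under our own assumptions:

- the row coordinates are taken as already computed, for instance from a singular value decomposition of the centred data;
- the initial centres are simply the first k points;
- the function name is hypothetical.

```python
def kmeans(points, k, iters=100):
    """Lloyd's algorithm on 2-D points; returns (labels, centres)."""
    centres = [tuple(p) for p in points[:k]]  # naive initialisation
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: (p[0] - centres[c][0]) ** 2
                      + (p[1] - centres[c][1]) ** 2)
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append((sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members)))
            else:
                new.append(centres[c])  # keep an empty cluster's centre
        if new == centres:
            break  # assignments have stabilised
        centres = new
    return labels, centres
```

Plotting the k centres alongside the variable vectors lets each cluster, rather than each individual observation, be read against the variables.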

Application of Multivariate Statistical Techniques to Analyze the Pollution Characteristics of Major Tributaries of the Nakdong River (낙동강 주요 지류의 오염특성 분석을 위한 다변량 통계기법의 적용)

  • Park, Jaebeom;Kal, Byungseok;Kim, Seongmin
    • Journal of Wetlands Research / v.21 no.3 / pp.215-223 / 2019
  • In this study, we analyzed the water quality characteristics of the major tributaries of the Nakdong River using statistical methods: correlation analysis, principal component and factor analysis, and cluster analysis. Organic matter and nutrients were highly correlated and reached high levels in spring and autumn, indicating that seasonal water quality management is required. Principal component and factor analysis showed that 82% of the total variance could be explained by 4 principal components, corresponding to organic matter, nutrients, natural factors, and weather. BOD, COD, TOC, and TP were identified as the major influencing parameters. Cluster analysis divided the data into four clusters according to seasonal organic matter and nutrient pollution. The Kumho River watershed showed high pollution in all seasons. Effective water quality management in tributary streams therefore requires measures that account for spatio-temporal characteristics. Multivariate statistical techniques may be useful for water quality management and policy formulation.

Application of metabolic profiling for biomarker discovery

  • Hwang, Geum-Sook
    • Proceedings of the Korean Society of Applied Pharmacology / 2007.11a / pp.19-27 / 2007
  • An important potential of metabolomics-based approaches is the possibility of developing fingerprints of diseases, or of cellular responses to classes of compounds with a known common biological effect. Such fingerprints could allow the classification of disease states or compounds and provide mechanistic information on cellular perturbations and pathways. They could also identify biomarkers specific for disease severity and drug efficacy. Metabolic profiles of biological fluids contain a vast array of endogenous metabolites. Changes in these profiles resulting from perturbations of the system can be observed using analytical techniques such as NMR and MS. $^1H$ NMR was used to generate a molecular fingerprint of each serum or urine sample. Pattern recognition techniques were then applied to identify molecular signatures associated with specific diseases or with drug efficacy. Several metabolites that differentiate disease samples from controls were thoroughly characterized by NMR spectroscopy. We investigated the metabolic changes in normal and clinical human samples using $^1H$ NMR. The spectral data were processed by targeted profiling and by spectral binning. Multivariate statistical data analysis (MVDA) was then used to examine in detail the modulation of small-molecule candidate biomarkers. We show that targeted profiling produces robust models and generates accurate metabolite concentration data. It also provides data that can help in understanding metabolic differences between healthy and diseased populations. Such metabolic signatures could provide diagnostic markers for a disease state or biomarkers for drug response phenotypes.
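The spectral binning step mentioned above can be sketched as follows. The 0.04 ppm bin width, the 0-10 ppm range and total-area normalization are common choices in $^1H$ NMR metabolomics, but they are assumptions here, not parameters stated in the abstract. The function name is hypothetical. The resulting bin vectors would then be passed to a multivariate method such as principal component analysis.

```python
def bin_spectrum(ppm, intensity, start=0.0, stop=10.0, width=0.04):
    """Sum intensities into fixed-width chemical-shift bins, then normalise to total area 1."""
    n_bins = int(round((stop - start) / width))
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if start <= shift < stop:
            # Clamp the index so floating-point rounding cannot overflow the last bin.
            idx = min(int((shift - start) / width), n_bins - 1)
            bins[idx] += value
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins
```

Normalizing to total area removes overall concentration differences between samples, for example between dilute and concentrated urine. Relative metabolite levels are preserved for the subsequent multivariate analysis.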
