• Title/Summary/Keyword: methods of data analysis

Search Result 19,233, Processing Time 0.047 seconds

Graphical Methods for the Sensitivity Analysis in Discriminant Analysis

  • Jang, Dae-Heung;Anderson-Cook, Christine M.;Kim, Youngil
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.5
    • /
    • pp.475-485
    • /
    • 2015
  • Similar to regression, many measures to detect influential data points in discriminant analysis have been developed. Many follow similar principles as the diagnostic measures used in linear regression in the context of discriminant analysis. Here we focus on the impact on the predicted classification posterior probability when a data point is omitted. The new method is intuitive and easily interpretable compared to existing methods. We also propose a graphical display to show the individual movement of the posterior probability of other data points when a specific data point is omitted. This enables the summaries to capture the overall pattern of the change.

Veri cation of Improving a Clustering Algorith for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.2
    • /
    • pp.315-321
    • /
    • 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analysis (including gene clustering analysis); however, require a complete data matric as an input. In ordinary clustering methods, just a single missing value makes one abandon the whole data of a gene even if the rest of data for that gene was intact. The quality of analysis may decrease seriously as the missing rate is increased. In the opposite aspect, the imputation of missing value may result in an artifact that reduces the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data having different missing rates. This paper also tested the clustering efficiency of several imputation methods including our propose algorithm. The results showed it is worthwhile to check the clustering result in this alternative way without any imputed data for the imperfect microarray data.

The Characteristics of Coastal Zone Management Methods in U.S.A -Focus on Zoning & Integrated Methods of Different Kind Data- (미국 연안구역(Coastal Zone) 관리수단의 특성 -조닝방식과 이종 데이터 간 통합방법을 중심으로-)

  • Oh, Ji-Hoon;Lee, Seok-Hwan;Lee, Hee-Won
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.9
    • /
    • pp.3590-3598
    • /
    • 2010
  • It is necessary to collect the coastal zone data, to prepare the objective analysis methods, and to build the scientific and technical support system for the efficient management of coastal zone in local aspect. This study analyzes the coastal zoning methods and the integrated methods of different kind data by case study of U.S.A coastal zone. The characteristics of coastal zone management methods in U.S.A are as follows; the concrete indices and methods of establishing coastal zone which can respond to the local values and land use, related data analysis methods for supporting spatial decision making, and establishment and administration of bureau for the spatial information construction and integration of coastal zone. This study suggest the technical implications which can build the domestic coastal zone management in local level on the basis of the common values of the coastal zoning methods and integrated methods of heterogeneous data in U.S.A.

Explanatory Analysis for South Korea's Political Website Linking - Statistical Aspects

  • Choi, Kyoung-Ho;Park, Han-Woo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.16 no.4
    • /
    • pp.899-911
    • /
    • 2005
  • This paper conducts an explanatory analysis of the web sphere produced by National Assemblymen in South Korea, using some statistical methods. First, some descriptive metrics were employed. Next, the traditional methods of multi-variate analyses, multidimensional scaling and corresponding analysis, were applied to the data. Finally, cross-sectional data were compared to examine a change over time.

  • PDF

A Methodology for Deriving An Object Model by Using Structured Analysis Results (구조적 분석 산출물을 이용한 객체 모델 유도 방법론)

  • 이희석;배한욱;유천수
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.21 no.3
    • /
    • pp.175-195
    • /
    • 1996
  • In conventional analysis methods, data and process are loosely coupled for building information systems. Several object oriented approaches have been proposed to integrate data and process. However, object oriented analysis requires a radical paradigm and thus system analysts find difficulties in generating object models direcctly from end users. To alleviate these difficulties, this paper proposes a methodology for deriving an object model by using structured analysis results. Objects are obtianed primarily from entities in Entity-Relationship Diagram. Methods are obtained through the analysis of the relationship between processes and data stores in Data Flow Diagram Methods are assigned to the objects by using object/process matrices. A real-life case is illustrated to demonstrate the usefulness of the methodology.

  • PDF

Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis

  • Li, Xing;Zhang, Panpan;Feng, Qunqiang
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.1
    • /
    • pp.103-125
    • /
    • 2022
  • In this paper, we analyze the time series data of the case and death counts of COVID-19 that broke out in China in December, 2019. The study period is during the lockdown of Wuhan. We exploit functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, the functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out the functional canonical correlation analysis to explore the relationship between confirmed and death cases. Finally, we utilize a clustering method based on the Expectation-Maximization (EM) algorithm to run the cluster analysis on the counts of confirmed cases, where the number of clusters is determined via a cross-validation approach. Besides, we compare the clustering results with some migration data available to the public.

Complex sample design effects and inference for Korea National Health and Nutrition Examination Survey data (국민건강영양조사 자료의 복합표본설계효과와 통계적 추론)

  • Chung, Chin-Eun
    • Journal of Nutrition and Health
    • /
    • v.45 no.6
    • /
    • pp.600-612
    • /
    • 2012
  • Nutritional researchers world-wide are using large-scale sample survey methods to study nutritional health epidemiology and services utilization in general, non-clinical populations. This article provides a review of important statistical methods and software that apply to descriptive and multivariate analysis of data collected in sample surveys, such as national health and nutrition examination survey. A comparative data analysis of the Korea National Health and Nutrition Examination Survey (KNHANES) was used to illustrate analytical procedures and design effects for survey estimates of population statistics, model parameters, and test statistics. This article focused on the following points, method of approach to analyze of the sample survey data, right software tools available to perform these analyses, and correct survey analysis methods important to interpretation of survey data. It addresses the question of approaches to analysis of complex sample survey data. The latest developments in software tools for analysis of complex sample survey data are covered, and empirical examples are presented that illustrate the impact of survey sample design effects on the parameter estimates, test statistics, and significance probabilities (p values) for univariate and multivariate analyses.

Applications of response dimension reduction in large p-small n problems

  • Minjee Kim;Jae Keun Yoo
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.2
    • /
    • pp.191-202
    • /
    • 2024
  • The goal of this paper is to show how multivariate regression analysis with high-dimensional responses is facilitated by the response dimension reduction. Multivariate regression, characterized by multi-dimensional response variables, is increasingly prevalent across diverse fields such as repeated measures, longitudinal studies, and functional data analysis. One of the key challenges in analyzing such data is managing the response dimensions, which can complicate the analysis due to an exponential increase in the number of parameters. Although response dimension reduction methods are developed, there is no practically useful illustration for various types of data such as so-called large p-small n data. This paper aims to fill this gap by showcasing how response dimension reduction can enhance the analysis of high-dimensional response data, thereby providing significant assistance to statistical practitioners and contributing to advancements in multiple scientific domains.

TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data

  • Lim, Jae Hyun;Lee, Soo Youn;Kim, Ju Han
    • Genomics & Informatics
    • /
    • v.15 no.1
    • /
    • pp.51-53
    • /
    • 2017
  • High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

Survival Analysis of Gastric Cancer Patients with Incomplete Data

  • Moghimbeigi, Abbas;Tapak, Lily;Roshanaei, Ghodaratolla;Mahjub, Hossein
    • Journal of Gastric Cancer
    • /
    • v.14 no.4
    • /
    • pp.259-265
    • /
    • 2014
  • Purpose: Survival analysis of gastric cancer patients requires knowledge about factors that affect survival time. This paper attempted to analyze the survival of patients with incomplete registered data by using imputation methods. Materials and Methods: Three missing data imputation methods, including regression, expectation maximization algorithm, and multiple imputation (MI) using Monte Carlo Markov Chain methods, were applied to the data of cancer patients referred to the cancer institute at Imam Khomeini Hospital in Tehran in 2003 to 2008. The data included demographic variables, survival times, and censored variable of 471 patients with gastric cancer. After using imputation methods to account for missing covariate data, the data were analyzed using a Cox regression model and the results were compared. Results: The mean patient survival time after diagnosis was $49.1{\pm}4.4$ months. In the complete case analysis, which used information from 100 of the 471 patients, very wide and uninformative confidence intervals were obtained for the chemotherapy and surgery hazard ratios (HRs). However, after imputation, the maximum confidence interval widths for the chemotherapy and surgery HRs were 8.470 and 0.806, respectively. The minimum width corresponded with MI. Furthermore, the minimum Bayesian and Akaike information criteria values correlated with MI (-821.236 and -827.866, respectively). Conclusions: Missing value imputation increased the estimate precision and accuracy. In addition, MI yielded better results when compared with the expectation maximization algorithm and regression simple imputation methods.