• Title/Summary/Keyword: Multivariate statistical models

Issues Related to the Use of Time Series in Model Building and Analysis: Review Article

  • Wei, William W.S.
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 22, No. 3
    • /
    • pp.209-222
    • /
    • 2015
  • Time series are used in many studies for model building and analysis. We must be very careful to understand the kind of time series data used in the analysis. In this review article, we will begin with some issues related to the use of aggregate and systematic sampling time series. Since several time series are often used in a study of the relationship of variables, we will also consider vector time series modeling and analysis. Although the basic procedures of model building between univariate time series and vector time series are the same, there are some important phenomena which are unique to vector time series. Therefore, we will also discuss some issues related to vector time series models. Understanding these issues is important when we use time series data in modeling and analysis, regardless of whether it is a univariate or multivariate time series.

Multiple Testing in Genomic Sequences Using Hamming Distance

  • Kang, Moonsu
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 19, No. 6
    • /
    • pp.899-904
    • /
    • 2012
  • High-dimensional categorical data models with small sample sizes have not been used extensively in genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, because a multivariate analysis of variance is not feasible. The family-wise error rate (FWER) is not appropriate for testing thousands of genes simultaneously in a multiple testing procedure; the false discovery rate (FDR) is better suited than FWER to such multiple testing problems. Data from the 2002-2003 SARS epidemic show that a conventional FDR procedure combined with a proposed test statistic based on a pseudo-marginal approach with Hamming distance performs better.
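
The contrast between FWER and FDR control drawn in this abstract can be illustrated with a minimal sketch. It assumes the Bonferroni correction as the FWER procedure and the Benjamini-Hochberg step-up rule as the "conventional" FDR procedure; neither choice is confirmed by the abstract, and the paper's pseudo-marginal test statistic is not reproduced here. The function names and the example p-values are hypothetical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def bonferroni_reject(pvalues, alpha=0.05):
    """FWER control (Bonferroni): reject H_i when p_i <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """FDR control (Benjamini-Hochberg step-up rule)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values, one per sequence position (illustration only).
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.06, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni_reject(example_pvalues))
n_bh = sum(benjamini_hochberg_reject(example_pvalues))
```

On these made-up p-values the step-up rule rejects more hypotheses than the Bonferroni correction, which is the usual reason FDR control is preferred when thousands of positions are tested at once.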

Bayesian Multiple Change-Point Estimation and Segmentation

  • Kim, Jaehee;Cheon, Sooyoung
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 20, No. 6
    • /
    • pp.439-454
    • /
    • 2013
  • This study presents a Bayesian multiple change-point detection approach to segment and classify the observations that no longer come from an initial population after a certain time. Inferences are based on the multiple change-points in a sequence of random variables where the probability distribution changes. Bayesian multiple change-point estimation classifies each observation into a segment. We use a truncated Poisson distribution for the number of change-points and a conjugate prior for the exponential family distributions. The Bayesian method can lead to the unsupervised classification of discrete, continuous variables and multivariate vectors based on latent class models; therefore, the solution for change-points corresponds to the stochastic partitions of observed data. We demonstrate segmentation with real data.
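
Two ingredients named in this abstract can be sketched in a few lines: a truncated Poisson prior on the number of change-points, and the mapping from a set of change-points to a segment label for each observation. This is an illustration under assumptions, not the authors' sampler: the truncation is assumed to be from above at a hypothetical `k_max` (with zero change-points allowed), and a change-point is assumed to be the index at which a new segment starts.

```python
import math
from bisect import bisect_right


def truncated_poisson_pmf(lam: float, k_max: int):
    """P(K = k | K <= k_max) for k = 0..k_max, where K ~ Poisson(lam)."""
    weights = [math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def segment_labels(n: int, change_points):
    """Label observations 0..n-1 by segment.

    change_points: sorted indices at which a new segment starts (assumption).
    """
    return [bisect_right(change_points, t) for t in range(n)]


prior = truncated_poisson_pmf(lam=2.0, k_max=1)
labels = segment_labels(6, [2, 4])
```

Each candidate set of change-points induces one partition of the data, which is the sense in which the abstract speaks of stochastic partitions.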

Statistical analysis of metagenomics data

  • Calle, M. Luz
    • Genomics & Informatics
    • /
    • Vol. 17, No. 1
    • /
    • pp.6.1-6.9
    • /
    • 2019
  • Understanding the role of the microbiome in human health and how it can be modulated is becoming increasingly relevant for preventive medicine and for the medical management of chronic diseases. The development of high-throughput sequencing technologies has boosted microbiome research by enabling the study of microbial genomes and a more precise quantification of microbiome abundances and function. Microbiome data analysis is challenging because it involves high-dimensional structured multivariate sparse data and because of its compositional nature. In this review we outline some of the procedures that are most commonly used for microbiome analysis and that are implemented in R packages. We place particular emphasis on the compositional structure of microbiome data. We describe the principles of compositional data analysis and distinguish between standard methods and those that fit into compositional data analysis.
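
A common entry point to compositional data analysis is the centered log-ratio (CLR) transform, which removes the dependence on the arbitrary total of each sample. The sketch below is a plain-Python illustration rather than any of the R implementations the review covers. It assumes strictly positive parts; sparse microbiome counts contain zeros, which would first need a replacement strategy (for example a pseudocount) that is not shown. The function names are hypothetical.

```python
import math


def closure(counts):
    """Rescale non-negative counts so that the parts sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return [c / total for c in counts]


def clr(composition):
    """Centered log-ratio: log(x_i) minus the mean of log(x_j) over all parts."""
    if any(x <= 0 for x in composition):
        raise ValueError("CLR requires strictly positive parts")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical taxon counts for one sample (illustration only).
raw_counts = [10, 20, 40]
clr_from_counts = clr(raw_counts)
clr_from_proportions = clr(closure(raw_counts))
```

The two results agree because the CLR is invariant to rescaling the sample, which is why it is usable on sequencing data whose totals carry no biological meaning.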

Estimating the Survival of Patients With Lung Cancer: What Is the Best Statistical Model?

  • Abedi, Siavosh;Janbabaei, Ghasem;Afshari, Mahdi;Moosazadeh, Mahmood;Alashti, Masoumeh Rashidi;Hedayatizadeh-Omran, Akbar;Alizadeh-Navaei, Reza;Abedini, Ehsan
    • Journal of Preventive Medicine and Public Health
    • /
    • Vol. 52, No. 2
    • /
    • pp.140-144
    • /
    • 2019
  • Objectives: Investigating the survival of patients with cancer is vitally necessary for controlling the disease and for assessing treatment methods. This study aimed to compare various statistical models of survival and to determine the survival rate and its related factors among patients suffering from lung cancer. Methods: In this retrospective cohort, the cumulative survival rate, median survival time, and factors associated with the survival of lung cancer patients were estimated using Cox, Weibull, exponential, and Gompertz regression models. Kaplan-Meier tables and the log-rank test were also used to analyze the survival of patients in different subgroups. Results: Of 102 patients with lung cancer, 74.5% were male. During the follow-up period, 80.4% died. The incidence rate of death among patients was estimated as 3.9 (95% confidence interval [CI], 3.1 to 4.8) per 100 person-months. The 5-year survival rate for all patients, males, females, patients with non-small cell lung carcinoma (NSCLC), and patients with small cell lung carcinoma (SCLC) was 17%, 13%, 29%, 21%, and 0%, respectively. The median survival time for all patients, males, females, those with NSCLC, and those with SCLC was 12.7 months, 12.0 months, 16.0 months, 16.0 months, and 6.0 months, respectively. Multivariate analyses indicated that the hazard ratios (95% CIs) for male sex, age, and SCLC were 0.56 (0.33 to 0.93), 1.03 (1.01 to 1.05), and 2.91 (1.71 to 4.95), respectively. Conclusions: Our results showed that the exponential model was the most precise. This model identified age, sex, and type of cancer as factors that predicted survival in patients with lung cancer.
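
The link between an incidence rate per person-time and the exponential survival model can be sketched as follows. Under an exponential model without covariates the hazard is constant, and its maximum likelihood estimate is the number of deaths divided by the total follow-up time. The follow-up data below are hypothetical and are not the study's data; the function names are illustrative, and the study's regression models with covariates are not reproduced.

```python
import math


def exponential_rate_mle(times, events):
    """MLE of a constant hazard: deaths / total person-time.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if the patient died, 0 if censored
    """
    person_time = sum(times)
    if person_time <= 0:
        raise ValueError("total follow-up time must be positive")
    return sum(events) / person_time


def exponential_survival(t, rate):
    """S(t) = exp(-rate * t) under the exponential model."""
    return math.exp(-rate * t)


def exponential_median(rate):
    """Time at which S(t) = 0.5, namely ln(2) / rate."""
    return math.log(2) / rate


# Hypothetical follow-up data for four patients (illustration only).
toy_times = [5.0, 10.0, 20.0, 15.0]
toy_events = [1, 1, 0, 1]
toy_rate = exponential_rate_mle(toy_times, toy_events)   # deaths per person-month
toy_median = exponential_median(toy_rate)
```

Multiplying `toy_rate` by 100 expresses it per 100 person-months, the unit used in the abstract.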

Assessments for MGARCH Models Using Back-Testing: Case Study

  • 황선영;최문선;도종두
    • 응용통계연구
    • /
    • Vol. 22, No. 2
    • /
    • pp.261-270
    • /
    • 2009
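  • Volatility has become an even more important issue in understanding financial data such as stock returns and exchange rates in the wake of the recent global financial crisis. Beginning with the ARCH model of Engle (1982) and the GARCH model of Bollerslev (1986), numerous studies have addressed models of volatility (conditional heteroscedasticity); in financial time series analysis in particular, MGARCH (multivariate GARCH) models, which jointly model the volatilities of several time series, are widely used. Fitted MGARCH models are useful in themselves for describing the dynamic relationships over time among multiple volatilities, and the estimated (conditional) correlation coefficients are also used in financial market analysis, for example in computing hedge ratios or VaR. In this paper, we carry out MGARCH analyses of 14 recent Korean stock price series, evaluate the MGARCH models through the associated back-testing, and provide the S-PLUS program used to obtain the back-testing figures.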

Modern vistas of process control

  • Georgakis, Christos
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회, 1996, Proceedings of the Korea Automatic Control Conference, 11th (KACC); Pohang, Korea; 24-26 Oct. 1996
    • /
    • pp.18-18
    • /
    • 1996
  • This paper reviews some of the most prominent and promising areas of chemical process control in relation to both batch and continuous processes. These areas include the modeling, optimization, control and monitoring of chemical processes and entire plants. Most of these areas explicitly utilize a model of the process. For this purpose the types of models used are examined in some detail. These types of models are categorized into knowledge-driven and data-driven classes. In the areas of modeling and optimization, attention is paid to batch reactors using the Tendency Modeling approach. These Tendency models consist of data- and knowledge-driven components and are often called Gray or Hybrid models. In the case of continuous processes, emphasis is placed on the closed-loop identification of a state space model and its use in Model Predictive Control of nonlinear processes, such as the Fluidized Catalytic Cracking process. The effective monitoring of multivariate processes is examined through the use of statistical charts obtained by the use of Principal Component Analysis (PCA). Static and dynamic charts account for the cross- and auto-correlation of the substantial number of variables measured on-line. Centralized and de-centralized charts also aim at isolating the source of process disturbances so that they can be eliminated. Even though significant progress has been made during the last decade, the challenges for the next ten years are substantial. Present progress is strongly influenced by the economic benefits industry is deriving from the use of these advanced techniques. Future progress will be further catalyzed by the harmonious collaboration of University and Industrial researchers.

Hybrid Learning Architectures for Advanced Data Mining: An Application to Binary Classification for Fraud Management

  • Kim, Steven H.;Shin, Sung-Woo
    • 정보기술응용연구
    • /
    • Vol. 1
    • /
    • pp.173-211
    • /
    • 1999
  • The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm effects feature-weight optimization in a preprocessing module. Moreover, an inductive tree, artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second-order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple, but efficient, postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management which exemplifies a binary classification problem.
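
The voting scheme mentioned in this abstract can be pictured as a simple majority vote over the labels produced by the first-order classifiers for one case. The sketch below assumes unweighted voting, with ties broken in favor of the classifier listed first; the abstract does not specify either detail. The label strings and the function name are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine first-order predictions for one case by unweighted voting.

    predictions: labels from the first-order classifiers, in a fixed order
                 (e.g. logit, MDA, ANN, kNN).
    Ties go to the earliest-listed classifier's label (assumption).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical outputs of four first-order classifiers for a single case.
case_predictions = ["fraud", "legit", "fraud", "fraud"]
combined = majority_vote(case_predictions)
```

A second-order classifier, by contrast, would learn which first-order model to trust for each case instead of counting votes.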

Multivariate quantile regression tree

  • 김재오;조형준;방성완
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 28, No. 3
    • /
    • pp.533-545
    • /
    • 2017
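  • Quantile regression models provide comprehensive and useful statistical information about the conditional distribution of a response variable. In many real data sets, however, the explanatory and response variables are nonlinearly related, so the traditional linear quantile regression model can yield distorted and misleading results. Moreover, as data grow more complex, there is increasing demand for more accurate prediction and richer interpretation in the analysis of multivariate data with several response variables. For these reasons, this study proposes a multivariate quantile regression tree model. We point out problems with the split variable selection algorithm of existing multivariate regression tree models and propose an improved split variable selection algorithm. The proposed algorithm can be applied with reasonable computation time, does not suffer from bias in split variable selection, and selects split variables more accurately than existing methods. Simulation studies and an empirical example confirm the superior performance and usefulness of the proposed method.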
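
Quantile regression replaces the squared-error loss with the check (pinball) loss, and the constant that minimizes the total check loss over a sample is a sample quantile. The sketch below illustrates that fact for a single response on hypothetical data. It is not the proposed tree algorithm, whose split variable selection is not described in enough detail here to reproduce.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_check_loss(ys, q, tau):
    """Total check loss of predicting the constant q for every response in ys."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant(ys, tau):
    """Observed value minimizing the total check loss (a sample tau-quantile)."""
    return min(sorted(ys), key=lambda q: total_check_loss(ys, q, tau))


# Hypothetical responses with one outlier (illustration only).
responses = [1, 2, 3, 4, 100]
median_fit = best_constant(responses, tau=0.5)
upper_fit = best_constant(responses, tau=0.9)
```

With `tau=0.5` the minimizer is the median, which is unaffected by the outlier, while a higher `tau` moves the fit toward the upper tail of the response distribution.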

Principal selected response reduction in multivariate regression

  • 유재근
    • 응용통계연구
    • /
    • Vol. 34, No. 4
    • /
    • pp.659-669
    • /
    • 2021
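  • Multivariate regression is a statistical methodology frequently used in various fields, such as longitudinal data analysis and functional data analysis. Because of the dimension of the response variables as well as that of the predictors, multivariate regression is more strongly affected by the curse of dimensionality than univariate regression. To address this problem, three model-based response dimension reduction methods were recently proposed in Yoo (2018) and Yoo (2019a). However, although simulations show that the basic method proposed in Yoo (2019a) is the least sensitive to the model, it does not give better estimation results than the better of the other two methods. To overcome this drawback, this paper proposes a selection algorithm that compares the results of the basic method with those of the other two methods and suggests the best method for the data at hand, and names it principal selected response reduction. Various simulation studies confirm that principal selected response reduction reduces the dimension more accurately than the basic method of Yoo (2019a) and selects the more desirable method in all cases. These results confirm the practical usefulness of the proposed principal selected response reduction method.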