• Title/Summary/Keyword: univariate method

Categorical Variable Selection in Naïve Bayes Classification (단순 베이즈 분류에서의 범주형 변수의 선택)

  • Kim, Min-Sun;Choi, Hosik;Park, Changyi
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.3
    • /
    • pp.407-415
    • /
    • 2015
  • Naïve Bayes classification is based on input variables that are conditionally independent given the output variable. The naïve Bayes assumption is unrealistic but simplifies the problem of high-dimensional joint probability estimation into a series of univariate probability estimations. Thus the naïve Bayes classifier is often adopted in the analysis of massive data sets, such as in spam e-mail filtering and recommendation systems. In this paper, we propose a variable selection method based on the $\chi^2$ statistic on input and output variables. The proposed method retains the simplicity of the naïve Bayes classifier in terms of data processing and computation; however, it can select relevant variables. It is expected that our method can be useful in classification problems for ultra-high dimensional or big data, such as the classification of diseases based on single nucleotide polymorphisms (SNPs).
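
The abstract above describes screening categorical inputs with a $\chi^2$ statistic computed between each input and the output. The following is a minimal sketch of that general idea, not the authors' procedure: the function names, the toy data, and the rule of keeping the `top_k` highest-scoring variables are all assumptions made for illustration.

```python
from collections import Counter


def chi_square_statistic(x, y):
    """Pearson chi-square statistic for the contingency table of x against y."""
    n = len(x)
    joint = Counter(zip(x, y))
    x_counts = Counter(x)
    y_counts = Counter(y)
    stat = 0.0
    for x_value, n_x in x_counts.items():
        for y_value, n_y in y_counts.items():
            expected = n_x * n_y / n  # count expected under independence
            observed = joint.get((x_value, y_value), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def select_variables(columns, y, top_k):
    """Score every categorical column against y and keep the top_k names.

    Keeping a fixed number of variables is an assumed selection rule.
    """
    scores = {name: chi_square_statistic(col, y) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Hypothetical toy data: one informative input and one uninformative input.
y = ["spam", "spam", "ham", "ham", "spam", "ham", "spam", "ham"]
columns = {
    "has_link": ["y", "y", "n", "n", "y", "n", "y", "n"],
    "weekday": ["mon", "tue", "mon", "tue", "mon", "tue", "tue", "mon"],
}
selected, scores = select_variables(columns, y, top_k=1)
print(selected, scores)
```

Because each variable is scored on its own, the screening step stays a series of univariate computations, which matches the simplicity argument made in the abstract.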

A Study on the Use of Cluster Analysis for Multivariate and Multipurpose Stratification (군집분석을 이용한 다목적 조사의 층화에 관한 연구)

  • Park, Jin-Woo;Yun, Seok-Hoon;Kim, Jin-Heum;Jeong, Hyeong-Chul
    • The Korean Journal of Applied Statistics
    • /
    • v.20 no.2
    • /
    • pp.387-394
    • /
    • 2007
  • This paper considers several stratification strategies for multivariate and multipurpose surveys with several quantitative stratification variables. We propose three methods of stratification based on, respectively, the cumulative square root of frequency method, which is the most popular one in univariate stratification; cluster analysis; and factor analysis followed by cluster analysis. We then compare the efficiency of those methods using the Dong-Eup-Myun data on the number of farming machines held, extracted from the 2001 Agricultural Census. The method based on cluster analysis combined with factor analysis turned out to be a relatively satisfactory strategy.
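
For readers unfamiliar with the univariate baseline named in this abstract, here is a rough sketch of the cumulative square root of frequency rule for a single stratification variable. The function name, the equal-width binning, and the choice of taking the first bin whose cumulative value reaches each target are assumptions; the paper's own implementation is not described in the abstract.

```python
import math


def cum_sqrt_f_boundaries(values, n_bins, n_strata):
    """Stratum boundaries from the cumulative square root of frequency rule."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / n_bins

    # Frequency of the stratification variable in equal-width bins.
    freq = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        freq[idx] += 1

    # Cumulate the square roots of the bin frequencies.
    cum = []
    total = 0.0
    for f in freq:
        total += math.sqrt(f)
        cum.append(total)

    # Cut where the cumulative value first reaches k/n_strata of the total.
    boundaries = []
    for k in range(1, n_strata):
        target = total * k / n_strata
        idx = next(i for i, c in enumerate(cum) if c >= target)
        boundaries.append(lo + (idx + 1) * width)
    return boundaries


# Hypothetical data: 100 evenly spread values split into three strata.
boundaries = cum_sqrt_f_boundaries(list(range(100)), n_bins=10, n_strata=3)
print(boundaries)
```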

A Comparison Study of Bayesian Methods for a Threshold Autoregressive Model with Regime-Switching (국면전환 임계 자기회귀 분석을 위한 베이지안 방법 비교연구)

  • Roh, Taeyoung;Jo, Seongil;Lee, Ryounghwa
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.6
    • /
    • pp.1049-1068
    • /
    • 2014
  • Autoregressive models are used to analyze univariate time series data; however, these methods can be inappropriate when a structural break appears in a time series since they assume that a trend is consistent. Threshold autoregressive models (popular regime-switching models) have been proposed to address this problem. Recently, the models have been extended to two regime-switching models with a delay parameter. We discuss two regime-switching threshold autoregressive models from a Bayesian point of view. For a Bayesian analysis, we consider a parametric threshold autoregressive model and a nonparametric threshold autoregressive model using a Dirichlet process prior. The posterior distributions are derived, and posterior inference for the two Bayesian threshold autoregressive models is performed via Markov chain Monte Carlo methods. We present a simulation study to compare the performance of the models. We also apply the models to gross domestic product data of the U.S.A. and South Korea.

Principal selected response reduction in multivariate regression (다변량회귀에서 주선택 반응변수 차원축소)

  • Yoo, Jae Keun
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.4
    • /
    • pp.659-669
    • /
    • 2021
  • Multivariate regression often appears in longitudinal or functional data analysis. Since multivariate regression involves multi-dimensional response variables, it is more strongly affected by the so-called curse of dimensionality than univariate regression. To overcome this issue, Yoo (2018) and Yoo (2019a) proposed three model-based response dimension reduction methodologies. According to various numerical studies in Yoo (2019a), the default method suggested in Yoo (2019a) is the least sensitive to the simulated models, but it is not the best one. To address this issue, the paper proposes a selection algorithm that compares the other two methods with the default one. This approach is called principal selected response reduction. Various simulation studies show that the proposed method provides more accurate estimation results than the default one by Yoo (2019a), and they confirm the practical and empirical usefulness of the proposed method over the default one by Yoo (2019a).

Predictive Factors of Survival Time of Breast Cancer in Kurdistan Province of Iran between 2006-2014: A Cox Regression Approach

  • Karimi, Asrin;Delpisheh, Ali;Sayehmiri, Kourosh;Saboori, Hojjatollah;Rahimi, Ezzatollah
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.15 no.19
    • /
    • pp.8483-8488
    • /
    • 2014
  • Background: Breast cancer is the most common cancer and the second most common cause of cancer-induced mortality in Iranian women, following gastric carcinoma. The survival of these patients depends on several factors, which are very important to identify in order to understand the natural history of the disease. Materials and Methods: In this retrospective study, 313 consecutive women with a pathologically proven diagnosis of breast cancer who had been treated during a seven-year period (January 2006 until March 2014) at Towhid hospital, Sanandaj city, Kurdistan province of Iran, were recruited. The Kaplan-Meier method was used for data analysis, and finally those factors that showed a significant association on univariate analysis were entered in a Cox regression model. Results: The mean age of patients was $46.10 \pm 10.81$ years. Based on the Kaplan-Meier method, the median survival time was 81 months and the 5-year survival rate was $75\% \pm 0.43$. Tumor metastasis (HR=9.06, p=0.0001), relapse (HR=3.20, p=0.001), clinical stage of cancer (HR=2.30, p=0.03) and place of metastasis (p=0.0001) had significant associations with the survival rate variation. Patients with tumor metastasis had the lowest five-year survival rate (37%), and among them patients who had brain metastasis were in the worst condition (5-year survival rate = $11\% \pm 0.10$). Conclusions: Our findings support the observation that women with higher stages of breast malignancies (especially with metastatic cancer) have less chance of surviving the disease. Furthermore, screening programs and early detection of breast cancer may help to increase the survival of women who are at risk of breast cancer.
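
The survival figures in this abstract come from the Kaplan-Meier method. As a small illustration of how that product-limit estimate is built, the sketch below uses invented follow-up times; the function name and the data are hypothetical and have no connection to the study's patients.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up durations
    events : 1 if the death was observed, 0 if the observation was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times in months; 0 marks a censored observation.
curve = kaplan_meier([6, 6, 6, 7, 10], [1, 1, 0, 1, 0])
print(curve)
```

Censored subjects leave the risk set without lowering the curve, which is why the estimate only steps down at observed event times.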

A Study on Fault Detection of Cycle-based Signals using Wavelet Transform (웨이블릿을 이용한 주기 신호 데이터의 이상 탐지에 관한 연구)

  • Lee, Jae-Hyun;Kim, Ji-Hyun;Hwang, Ji-Bin;Kim, Sung-Shick
    • Journal of the Korea Society for Simulation
    • /
    • v.16 no.4
    • /
    • pp.13-22
    • /
    • 2007
  • Fault detection of cycle-based signals is typically performed using statistical approaches. Univariate SPC using a few representative statistics and multivariate analysis methods such as PCA and PLS are the most popular methods for analyzing cycle-based signals. However, such approaches are limited when dealing with information-rich cycle-based signals. In this paper, a process fault detection method based on wavelet analysis is proposed. Using the Haar wavelet, coefficients that reflect the process condition well are selected. Next, a Hotelling's $T^2$ chart using the selected coefficients is constructed to assess the process condition. To enhance the overall efficiency of fault detection, two steps are suggested: a denoising method based on the wavelet transform and a coefficient selection method using variance difference. For performance evaluation, various types of abnormal process conditions are simulated and the proposed algorithm is compared with other methodologies.
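
The Haar wavelet step mentioned in this abstract can be pictured with a short sketch. It shows only a plain orthonormal Haar decomposition of one cycle; the denoising, coefficient selection, and $T^2$ chart from the paper are not reproduced, and the function names and sample signal are assumptions.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: pairwise sums and differences."""
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_transform(signal):
    """Full Haar decomposition; the signal length must be a power of two.

    Returns the final approximation coefficient followed by the detail
    coefficients ordered from the coarsest level to the finest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    approx = list(signal)
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details = detail + details  # coarser levels go in front
    return approx + details


# Hypothetical four-sample cycle.
coefficients = haar_transform([4.0, 6.0, 10.0, 12.0])
print(coefficients)
```

The transform preserves the signal's energy, so a shift in the process shows up as a change in a few coefficients rather than being spread across every sample.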

Estimation and Performance Analysis of Risk Measures using Copula and Extreme Value Theory (코퓰러과 극단치이론을 이용한 위험척도의 추정 및 성과분석)

  • Yeo, Sung-Chil
    • The Korean Journal of Applied Statistics
    • /
    • v.19 no.3
    • /
    • pp.481-504
    • /
    • 2006
  • VaR, a tail-related risk measure, is now widely used as a tool for measuring and managing financial risks. For more accurate measurement of VaR, attention has recently focused on the approach based on extreme value theory rather than the traditional method based on the assumption of a normal distribution. However, many studies of approaches using extreme value theory have been done only for the univariate case. In this paper, we discuss portfolio risk measurement by modelling multivariate extreme value distributions through a combination of copulas and extreme value theory. We also discuss the estimation of ES together with VaR as portfolio risk measures. Finally, we investigate the relative superiority of the EVT-copula approach over the variance-covariance method through back-testing on empirical data.
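
To make the two risk measures in this abstract concrete, the sketch below computes plain empirical (historical) VaR and ES from a sample of losses. This is deliberately not the EVT-copula estimator studied in the paper; the function name, the quantile convention, and the convention that the ES tail includes the VaR observation are assumptions.

```python
import math


def historical_var_es(losses, alpha):
    """Empirical VaR and ES at confidence level alpha from a sample of losses.

    Losses are positive numbers. VaR is taken as the ceil(alpha * n)-th smallest
    loss, and ES as the mean of the losses at or beyond that order statistic.
    """
    ordered = sorted(losses)
    n = len(ordered)
    k = min(max(math.ceil(alpha * n) - 1, 0), n - 1)  # index of the alpha-quantile
    var = ordered[k]
    tail = ordered[k:]
    es = sum(tail) / len(tail)
    return var, es


# Hypothetical loss sample.
var_75, es_75 = historical_var_es([1, 2, 3, 4, 5, 6, 7, 8], alpha=0.75)
print(var_75, es_75)
```

ES is never smaller than VaR at the same level because it averages the losses in the tail that VaR only marks the start of.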

River Flow Forecasting Model for the Youngsan Estuary Reservoir Operations (I) - Estimation of Runoff Hydrographs at Naju Station (영산호 운영을 위한 홍수예보모형의 개발(I) -나주지점의 홍수유출 추정-)

  • 박창언;박승우
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.36 no.4
    • /
    • pp.95-102
    • /
    • 1994
  • This series of papers consists of three parts that describe the development, calibration, and applications of the flood forecasting models for the Youngsan Estuarine Dam located at the mouth of the Youngsan river. This paper discusses the hydrologic model for inflow simulation at Naju station, which constitutes 64 percent of the drainage basin of 3521.6 km$^2$ in area. A simplified TANK model was formulated to simulate hourly runoff from rainfall, and the model parameters were optimized using historical storm data and validated with the records. The results of this paper are summarized as follows. 1. The simplified TANK model was formulated to conceptualize the hourly rainfall-runoff relationships at a watershed with four tanks in series having five runoff outlets. The runoff from each outlet was assumed to be proportional to the storage exceeding a threshold value, and each tank was linked with a drainage hole from the upper one. 2. Fifteen storm events from four years of records, from 1984 to 1987, were selected for this study. They varied from 81 to 289 mm. The watershed-averaged hourly rainfall data were determined from those at fifteen rain gauging stations using a Thiessen method. Some missing and unrealistic records at a few stations were estimated or replaced with values determined from adjacent stations using a reciprocal distance square method. 3. A univariate scheme was adopted to calibrate the model parameters using historical records. Some of the calibrated parameters were statistically related to antecedent precipitation. The model simulated streamflow close to the observed, with a mean coefficient of determination of 0.94 for all storm events. 4. The simulated streamflows were in good agreement with the historical records for ungaged condition simulation runs. The mean coefficient of determination for the runs was 0.93, nearly the same as for the calibration runs. This may indicate that the model performs very well in flood forecasting situations for the watershed.
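
The tank structure described in item 1 of this abstract can be sketched as follows. Only the layout (four tanks in series, five side outlets, runoff proportional to storage above a threshold) follows the abstract; every coefficient, threshold, rainfall value, and function name is hypothetical, and drainage proportional to storage is an assumption, since the abstract does not state the drainage law.

```python
def tank_step(storages, rain, outlets, drains):
    """Advance all tanks by one hour and return (new_storages, total_runoff).

    outlets[i] : list of (coefficient, threshold) side outlets of tank i
    drains[i]  : coefficient of the drainage hole at the bottom of tank i
    """
    new_storages = []
    total_runoff = 0.0
    inflow = rain  # the top tank receives the rainfall
    for storage, side_outlets, drain in zip(storages, outlets, drains):
        storage += inflow
        # Each side outlet releases flow proportional to storage above its threshold.
        side_flow = sum(c * max(storage - h, 0.0) for c, h in side_outlets)
        # Drainage to the next tank down (assumed proportional to storage).
        inflow = drain * storage
        new_storages.append(storage - side_flow - inflow)
        total_runoff += side_flow
    return new_storages, total_runoff


def simulate(rainfall, outlets, drains):
    """Run the tanks over an hourly rainfall series, starting from empty tanks."""
    storages = [0.0] * len(drains)
    hydrograph = []
    for rain in rainfall:
        storages, runoff = tank_step(storages, rain, outlets, drains)
        hydrograph.append(runoff)
    return hydrograph, storages


# Hypothetical parameters: two outlets on the top tank and one on each lower
# tank gives five outlets; the bottom tank has no drainage hole.
outlets = [[(0.2, 15.0), (0.1, 5.0)], [(0.05, 5.0)], [(0.02, 2.0)], [(0.005, 0.0)]]
drains = [0.2, 0.05, 0.01, 0.0]
rainfall = [0.0, 10.0, 30.0, 20.0, 5.0, 0.0, 0.0, 0.0]  # mm per hour
hydrograph, final_storages = simulate(rainfall, outlets, drains)
print(hydrograph)
```

With no drainage from the bottom tank, all rainfall ends up either as runoff or as water still held in the tanks, which gives a simple mass-balance check on the sketch.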

An Alternative Method for Assessing Local Spatial Association Among Inter-paired Location Events: Vector Spatial Autocorrelation in Housing Transactions (쌍대위치 이벤트들의 국지적 공간적 연관성을 평가하기 위한 방법론적 연구: 주택거래의 벡터 공간적 자기상관)

  • Lee, Gun-Hak
    • Journal of the Economic Geographical Society of Korea
    • /
    • v.11 no.4
    • /
    • pp.564-579
    • /
    • 2008
  • It is often challenging to evaluate local spatial association among one-dimensional vectors that generally represent paired-location events, where two points are physically or functionally connected. This is largely because of the complexity of the underlying geographic processes and partly because of representational complexity. This paper addresses an alternative way to identify spatially autocorrelated paired-location events (or vectors) at a local scale. In doing so, we propose a statistical algorithm that combines univariate point pattern analysis, for evaluating local clustering of origin points, with a similarity measure of the corresponding vectors. For practical use of the suggested method, we present an empirical application using transaction data in a local housing market, recorded from 2004 to 2006 in Franklin County, Ohio, in the United States. As a result, several locally characterized similar transactions are identified among a set of vectors showing various local moves associated with the communities defined.

Prognostic Value of Esophageal Resection Line Involvement in a Total Gastrectomy for Gastric Cancer (위전절제술 시 식도측 절제연 암 침윤의 예후적 가치)

  • Kwon, Sung-Joon
    • Journal of Gastric Cancer
    • /
    • v.1 no.3
    • /
    • pp.168-173
    • /
    • 2001
  • Purpose: A positive esophageal margin is not infrequently encountered in a total gastrectomy. The aim of this retrospective review was to evaluate whether a positive esophageal margin predisposes a patient to loco-regional recurrence and whether it has an independent impact on long-term survival. Materials and Methods: A retrospective review of 224 total gastrectomies for adenocarcinomas was undertaken. The chi-square test was used to determine the statistical significance of differences, and the Kaplan-Meier method was used to calculate survival rates. Significant differences in the survival rates were assessed using the log-rank test, and independent prognostic significance was evaluated using the Cox regression method. Results: The prevalence of esophageal margin involvement was $3.6\%$ (8/224). Univariate analysis showed that advanced stage (stage III/IV), tumor size ($\geq$5 cm), tumor site (whole or upper one-third of the stomach), macroscopic type (Borrmann type 4), esophageal invasion, esophageal margin involvement, lymphatic invasion, and venous invasion affected survival. Multivariate analysis demonstrated that TNM stage, venous invasion, and esophageal margin involvement were the only significant factors influencing the prognosis. All patients with a positive esophageal margin died with metastasis before local recurrence became a problem. A macroscopic proximal distance of more than 6 cm of esophagus was needed to be free of tumors, excluding one exceptional case which involved 15 cm of esophagus. Conclusion: All of the patients with a positive proximal resection margin after a total gastrectomy had advanced disease with a poor prognosis, but they were not predisposed to anastomotic recurrence. Early detection and extended, but reasonable, surgical resection of curable lesions are mandatory to improve the prognosis.