• Title/Summary/Keyword: Nonparametric Statistics Analysis

The Recency Period for Estimation of Human Immunodeficiency Virus Incidence by the AxSYM Avidity Assay and BED-Capture Enzyme Immunoassay in the Republic of Korea

  • Yu, Hye-Kyung;Heo, Tae-Young;Kim, Na-Young;Wang, Jin-Sook;Lee, Jae-Kyeong;Kim, Sung Soon;Kee, Mee-Kyung
    • Osong Public Health and Research Perspectives / v.5 no.4 / pp.187-192 / 2014
  • Objectives: Measuring the incidence of human immunodeficiency virus (HIV) infection is very important for epidemiological studies. Here, we determined the recency period of the AxSYM avidity assay and the BED-capture enzyme immunoassay (BED-CEIA) in Korean seroconverters. Methods: Two hundred longitudinal specimens from 81 seroconverters with incident HIV infections, collected at the Korea National Institute of Health, were subjected to the AxSYM avidity assay (cutoff = 0.8) and BED-CEIA (cutoff = 0.8). The recency period of recent HIV infection was estimated by nonparametric survival analysis. Sensitivity and specificity were also calculated at 10-day increments from 120 days to 230 days to determine the recency period. Results: With the survival method, the mean recency period was 158 days [95% confidence interval (CI), 135-181 days] for the avidity assay and 189 days (95% CI, 170-208 days) for BED-CEIA. Based on sensitivity and specificity, the mean recency period was 150 days for the avidity assay and 200 days for BED-CEIA. Conclusion: We determined the recency period for estimating HIV incidence in Korea. These data showed that the nonparametric survival analysis often led to shorter recency periods than the analysis of sensitivity and specificity applied as a new method. These findings suggest that more data from seroconverters and other methodologies are needed to determine the recency period for estimating HIV incidence.
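
The nonparametric survival estimate of a mean recency period can be illustrated with a Kaplan-Meier curve and the area under it. The following is a minimal sketch, not the authors' procedure: it assumes each seroconverter contributes one duration in days and an event flag (1 if the specimen crossed the assay cutoff, 0 if follow-up ended first), and the function names `kaplan_meier` and `restricted_mean` as well as the sample numbers are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time in days until the event or until censoring.
    events: 1 if the event was observed, 0 if the record is censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all records that share the same time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def restricted_mean(curve, horizon):
    """Area under the step survival curve from 0 to `horizon` days."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (horizon - prev_t)
    return area


# Hypothetical data: days until the assay value crossed the cutoff.
days = [100, 150, 150, 200]
crossed = [1, 1, 0, 1]
curve = kaplan_meier(days, crossed)
mean_days = restricted_mean(curve, horizon=250)
```

With no censoring, the area under the curve equals the ordinary sample mean, which is a quick sanity check for the sketch.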

Local Linear Logistic Classification of Microarray Data Using Orthogonal Components (직교요인을 이용한 국소선형 로지스틱 마이크로어레이 자료의 판별분석)

  • Baek, Jang-Sun;Son, Young-Sook
    • The Korean Journal of Applied Statistics / v.19 no.3 / pp.587-598 / 2006
  • In microarray data, the number of variables exceeds the number of samples. We propose a nonparametric local linear logistic classification procedure that uses orthogonal components to classify high-dimensional microarray data. The proposed method is based on the local likelihood and can be applied to multi-class classification. We applied the local linear logistic classification method, with PCA, PLS, and factor analysis components as new features, to leukemia data and colon data, and compared its performance with that of conventional statistical classification procedures. The proposed method outperformed the conventional ones for each type of component, and among the three orthogonal components, PLS performed best when embedded in the proposed method.

A Report on the Inter-Gene Correlations in cDNA Microarray Data Sets (cDNA 마이크로어레이에서 유전자간 상관 관계에 대한 보고)

  • Kim, Byung-Soo;Jang, Jee-Sun;Kim, Sang-Cheol;Lim, Jo-Han
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.617-626 / 2009
  • A series of recent papers reported that the inter-gene correlations in Affymetrix microarray data sets were strong and long-ranged, and that the assumption of independence or weak dependence among gene expression signals, often employed without justification, conflicted with actual data. Qui et al. (2005) indicated that applying the nonparametric empirical Bayes method, in which test statistics are pooled across genes for statistical inference, resulted in a large variance in the number of differentially expressed genes. Qui et al. (2005) attributed this effect to strong and long-ranged inter-gene correlations. Klebanov and Yakovlev (2007) demonstrated that the inter-gene correlations provide a rich source of information rather than being a nuisance in the statistical analysis. By transforming the original gene expression sequence, they developed a sequence of independent random variables, which they referred to as a $\delta$-sequence. Using two cDNA microarray data sets generated in this country, we note in this report that strong and long-ranged inter-gene correlations are also present in cDNA microarray data and that a $\delta$-sequence of independent variables can likewise be derived from such data. This note suggests that inter-gene correlations be considered in future analyses of cDNA microarray data sets.

Constructing Simultaneous Confidence Intervals for the Difference of Proportions from Multivariate Binomial Distributions

  • Jeong, Hyeong-Chul;Kim, Dae-Hak
    • The Korean Journal of Applied Statistics / v.22 no.1 / pp.129-140 / 2009
  • In this paper, we consider simultaneous confidence intervals for the difference of proportions between two groups taken from multivariate binomial distributions in a nonparametric way. We briefly discuss the construction of simultaneous confidence intervals using the method of adjusting the p-values in multiple tests. The features of bootstrap simultaneous confidence intervals using non-pooled samples are presented. We also compute confidence intervals from the adjusted p-values of multiple tests in the Westfall (1985) style based on a pooled sample. The average coverage probabilities of the bootstrap simultaneous confidence intervals are compared with those of the Bonferroni simultaneous confidence intervals and the Sidak simultaneous confidence intervals. Finally, we give an example that shows how the proposed bootstrap simultaneous confidence intervals can be utilized through data analysis.
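
The Bonferroni and Sidak intervals that serve as comparators above differ only in the per-comparison significance level: alpha/m for Bonferroni and 1 - (1 - alpha)^(1/m) for Sidak, where m is the number of comparisons. The sketch below is an illustration under assumptions, not the paper's bootstrap method: it uses simple Wald intervals for each difference of proportions, and the function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def per_comparison_alpha(alpha, m, method):
    """Significance level for each of m comparisons."""
    if method == "bonferroni":
        return alpha / m
    if method == "sidak":
        return 1.0 - (1.0 - alpha) ** (1.0 / m)
    raise ValueError("method must be 'bonferroni' or 'sidak'")


def simultaneous_wald_intervals(x1, n1, x2, n2, alpha=0.05, method="bonferroni"):
    """Wald intervals for p1 - p2 on each of the m binary variables.

    x1, x2: success counts per variable in group 1 and group 2.
    n1, n2: group sample sizes.
    """
    m = len(x1)
    a = per_comparison_alpha(alpha, m, method)
    z = NormalDist().inv_cdf(1.0 - a / 2.0)
    intervals = []
    for s1, s2 in zip(x1, x2):
        p1, p2 = s1 / n1, s2 / n2
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        diff = p1 - p2
        intervals.append((diff - z * se, diff + z * se))
    return intervals


# Hypothetical counts for three binary responses in two groups of 50.
group1 = [30, 22, 41]
group2 = [25, 30, 35]
bonf = simultaneous_wald_intervals(group1, 50, group2, 50, method="bonferroni")
sidak = simultaneous_wald_intervals(group1, 50, group2, 50, method="sidak")
```

Because the Sidak level is slightly larger than the Bonferroni level for m > 1, the Sidak intervals are slightly narrower.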

On the analysis of multistate survival data using Cox's regression model (Cox 회귀모형을 이용한 다중상태의 생존자료분석에 관한 연구)

  • Sung Chil Yeo
    • The Korean Journal of Applied Statistics / v.7 no.2 / pp.53-77 / 1994
  • Cox's regression model is used to analyze multistate survival data arising from a certain stochastic process. Under this model, the regression parameter vectors, survival functions, and the probability of being in response function are estimated by multistate Cox's partial likelihood and nonparametric likelihood methods. The asymptotic properties of these estimators are described informally through the counting process approach. An example is given to illustrate the results in this paper.

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.149-161 / 2019
  • In this article, we suggest the following approach to simultaneous variable selection and outlier detection. First, we determine possible outlier candidates using properties of an intercept estimator in a difference-based regression model, and the outlier information is reflected in the multiple regression model by adding mean shift parameters. Second, we select the best model from the model that includes the outlier candidates as predictors, using stochastic search variable selection. Finally, we evaluate our method using simulations and real data analysis, which yield promising results. In addition, the method needs further development to produce robust estimates. We will also extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.

Uncertainty Analysis for Parameter Estimation of Probability Distribution in Rainfall Frequency Analysis Using Bootstrap (강우빈도해석에서 Bootstrap을 이용한 확률분포의 매개변수 추정에 대한 불확실성 해석)

  • Seo, Young-Min;Park, Ki-Bum
    • Journal of Environmental Science International / v.20 no.3 / pp.321-327 / 2011
  • Bootstrap methods are computer-based resampling methods that estimate the standard errors and confidence intervals of summary statistics using the plug-in principle, in order to assess the accuracy or uncertainty of statistical estimates. Among them, the BCa method is known to be much superior to other bootstrap methods in terms of statistical validity. This study therefore proposes a way to represent and treat uncertainty in flood risk assessment and water resources planning. It constructs and applies a rainfall frequency analysis model that accounts for uncertainty using the nonparametric BCa bootstrap, and uses the model to assess the estimated probability rainfall and the effect of uncertainty in the parameter estimates of the probability distribution. Rainfall frequency analysis is the most fundamental step in flood risk assessment and water resources planning.
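
The resampling idea behind bootstrap confidence intervals for distribution parameters can be sketched as follows. Note the substitutions: this sketch uses the simpler percentile interval rather than the BCa interval the study relies on, and it assumes a Gumbel distribution fitted by the method of moments, which the abstract does not specify. The function names and rainfall values are hypothetical.

```python
import random
from math import pi, sqrt
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649


def gumbel_moments(sample):
    """Method-of-moments estimates (location, scale) for a Gumbel distribution."""
    scale = stdev(sample) * sqrt(6) / pi
    location = mean(sample) - EULER_GAMMA * scale
    return location, scale


def bootstrap_percentile_ci(sample, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(sample).

    Resamples with replacement n_boot times and takes the empirical
    alpha/2 and 1 - alpha/2 quantiles of the replicated statistic.
    """
    rng = random.Random(seed)
    n = len(sample)
    replicates = sorted(
        statistic([rng.choice(sample) for _ in range(n)]) for _ in range(n_boot)
    )
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical annual maximum daily rainfall (mm).
annual_max = [112.0, 98.5, 143.2, 87.0, 160.4, 121.7, 95.3, 133.8,
              104.9, 178.6, 90.2, 126.5, 110.1, 149.0, 101.4]
scale_ci = bootstrap_percentile_ci(annual_max, lambda s: gumbel_moments(s)[1])
location_ci = bootstrap_percentile_ci(annual_max, lambda s: gumbel_moments(s)[0])
```

A BCa interval would additionally adjust the two quantile levels with a bias-correction term and an acceleration term estimated by the jackknife; the resampling loop itself is unchanged.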

A Study of Travel Time Prediction using K-Nearest Neighborhood Method (K 최대근접이웃 방법을 이용한 통행시간 예측에 대한 연구)

  • Lim, Sung-Han;Lee, Hyang-Mi;Park, Seong-Lyong;Heo, Tae-Young
    • The Korean Journal of Applied Statistics / v.26 no.5 / pp.835-845 / 2013
  • Travel time is considered the most typical and preferred traffic information for intelligent transportation systems (ITS). This paper proposes a real-time travel-time prediction method for a national highway, based on the K-nearest neighbor (KNN) method. The KNN method, a nonparametric method, is appropriate for a real-time traffic management system because it needs no additional assumptions or parameter calibration. The performance of various models is compared using the mean absolute percentage error (MAPE) and the coefficient of variation (CV). An analysis of real traffic data collected from Korean national highways indicates that the proposed model outperforms other prediction models such as the historical average model and the Kalman filter model. Travel-time reliability is expected to improve when travel times from the proposed model are used flexibly together with travel times from the interval detectors.
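
KNN prediction and the MAPE criterion mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's model: it assumes each historical record pairs a feature vector (for example, recent detector speeds) with an observed travel time, uses Euclidean distance with an unweighted average of the k nearest records, and all names and numbers are hypothetical.

```python
from math import sqrt


def knn_predict(history, query, k=3):
    """Predict travel time as the mean over the k nearest historical records.

    history: list of (feature_vector, travel_time) pairs.
    query: feature vector describing current traffic conditions.
    """
    def distance(record):
        features = record[0]
        return sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))

    nearest = sorted(history, key=distance)[:k]
    return sum(travel_time for _, travel_time in nearest) / len(nearest)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical history: (average speed in km/h,) -> travel time in minutes.
history = [((80.0,), 12.0), ((78.0,), 13.0), ((45.0,), 25.0), ((40.0,), 26.0)]
predicted = knn_predict(history, (79.0,), k=2)
error = mape([100.0, 200.0], [110.0, 180.0])
```

Nothing is fitted in advance: each prediction only searches the stored records, which is why the method needs no parameter calibration beyond the choice of k and the distance measure.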

Comparison Study of Time Series Clustering Methods (시계열자료 군집방법의 비교연구)

  • Hong, Han-Woom;Park, Min-Jeong;Cho, Sin-Sup
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1203-1214 / 2009
  • In this paper, we introduce time series clustering methods in the time and frequency domains and discuss the merits and demerits of each method. We analyze the daily prices of 15 KOSPI 200 stocks, and the nonparametric method using wavelets shows the best clustering results. For clustering nonstationary time series using the spectral density, the EMD method removes the trend more effectively than differencing.

Statistical Methods Used in Articles of the Korean Journal of Acupuncture (경락경혈학회지 게재논문에 사용된 통계방법)

  • Kim, Jung-Eun;Kang, Kyung-Won;Lee, Min-Hee;Lee, Sanghun
    • Korean Journal of Acupuncture / v.30 no.1 / pp.1-8 / 2013
  • Objectives: The purpose of the present study was to examine the statistical methods used in articles published in the Korean Journal of Acupuncture from 2007 through 2012. Methods: The statistical methods and statistical packages used in original articles that applied descriptive or inferential statistics were tabulated. Results: Out of a total of 195 original articles, 18 used descriptive statistics only and 177 used inferential statistics. A total of 142 articles used 12 types of statistical packages. SPSS was used most often, 97 times (63.4%). Descriptive statistical methods were used a total of 417 times; of these, 193 were presented as tables (46.3%) and 224 as graphs (53.7%). Inferential statistics were applied a total of 256 times, and analysis of variance was used most often, 90 times (35.2%). Parametric statistical methods were used a total of 170 times (75.6%) and nonparametric statistical methods a total of 55 times (24.4%). Analysis of variance and the two-sample t-test were the most frequently employed methods in both clinical and non-clinical research. Multiple comparison methods were applied a total of 67 times, and the Scheffe method was the most frequent among them, 26 times (37.7%). Conclusions: The present study examined the statistical methods used in the journal over the last six years. The results can serve as basic reference material when evaluating the quality of the medical journal.