• Title/Summary/Keyword: Chi-square statistics

Applying Randomization Tests to Collocation Analyses in Large Corpora (언어의 공기관계 분석을 위한 임의화검증의 응용)

  • Yang Kyung-Sook;Kim HeeYoung
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.3
    • /
    • pp.583-595
    • /
    • 2005
  • Contingency tables are used to compare counts of n-grams to determine whether an n-gram is a true collocation, meaning that the words that make up the n-gram are highly associated in the text. Several statistical measures are used to identify collocations, including the Kulczynski coefficient, the Ochiai coefficient, the Fager and McGowan coefficient, the Yule coefficient, mutual information, and chi-square. The main problem is that these measures are based on the assumption of a normal or approximately normal distribution of the variables being sampled. While this assumption is valid in most instances, it is not valid when comparing the rates of occurrence of rare events, and texts are composed mostly of rare events. In this paper we briefly review some statistics for testing the association of two words. We propose randomization tests to evaluate the significance level when analyzing collocations in large corpora. A related graph can be used to compare different test statistics that can be used to analyze the same contingency table.
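
As a rough illustration of the idea in this abstract, the sketch below computes the Pearson chi-square statistic for a 2×2 bigram contingency table and estimates its significance by randomization instead of relying on the asymptotic distribution. It is not the authors' procedure: the function names, the cell layout (a = both words, b = first word only, c = second word only, d = neither), and the shuffle-based resampling scheme are all assumptions made for this example.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A margin is empty, so the statistic is undefined; report 0.0.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def randomization_p_value(a, b, c, d, n_rounds=2000, seed=0):
    """Monte Carlo p-value: shuffle second-word labels with margins fixed."""
    rng = random.Random(seed)
    observed = chi_square_2x2(a, b, c, d)
    n = a + b + c + d
    # 1 marks a bigram whose second word is w2; there are a + c of them.
    second = [1] * (a + c) + [0] * (b + d)
    hits = 0
    for _ in range(n_rounds):
        rng.shuffle(second)
        # Treat the first a + b positions as the bigrams starting with w1.
        a_star = sum(second[: a + b])
        b_star = (a + b) - a_star
        c_star = (a + c) - a_star
        d_star = n - a_star - b_star - c_star
        if chi_square_2x2(a_star, b_star, c_star, d_star) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_rounds + 1)


print(chi_square_2x2(20, 0, 0, 20))  # -> 40.0
print(randomization_p_value(20, 0, 0, 20))
```

Shuffling an explicit label list is only practical for small tables. For corpus-scale counts, one would more plausibly draw the top-left cell directly from a hypergeometric distribution with the same margins.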

Bayesian Test of Quasi-Independence in a Sparse Two-Way Contingency Table

  • Kwak, Sang-Gyu;Kim, Dal-Ho
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.3
    • /
    • pp.495-500
    • /
    • 2012
  • We consider a Bayesian test of independence in a two-way contingency table that has some zero cells. To do this, we use a three-stage hierarchical Bayesian model under each hypothesis. As priors, we use Dirichlet densities to model the marginal and individual cell probabilities. Our method does not require complicated computation, such as a Metropolis-Hastings algorithm, to draw samples from each posterior density of the parameters. We draw samples using a Gibbs sampler with a grid method. For complicated posterior formulas, we apply Monte Carlo integration and the sampling importance resampling algorithm. We compare the values of the Bayes factor with the results of a chi-square test and the likelihood ratio test.

CHAID Algorithm by Cube-based Proportional Sampling

  • Park, Hee-Chang;Cho, Kwang-Hyun
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집)
    • /
    • 2004.04a
    • /
    • pp.39-50
    • /
    • 2004
  • The decision tree approach is most useful in classification problems and for dividing the search space into rectangular regions. Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction and variable screening, and category merging. CHAID (Chi-squared Automatic Interaction Detector) uses the chi-squared statistic to determine splits and is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper we propose a CHAID algorithm based on cube-based proportional sampling and examine its accuracy and speed as the number of variables changes.

Intensive numerical studies of optimal sufficient dimension reduction with singularity

  • Yoo, Jae Keun;Gwak, Da-Hae;Kim, Min-Sun
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.3
    • /
    • pp.303-315
    • /
    • 2017
  • Yoo (2015, Statistics and Probability Letters, 99, 109-113) derives theoretical results in an optimal sufficient dimension reduction with singular inner-product matrix. The results are promising, but Yoo (2015) only presents one simulation study. So, an evaluation of its practical usefulness is necessary based on numerical studies. This paper studies the asymptotic behaviors of Yoo (2015) through various simulation models and presents a real data example that focuses on ordinary least squares. Intensive numerical studies show that the $\chi^2$ test by Yoo (2015) outperforms the existing optimal sufficient dimension reduction method. The basis estimation by the former can be theoretically sub-optimal; however, there are no notable differences from that by the latter. This investigation confirms the practical usefulness of Yoo (2015).

A New Integral Representation of the Coverage Probability of a Random Convex Hull

  • Son, Won;Ng, Chi Tim;Lim, Johan
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.1
    • /
    • pp.69-80
    • /
    • 2015
  • In this paper, the probability that a given point is covered by a random convex hull generated by independent and identically distributed random points in a plane is studied. It is shown that this probability can be expressed in terms of an integral that can be approximated numerically by function evaluations over the grid points in a 2-dimensional space. The new integral representation allows this probability to be computed efficiently. The computational burdens under the proposed integral representation and those in the existing literature are compared. The proposed method is illustrated through numerical examples where the random points are drawn from (i) a uniform distribution over a square and (ii) a bivariate normal distribution over the two-dimensional Euclidean space. The applications of the proposed method in statistics are discussed.

A Document Sentiment Classification System Based on the Feature Weighting Method Improved by Measuring Sentence Sentiment Intensity (문장 감정 강도를 반영한 개선된 자질 가중치 기법 기반의 문서 감정 분류 시스템)

  • Hwang, Jae-Won;Ko, Young-Joong
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.6
    • /
    • pp.491-497
    • /
    • 2009
  • This paper proposes a new feature weighting method for document sentiment classification. The proposed method considers the differences in sentiment intensity among the sentences in a document. Sentiment features consist of sentiment vocabulary words, whose sentiment intensity scores are estimated by the chi-square statistic. The sentiment intensity of each sentence can then be measured using the chi-square statistic values of its sentiment features. The calculated intensity values of each sentence are finally applied to the TF-IDF weighting method for all features in the document. In this paper, we evaluate the proposed method using a support vector machine. Our experimental results show that the proposed method performs about 2.0% better than the baseline, which does not consider the sentiment intensity of a sentence.
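
The abstract does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: a term's score is the Pearson chi-square statistic of its 2×2 term-by-class document-count table, and a sentence's intensity is the sum of the scores of the sentiment terms it contains. The function names, the example counts, and the summation rule are hypothetical and are not taken from the paper.

```python
def chi_square_term(pos_with, pos_without, neg_with, neg_without):
    """Chi-square association between a term and the positive/negative class.

    Arguments are document counts: positive docs with/without the term,
    then negative docs with/without the term.
    """
    a, b, c, d = pos_with, pos_without, neg_with, neg_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def sentence_intensity(tokens, term_scores):
    """Assumed rule: sum the chi-square scores of known sentiment terms."""
    return sum(term_scores.get(token, 0.0) for token in tokens)


# Hypothetical counts from 50 positive and 50 negative documents.
term_scores = {
    "great": chi_square_term(40, 10, 5, 45),   # strongly class-dependent
    "table": chi_square_term(25, 25, 25, 25),  # independent of class
}

intensity = sentence_intensity(["a", "great", "table"], term_scores)
print(round(intensity, 2))
```

A sentence-level value like `intensity` could then scale the TF-IDF weights of the features in that sentence. How the paper actually combines the two is not stated in the abstract.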

Effects of Group Tai Chi Exercise Program on Body Mass Index (BMI), Positive and Negative Psychiatric Symptoms in Patients with Schizophrenia (타이치 운동프로그램이 정신분열병 환자의 신체질량지수와 양성 및 음성 정신 증상에 미치는 효과)

  • Kwon, Yun-Hee;Kwag, Oh-Gye
    • The Korean Journal of Rehabilitation Nursing
    • /
    • v.14 no.2
    • /
    • pp.129-135
    • /
    • 2011
  • Purpose: This study examined the effects of a Tai Chi exercise program on BMI and on positive and negative psychiatric symptoms in patients with schizophrenia. Methods: The participants were patients with schizophrenia in S psychiatric hospital in D city. Twenty-five patients were assigned to the experimental group, and 26 patients were assigned to the control group. Data were collected from May 9 to July 8, 2011. The Tai Chi exercise program was conducted for 60 minutes, 2 times a week for 8 weeks (a total of 8 times). Measures were BMI and positive and negative psychiatric symptoms. Data were analyzed using descriptive statistics, the chi-square test, and the t-test with SPSS/WIN version 19.0. Results: The experimental group, which received the Tai Chi exercise program, showed significant changes in BMI and in positive and negative psychiatric symptoms. Conclusion: The results of this study indicate that the Tai Chi exercise program is an effective intervention for improving the BMI and the positive and negative psychiatric symptoms of patients with schizophrenia.

Factors associated with Cognitive Decline in the Elderly in Community (일 지역사회 노인의 인지기능저하 요인)

  • Kwon, Young-Sook;Paek, Kyung-Shin
    • Journal of Digital Convergence
    • /
    • v.12 no.2
    • /
    • pp.587-594
    • /
    • 2014
  • This study was carried out to examine the cognitive function of the elderly in a community and to investigate the factors affecting their cognitive decline, in order to provide preliminary data for developing a program to maintain and promote cognitive function. A survey of 481 senior citizens aged over 65 in J. city was conducted on their demographic characteristics, health-related characteristics, and depression using structured questionnaires from September 1 through 7, 2011. The collected data were analyzed by descriptive statistics, the chi-square test, and logistic regression analysis using IBM SPSS Statistics V. 20. In this study, 40.1% of the subjects showed cognitive decline, and the factors related to their cognitive decline were the level of education (p<.001), age (p=.000), depression, and exercise (p<.05). Therefore, intervention programs on depression or exercise should be implemented intensively, and in particular, various programs and education should be provided that consider individual differences in level of education and age.

Applications on p-values of Chi-Square Distribution

  • Hong, Chong Sun;Hong, Sung Sick
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.3
    • /
    • pp.877-887
    • /
    • 2002
  • In this paper, the behavior and properties of p-values for the goodness-of-fit test are investigated. Based on some findings about the p-values, we consider applications to determining the sample size of a survey using a regression equation based on pilot study data. Regression equations are obtained by the well-known least squares method, and we find that, alternatively, regression lines can be formulated with only two data points. In further studies, this work might be extended to t distributions for testing hypotheses about a population mean in order to determine the sample size of a prospective study. Similar arguments could also be explored for F test statistics.
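
To make the link between sample size and the goodness-of-fit p-value concrete, the sketch below holds the observed proportions fixed and varies only the sample size. It does not reproduce the authors' regression-based procedure. The function names and the example counts are assumptions, and the p-value uses the closed-form upper tail of the chi-square distribution, which is valid only for even degrees of freedom.

```python
import math


def gof_statistic(observed, expected_probs):
    """Pearson goodness-of-fit statistic for counts against fixed probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_probs))


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


uniform = [1 / 3, 1 / 3, 1 / 3]
# Same observed proportions (0.40, 0.35, 0.25) at two sample sizes.
for counts in ([24, 21, 15], [240, 210, 150]):
    stat = gof_statistic(counts, uniform)
    p_value = chi2_sf_even_df(stat, df=len(counts) - 1)
    print(sum(counts), round(stat, 3), p_value)
```

With the proportions held fixed, the statistic grows in proportion to the sample size, so the p-value shrinks quickly as the sample grows. This dependence is what makes p-values usable as a guide to sample size.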

ARMA Modeling for Nonstationary Time Series Data without Differencing

  • Shin, Dong-Wan;Park, You-Sung
    • Journal of the Korean Statistical Society
    • /
    • v.28 no.3
    • /
    • pp.371-387
    • /
    • 1999
  • For possibly nonstationary autoregressive moving average processes, modeling based on the original observations rather than the differenced observations is considered. Under this scheme, sample autocorrelation functions, parameter estimates, model diagnostic statistics, and predictions are all computed from the original data instead of the differenced data. The methods and results established under stationarity of the data are shown to extend naturally to nonstationarity with one autoregressive unit root. The sample ACF and PACF can be used for ARMA order determination. The BIC order is strongly consistent. The parameter estimates are asymptotically normal. The portmanteau statistic has a chi-square distribution. The predictor is asymptotically equivalent to that based on the differenced data.
