• Title/Summary/Keyword: Chi-square

A Scene Change Detection Technique using the Weighted $\chi^2$-test and the Automated Threshold-Decision Algorithm (변형된 $\chi^2$- 테스트와 자동 임계치-결정 알고리즘을 이용한 장면전환 검출 기법)

  • Ko, Kyong-Cheol;Rhee, Yang-Won
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.4 s.304 / pp.51-58 / 2005
  • This paper proposes a robust scene change detection technique that uses the weighted chi-square test and an automated threshold-decision algorithm. The weighted chi-square test subdivides the difference values of the individual color channels by calculating color intensities according to the NTSC standard, and it detects scene changes by combining the weighted color intensities with the predefined chi-square test, which emphasizes the comparative color difference values. The automated threshold-decision algorithm uses the frame-to-frame difference values obtained by the weighted chi-square test. First, the average of all difference values is calculated; then another average is calculated from the difference values using the previous average; finally, the most appropriate mid-average value is searched for and taken as the threshold. Experimental results show that the proposed algorithms are effective and outperform previous approaches.
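
The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the chi-square histogram difference, the use of the NTSC luminance coefficients as channel weights, and the reading of the threshold step (mean of all differences, then mean of the differences above it, then their midpoint) are all assumptions, and every function name is hypothetical.

```python
# Illustrative sketch only; the abstract does not give the exact formulas.
# Assumption: channels are weighted by the NTSC luminance coefficients.
NTSC_WEIGHTS = {"r": 0.299, "g": 0.587, "b": 0.114}


def chi_square_distance(hist_a, hist_b):
    """Chi-square histogram difference: sum of (a - b)^2 / (a + b) over non-empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(hist_a, hist_b) if a + b > 0)


def weighted_frame_difference(frame_a, frame_b):
    """Each frame is a dict mapping 'r', 'g', 'b' to a histogram (list of bin counts)."""
    return sum(
        weight * chi_square_distance(frame_a[channel], frame_b[channel])
        for channel, weight in NTSC_WEIGHTS.items()
    )


def auto_threshold(diffs):
    """Assumed reading of the abstract: midpoint between the overall mean
    and the mean of the differences that exceed the overall mean."""
    overall_mean = sum(diffs) / len(diffs)
    upper = [d for d in diffs if d > overall_mean]
    if not upper:
        return overall_mean
    upper_mean = sum(upper) / len(upper)
    return (overall_mean + upper_mean) / 2


def detect_scene_changes(frames):
    """Return indices of frames that start a new scene."""
    diffs = [weighted_frame_difference(a, b) for a, b in zip(frames, frames[1:])]
    if not diffs:
        return []
    threshold = auto_threshold(diffs)
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]
```

With four toy frames whose histograms switch between the second and third frame, `detect_scene_changes` flags index 2 as the start of a new scene.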

Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.;Hong, C.S.;Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three-dimensional contingency tables of small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the completely independent model, the conditionally independent model, the partial association models, and the model with one variable independent of the other two. For models with direct solutions for the expected cell counts, point estimates and confidence intervals of the 90 and 95 percentage points of the six statistics are explored. For models without direct solutions, the empirical significance levels and empirical powers of the six statistics for testing the significance of the three-factor interaction are computed and compared.
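
As a hedged illustration of how some of these statistics relate, the sketch below computes the standard Cressie-Read power divergence statistic, $\frac{2}{\lambda(\lambda+1)}\sum_i O_i\left[(O_i/E_i)^{\lambda}-1\right]$, which reduces to the Pearson chi-square at $\lambda = 1$ and to the likelihood ratio statistic as $\lambda \to 0$. The function name and the example counts are hypothetical, the blended weight and negative exponential disparities are not covered, and all cell counts are assumed to be positive.

```python
import math


def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic for positive cell counts.

    lam = 1 gives the Pearson chi-square; lam = 0 is handled as the
    limiting case, the likelihood ratio statistic G^2.
    """
    if lam == -1:
        raise ValueError("lam = -1 is a separate limiting case not covered in this sketch")
    if lam == 0:
        return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
    scale = 2 / (lam * (lam + 1))
    return scale * sum(o * ((o / e) ** lam - 1) for o, e in zip(observed, expected))


# Hypothetical counts whose totals match, as the identity with Pearson requires.
observed = [10, 20, 30]
expected = [20, 20, 20]
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For these counts, `power_divergence(observed, expected, 1)` agrees with `pearson`, and values of `lam` close to zero approach the likelihood ratio statistic.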

An Empirical Study of Qualities of Association Rules from a Statistical View Point

  • Dorn, Maryann;Hou, Wen-Chi;Che, Dunren;Jiang, Zhewei
    • Journal of Information Processing Systems / v.4 no.1 / pp.27-32 / 2008
  • Minimum support and confidence have been used as criteria for generating association rules in all association rule mining algorithms. These criteria have natural appeal, such as simplicity, and few researchers have questioned the quality of the generated rules. In this paper, we examine the rules from a more rigorous point of view by conducting statistical tests. Specifically, we use contingency tables and the chi-square test to analyze the data. Experimental results show that one third of the association rules derived from the support and confidence criteria are not significant; that is, the antecedent and consequent of these rules are not correlated. This indicates that minimum support and minimum confidence do not ensure the discovery of meaningful associations. The chi-square test can be considered an enhancement or an alternative solution.
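
A small sketch can show why a rule may pass support and confidence thresholds yet fail a chi-square test. It uses the standard 2x2 chi-square statistic with one degree of freedom and the 0.05 critical value of 3.841; the function names and the transaction counts are hypothetical and are not taken from the paper.

```python
CRITICAL_VALUE_05_DF1 = 3.841  # chi-square critical value, alpha = 0.05, 1 degree of freedom


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table of transaction counts.

    a: antecedent and consequent, b: antecedent only,
    c: consequent only,           d: neither.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def rule_summary(a, b, c, d):
    """Return support, confidence, and whether the rule is significant at 0.05."""
    n = a + b + c + d
    support = a / n
    confidence = a / (a + b)
    significant = chi_square_2x2(a, b, c, d) > CRITICAL_VALUE_05_DF1
    return support, confidence, significant


# Hypothetical rule: support 0.81 and confidence 0.90, but the consequent
# occurs in 90% of all transactions, so the antecedent adds no information.
support, confidence, significant = rule_summary(81, 9, 9, 1)
```

Here the rule looks strong by support and confidence, but the chi-square statistic is zero, so the rule is not significant.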

Analysis on the Amino Acid Distributions with Position in Transmembrane Proteins

  • Chi, Sang-Mun
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.745-758 / 2005
  • This paper presents a statistical analysis of the position-specific distributions of amino acid residues in transmembrane proteins. A hidden Markov model segments membrane proteins to produce regions with homogeneous statistical properties from variable-length amino acid sequences. The segmented residues are analyzed using the chi-square statistic and relative entropy in order to find position-specific amino acids. The analysis showed that isoleucine and valine are concentrated at the center of membrane-spanning regions, while tryptophan, tyrosine, and positively charged residues are found frequently near both ends of the membrane.

Clinical Analysis of Symptoms and Oriental Medical Prescriptions According to Elapsed Time of Stroke in Oriental Medical Hospital Inpatients

  • Yun, Hen-Ja;Sung, Kang-Keyng
    • Herbal Formula Science / v.20 no.1 / pp.133-147 / 2012
  • Objectives: This study aimed to characterize symptoms, oriental medicine prescriptions, and laboratory test results according to the elapsed time since stroke. Methods: From the medical records of 205 stroke inpatients in the oriental medical hospital in 2010, we investigated manifested symptoms, administered oriental medicine prescriptions, and clinical pathological examination results. Collected items were classified by stroke type (cerebral infarction or hemorrhage). We analyzed the association between elapsed time and the manifested symptoms, oriental medicine prescriptions, and laboratory test results of stroke patients. Chi-square tests were performed to determine the significance of each association. Results: All symptoms, prescriptions, and laboratory test results in cerebral infarction patients were associated with elapsed time. In particular, symptoms, prescriptions, and pathological examination results showed very high statistical significance with elapsed time (symptoms: chi-square(df)=164.3(22), p<0.001; prescriptions: chi-square(df)=93.5(22), p<0.001; pathological examination results: chi-square(df)=164.3(22), p<0.0004). In the case of cerebral hemorrhage, however, there was no statistical significance. Conclusions: The elapsed time since stroke may be an essential consideration in identifying symptoms and prescribing for stroke patients in oriental medical treatment.

Two-sample chi-square test for randomly censored data (임의로 관측중단된 두 표본 자료에 대한 카이제곱 검정방법)

  • 김주한;김정란
    • The Korean Journal of Applied Statistics / v.8 no.2 / pp.109-119 / 1995
  • A two-sample chi-square test is introduced for testing the equality of the distributions of two populations when observations are subject to random censorship. The statistic is appropriate for testing problems in which a two-sided alternative is of interest. Under the null hypothesis, the asymptotic distribution of the statistic is a chi-square distribution. We obtain two types of chi-square statistics: one as a nonnegative definite quadratic form in the differences of observed cell probabilities based on the product-limit estimators, and the other as a summation form. Data from a cancer chemotherapy experiment are examined with these statistics.

The Role of Negative Binomial Sampling In Determining the Distribution of Minimum Chi-Square

  • Hamdy H.I.;Bentil Daniel E.;Son M.S.
    • International Journal of Contents / v.3 no.1 / pp.1-8 / 2007
  • The distributions of the minimum correlated F-variable arise in many applied statistical problems, including simultaneous analysis of variance (SANOVA), equality of variance, selection and ranking of populations, and reliability analysis. In this paper, a negative binomial sampling technique is employed to derive the distributions of the minimum of chi-square variables and hence the distributions of the minimum correlated F-variables. The work is divided into two parts. The first part develops some combinatorial identities arising from negative binomial sampling. These identities are constructed and justified because they serve an important purpose when dealing with these distributions or their characteristics. Other important results, including the cumulants and moments of these distributions, are also given in relatively simple forms. In the second part, the distributions of the minimum chi-square variable, and hence of the minimum correlated F-variables, are derived within the negative binomial sampling framework. Although multinomial theory applied to order statistics and standard transformation techniques can be used to derive these distributions, the negative binomial sampling approach provides more information about the relationship between the sampling vehicle and the probability distributions of these functions of chi-square variables. We also provide an algorithm to compute the percentage points of the distributions. The computation methods we adopt are exact and involve no interpolation.

Functional Prediction Method for Proteins by using Modified Chi-square Measure (보완된 카이-제곱 기법을 이용한 단백질 기능 예측 기법)

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong
    • The Journal of the Korea Contents Association / v.9 no.5 / pp.332-336 / 2009
  • Functional prediction of unannotated proteins is one of the most important tasks in yeast genomics. Analysis of a protein-protein interaction network leads to a better understanding of the functions of unannotated proteins, and a number of studies have addressed functional prediction from such networks. The chi-square method is one of the existing methods for this task, but it does not consider the topology of the network. In this paper, we propose a novel method that can predict specific molecular functions for unannotated proteins from a protein-protein interaction network. To do this, we investigated all yeast protein interaction databases available on public sites such as MIPS, DIP, and SGD. To predict the functions of unannotated proteins, we employ a modified chi-square measure based on neighborhood counting, and we assess the accuracy of protein function prediction from a protein-protein interaction network.
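
To make the baseline concrete, the sketch below scores candidate functions for one protein with a plain chi-square measure over its neighbors: for each function, the observed number of neighbors carrying it is compared with the number expected from the function's frequency among annotated proteins. This is an assumed form of the existing chi-square method, not the modified measure proposed in the paper, which the abstract does not specify. The function name, data layout, and toy network are hypothetical.

```python
from collections import Counter


def chi_square_function_scores(protein, network, annotations):
    """Score each known function for `protein` by (observed - expected)^2 / expected.

    network:     dict mapping a protein to the set of its interaction partners
    annotations: dict mapping an annotated protein to its set of functions
    Note: a plain chi-square score is also large when a function is rarer
    among the neighbors than expected; this sketch does not correct for that.
    """
    neighbors = network.get(protein, set())
    annotated = [p for p, funcs in annotations.items() if funcs]
    if not neighbors or not annotated:
        return {}

    # Frequency of each function among all annotated proteins.
    function_counts = Counter(f for p in annotated for f in annotations[p])
    # Number of neighbors that carry each function.
    observed = Counter(f for nb in neighbors for f in annotations.get(nb, ()))

    scores = {}
    for function, count in function_counts.items():
        expected = len(neighbors) * count / len(annotated)
        scores[function] = (observed[function] - expected) ** 2 / expected
    return scores


# Hypothetical toy network: unannotated protein X interacts with A, B, and C.
network = {"X": {"A", "B", "C"}}
annotations = {
    "A": {"kinase"}, "B": {"kinase"}, "C": {"transport"},
    "D": {"transport"}, "E": {"transport"}, "F": {"transport"},
}
scores = chi_square_function_scores("X", network, annotations)
predicted = max(scores, key=scores.get)
```

In this toy case "kinase" is over-represented among the neighbors of X relative to its overall frequency, so it receives the highest score.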

Comparative Analysis of Unweighted Sample Design and Complex Sample Design Related to the Exploration of Potential Risk Factors of Dysphonia (잠재적 위험요인의 탐색에 관한 단일표본분석과 복합표본분석의 비교)

  • Byeon, Hae-Won
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.5 / pp.2251-2258 / 2012
  • This study compared the unweighted sample design, the frequency-weighted sample design, and the complex sample design using the 2009 Korea National Health and Nutrition Examination Survey in order to identify whether the potential risk factors differ among them. The Pearson chi-square test and the Rao-Scott chi-square test were used as the analytic methods. The analyses showed that all variables were overestimated as significant risk factors under the sample design to which only the frequency weights were applied. In addition, the confidence levels and results of the simple random sampling analysis, to which no weight was applied, differed from those of the complex sample design. When national statistics data are used, the complex sample design should be applied rather than an analysis using only frequency weights, so that the findings represent the whole population.

Properties of chi-square statistic and information gain for feature selection of imbalanced text data (불균형 텍스트 데이터의 변수 선택에 있어서의 카이제곱통계량과 정보이득의 특징)

  • Mun, Hye In;Son, Won
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.469-484 / 2022
  • Since a large text corpus contains on the order of a hundred thousand unique words, text data is a typical example of high-dimensional data. Various feature selection methods have therefore been proposed for dimension reduction. Feature selection can improve prediction accuracy, and the reduced data size also improves computational efficiency. The chi-square statistic and the information gain are two of the most popular measures for identifying interesting terms in text data. In this paper, we investigate the theoretical properties of the chi-square statistic and the information gain. We show that the two filtering metrics share theoretical properties such as non-negativity and convexity. However, they differ in that the information gain tends to select more negative features than the chi-square statistic in imbalanced text data.
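
The two filtering metrics can be sketched for a single term and a binary class using a 2x2 table of document counts. The sketch below uses the standard definitions (the 2x2 chi-square statistic, and information gain as class entropy minus the class entropy conditional on term presence); the function names and counts are hypothetical, and the paper's analysis of imbalanced data is not reproduced.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def chi_square(a, b, c, d):
    """2x2 chi-square statistic.

    a: term present, class positive    b: term present, class negative
    c: term absent,  class positive    d: term absent,  class negative
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator


def information_gain(a, b, c, d):
    """Class entropy minus the class entropy conditional on term presence."""
    n = a + b + c + d
    class_entropy = entropy([a + c, b + d])
    conditional = sum(
        (sum(cells) / n) * entropy(cells)
        for cells in ([a, b], [c, d])
        if sum(cells) > 0
    )
    return class_entropy - conditional
```

Both measures are zero when the term is independent of the class (for example, counts of 25 in every cell) and reach their maximum for a term that separates the classes perfectly.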