• Title/Summary/Keyword: 카이제곱 (chi-square)

Issue Word Extraction Using Chi-square Statistics (카이제곱 통계량을 이용한 이슈 단어 추출)

  • Shin, Junsoo
    • Annual Conference on Human and Language Technology / 2014.10a / pp.225-227 / 2014
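  • Online news articles today are written on a wide range of topics, following the public's interests and trends, and these interests and trends keep changing over time. This paper proposes a method that uses online news headlines to extract words related to time-varying interests and trends. The news items appearing in each specific period are treated as one category, and the chi-square statistic, which is widely used for feature selection, is used to extract the key words of each category. Experiments confirmed that words related to the interests and trends of each period were extracted.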
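The period-as-category idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common 2x2 document-frequency form of the chi-square score used in feature selection, whitespace tokenization, and hypothetical toy headlines; the function names `chi_square_2x2` and `top_words` are made up for this example.

```python
from collections import Counter

def chi_square_2x2(a, b, c, d):
    """Chi-square score of a word for a category from a 2x2 table.

    a: titles in the category that contain the word
    b: titles outside the category that contain the word
    c: titles in the category without the word
    d: titles outside the category without the word
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom

def top_words(titles_by_period, period, k=3):
    """Rank words over-represented in `period` by their chi-square score."""
    in_cat = [set(t.split()) for t in titles_by_period[period]]
    out_cat = [set(t.split())
               for p, ts in titles_by_period.items() if p != period
               for t in ts]
    df_in = Counter(w for t in in_cat for w in t)
    df_out = Counter(w for t in out_cat for w in t)
    scores = {}
    for w, a in df_in.items():
        b = df_out[w]
        c = len(in_cat) - a
        d = len(out_cat) - b
        if a * d > b * c:  # keep only words positively associated with the period
            scores[w] = chi_square_2x2(a, b, c, d)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]

# Hypothetical toy data: headlines grouped by month (an assumption for illustration).
titles = {
    "2014-06": ["world cup opens", "world cup final preview", "rain delays match"],
    "2014-09": ["new phone launch", "phone sales surge", "rain floods city"],
}
print(top_words(titles, "2014-06", k=2))
```

Words such as "rain" that appear equally often inside and outside the period receive no score here, which is the behavior one would want from an issue-word extractor.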

A Monte Carlo Comparison of the Small Sample Behavior of Disparity Measures

  • Hong, Jong-Seon;Jeong, Dong-Bin;Park, Yong-Seok
    • Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.149-150 / 2003
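  • Much research has examined whether the chi-square approximation can be applied to goodness-of-fit test statistics for small-sample contingency table data. Extending the study of Rudas (1986), which compared three test statistics in small samples (the Pearson chi-square $X^{2}$, the generalized likelihood ratio $G^{2}$, and the power divergence I(2/3) statistic), we add the recently proposed disparity measures (the BWHD(1/9), BWCS(1/3), and NED(4/3) test statistics) to the comparison. Through simulations of two-dimensional contingency tables under the independence model and of three-dimensional contingency tables under the conditional independence and one-variable independence models, we examine the simulated 90th and 95th percentiles and the corresponding 95% confidence intervals and compare them with the true percentiles. The $X^{2}$, I(2/3), and BWHD(1/9) test statistics gave similar results, and the chi-square approximation was found to be applicable to these statistics at smaller sample sizes than to the other previously proposed test statistics.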

On the Small Sample Distribution and its Consistency with the Large Sample Distribution of the Chi-Squared Test Statistic for a Two-Way Contingency Table with Fixed Margins (주변값이 주어진 이원분할표에 대한 카이제곱 검정통계량의 소표본 분포 및 대표본 분포와의 일치성 연구)

  • Park, Cheol-Yong;Choi, Jae-Sung;Kim, Yong-Gon
    • Journal of the Korean Data and Information Science Society / v.11 no.1 / pp.83-90 / 2000
  • The chi-squared test statistic is usually employed for testing the independence of two categorical variables in a two-way contingency table. It is well known that, under independence, the test statistic has an asymptotic chi-squared distribution under multinomial or product-multinomial models. When both margins are fixed, the sampling model of the contingency table is a multiple hypergeometric distribution, and the chi-squared test statistic follows the same limiting distribution. In this paper, we study the difference between the small sample and large sample distributions of the chi-squared test statistic for the case with fixed margins. For a few small sample sizes, the exact distribution of the test statistic is computed directly. For a few larger sample sizes, the finite-sample distribution of the statistic is generated via a Monte Carlo algorithm and then compared with the large sample distribution via chi-squared probability plots and Kolmogorov-Smirnov tests.
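The Monte Carlo comparison described above can be sketched for a 2x2 table. This is an illustrative sketch under stated assumptions, not the algorithm of the paper: tables with fixed margins are drawn by shuffling column labels against fixed row labels (one standard way to sample the multiple hypergeometric model), the reference distribution is chi-square with 1 degree of freedom, and the helper names and margins are hypothetical. All margins must be positive and the row and column totals must sum to the same value.

```python
import math
import random

def chi_square_stat(table):
    """Pearson chi-square statistic of a two-way table (positive margins assumed)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            e = r * c / n  # expected count under independence
            stat += (table[i][j] - e) ** 2 / e
    return stat

def random_table_fixed_margins(row_totals, col_totals, rng):
    """Draw one table with both margins fixed by permuting column labels."""
    col_labels = [j for j, c in enumerate(col_totals) for _ in range(c)]
    rng.shuffle(col_labels)
    table = [[0] * len(col_totals) for _ in row_totals]
    pos = 0
    for i, r in enumerate(row_totals):
        for j in col_labels[pos:pos + r]:
            table[i][j] += 1
        pos += r
    return table

def chi2_cdf_df1(x):
    """CDF of the chi-square distribution with 1 degree of freedom."""
    return math.erf(math.sqrt(x / 2.0))

def ks_distance_df1(stats):
    """Kolmogorov-Smirnov distance between the simulated statistics and chi-square(1)."""
    xs = sorted(stats)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = chi2_cdf_df1(x)
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

rng = random.Random(0)
sims = [chi_square_stat(random_table_fixed_margins([10, 10], [8, 12], rng))
        for _ in range(2000)]
print(ks_distance_df1(sims))
```

Because the statistic is discrete for small margins, the distance does not shrink to zero as the number of replications grows; increasing the margins instead shows how the finite-sample distribution approaches the limit.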

The Study for NHPP Software Reliability Model based on Chi-Square Distribution (카이제곱 NHPP에 의한 소프트웨어 신뢰성 모형에 관한 연구)

  • Kim, Hee-Cheul
    • Journal of the Korea Society of Computer and Information / v.11 no.1 s.39 / pp.45-53 / 2006
  • Finite failure NHPP models presented in the literature exhibit either constant, monotonically increasing, or monotonically decreasing failure occurrence rates per fault. This paper reviews the Goel-Okumoto and Yamada-Ohba-Osaki models and proposes a $\chi^2$ reliability model, which can capture the increasing nature of the failure occurrence rate per fault. The parameters are estimated by maximum likelihood using the bisection method, and model selection is based on SSE, AIC statistics, and the Kolmogorov distance. Failure analysis is carried out on a real data set, SYS2 (Allen P. Nikora and Michael R. Lyu), with the shape parameter of the $\chi^2$ distribution specified through its degrees of freedom. The $\chi^2$ model is compared with the existing models, and arithmetic and Laplace trend tests and the Kolmogorov test are presented.
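To illustrate why a chi-square lifetime can give an increasing failure occurrence rate per fault, here is a small sketch. It assumes the usual finite failure NHPP formulation, which the abstract does not spell out: mean value function m(t) = theta * F(t) and rate per fault h(t) = f(t) / (1 - F(t)), with F the chi-square CDF. For simplicity it handles only even degrees of freedom, where the CDF has a closed form; the function names and the parameter `theta` are hypothetical.

```python
import math

def _check_even_df(df):
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even df")

def chi2_pdf_even(x, df):
    """Chi-square density for even df = 2k."""
    _check_even_df(df)
    k = df // 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.factorial(k - 1))

def chi2_sf_even(x, df):
    """Survival function 1 - F(x) for even df = 2k (closed form)."""
    _check_even_df(df)
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(k))

def mean_value(t, theta, df):
    """Expected number of failures by time t, assuming m(t) = theta * F(t)."""
    return theta * (1.0 - chi2_sf_even(t, df))

def rate_per_fault(t, df):
    """Failure occurrence rate per fault, assuming h(t) = f(t) / (1 - F(t))."""
    return chi2_pdf_even(t, df) / chi2_sf_even(t, df)

for t in (1.0, 2.0, 4.0, 8.0):
    print(t, rate_per_fault(t, 2), rate_per_fault(t, 4))
```

With 2 degrees of freedom the rate is constant (the exponential case), while with 4 degrees of freedom it rises with t, which is the behavior the abstract attributes to the proposed model.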

On the behavior of Winsorized $\chi^2$ (윈저화 $\chi^2$의 양태에 대하여)

  • 성내경
    • The Korean Journal of Applied Statistics / v.7 no.2 / pp.1-7 / 1994
  • Using a Monte Carlo simulation technique, we evaluate the empirical distribution of a pseudo-chi-square statistic based on the symmetrically Winsorized sum of squares when the population is normally distributed, and we search for a chi-square distribution with appropriate degrees of freedom that can serve as an approximate distribution for the Winsorized chi-square.
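A rough sketch of this kind of simulation follows. The Winsorization rule (replace the g smallest and g largest observations by their nearest retained neighbors) is the standard symmetric one; the moment-matching step that picks degrees of freedom as 2 * mean^2 / variance of a scaled chi-square is an assumption made here for illustration and is not taken from the paper. The names `winsorized_ss` and `matched_df` are hypothetical.

```python
import random
import statistics

def winsorized_ss(sample, g):
    """Sum of squared deviations after symmetric Winsorization of g points per tail."""
    xs = sorted(sample)
    n = len(xs)
    if not 0 <= 2 * g < n:
        raise ValueError("g must satisfy 0 <= 2*g < n")
    # Pull the g lowest values up to xs[g] and the g highest down to xs[n-g-1].
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    m = sum(w) / n
    return sum((v - m) ** 2 for v in w)

def matched_df(n, g, reps, rng):
    """Degrees of freedom of a scaled chi-square matching the simulated mean and variance."""
    sims = [winsorized_ss([rng.gauss(0.0, 1.0) for _ in range(n)], g)
            for _ in range(reps)]
    mean = statistics.mean(sims)
    var = statistics.variance(sims)
    # For c * chi2(nu): mean = c * nu and variance = 2 * c^2 * nu, so nu = 2 * mean^2 / var.
    return 2.0 * mean ** 2 / var

rng = random.Random(0)
print(matched_df(n=10, g=0, reps=4000, rng=rng))  # no Winsorization: close to n - 1
print(matched_df(n=10, g=1, reps=4000, rng=rng))  # one point Winsorized per tail
```

With g = 0 the statistic is the ordinary sum of squares, so the matched value should land near n - 1, which gives a quick sanity check on the simulation before trying g > 0.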

Two-sample chi-square test for randomly censored data (임의로 관측중단된 두 표본 자료에 대한 카이제곱 검정방법)

  • 김주한;김정란
    • The Korean Journal of Applied Statistics / v.8 no.2 / pp.109-119 / 1995
  • A two-sample chi-square test is introduced for testing the equality of the distributions of two populations when observations are subject to random censorship. The statistic is appropriate in testing problems where a two-sided alternative is of interest. Under the null hypothesis, the asymptotic distribution of the statistic is a chi-square distribution. We obtain two types of chi-square statistics: one as a nonnegative definite quadratic form in the differences of observed cell probabilities based on the product-limit estimators, the other as a summation form. Data from a cancer chemotherapy experiment are examined with these statistics.

Spam Filter by Using X2 Statistics and Support Vector Machines (카이제곱 통계량과 지지벡터기계를 이용한 스팸메일 필터)

  • Lee, Song-Wook
    • The KIPS Transactions:PartB / v.17B no.3 / pp.249-254 / 2010
  • We propose an automatic spam filter for e-mail data using Support Vector Machines (SVM). We use the lexical form of a word and its part-of-speech (POS) tags as features and select features by chi-square statistics. For the experiments, we represent each feature by TF (text frequency), TF-IDF, and binary weights. After the SVM is trained with the selected features, it classifies each e-mail as spam or not. In the experiments, the selected features improve the performance of our system, and we achieved an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
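The three feature representations named in this abstract can be sketched as below. The abstract does not give the exact weighting formulas, so this sketch assumes raw counts for TF and the common tf * log(N / df) variant for TF-IDF; the function name `feature_weights`, the vocabulary of selected features, and the toy messages are all hypothetical.

```python
import math
from collections import Counter

def feature_weights(docs, vocabulary):
    """Return TF, TF-IDF, and binary weights per document for the selected features.

    docs: list of token lists
    vocabulary: set of features kept after selection (e.g. by chi-square score)
    """
    n = len(docs)
    # Document frequency: number of documents containing each selected feature.
    df = Counter(w for d in docs for w in set(d) if w in vocabulary)
    rows = []
    for d in docs:
        tf = Counter(w for w in d if w in vocabulary)
        rows.append({
            "tf": dict(tf),
            "tfidf": {w: tf[w] * math.log(n / df[w]) for w in tf},
            "binary": {w: 1 for w in tf},
        })
    return rows

# Hypothetical toy messages, already tokenized.
docs = [["free", "money", "free"], ["meeting", "money"]]
for row in feature_weights(docs, {"free", "money"}):
    print(row)
```

Each dictionary could then be turned into a sparse vector for an SVM; under this TF-IDF variant a feature that appears in every message, such as "money" here, gets weight zero.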

A Monte Carlo Comparison of the Small Sample Behavior of Disparity Measures (소표본에서 차이측도 통계량의 비교연구)

  • Hong, Jong-Seon;Jeong, Dong-Bin;Park, Yong-Seok
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.455-467 / 2003
  • There has been a long debate on the applicability of the chi-square approximation to statistics based on small sample sizes. Extending the comparison among the Pearson chi-square $X^2$, generalized likelihood ratio $G^2$, and power divergence I(2/3) statistics by Rudas (1986), this paper also compares and analyzes recently developed disparity statistics (BWHD(1/9), BWCS(1/3), NED(4/3)). Through Monte Carlo studies of the independence model for two-dimensional contingency tables and of the conditional independence and one-variable independence models for three-dimensional tables, simulated 90th and 95th percentage points and approximate 95% confidence intervals for the true percentage points are obtained. The $X^2$, I(2/3), and BWHD(1/9) test statistics behave very similarly and appear to be applicable at smaller sample sizes than the others.
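Three of the statistics compared here belong to the Cressie-Read power divergence family, which a short sketch can make concrete. The formula below is the standard family definition, under which lambda = 1 gives Pearson's $X^2$, the limit lambda = 0 gives $G^2$, and lambda = 2/3 gives I(2/3); the disparity statistics BWHD, BWCS, and NED are not implemented. The function name and the example counts are hypothetical, and observed and expected totals are assumed to be equal.

```python
import math

def power_divergence(observed, expected, lam):
    """Cressie-Read power divergence statistic for lam > -1."""
    if lam <= -1:
        raise ValueError("this sketch only handles lam > -1")
    total = 0.0
    for o, e in zip(observed, expected):
        if o == 0:
            continue  # an empty cell contributes 0 when lam > -1
        if lam == 0:
            total += o * math.log(o / e)  # limiting case: likelihood ratio G^2
        else:
            total += o * ((o / e) ** lam - 1.0) / (lam * (lam + 1.0))
    return 2.0 * total

observed = [30, 20, 10]   # hypothetical cell counts
expected = [20, 20, 20]   # expected counts with the same total
for lam in (1.0, 2.0 / 3.0, 0.0):
    print(lam, power_divergence(observed, expected, lam))
```

Running the same function inside a Monte Carlo loop over small tables would give simulated percentage points for each member of the family, which is the kind of comparison the abstract reports.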

Functional Prediction Method for Proteins by using Modified Chi-square Measure (보완된 카이-제곱 기법을 이용한 단백질 기능 예측 기법)

  • Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong
    • The Journal of the Korea Contents Association / v.9 no.5 / pp.332-336 / 2009
  • Functional prediction of unannotated proteins is one of the most important tasks in yeast genomics. Analysis of a protein-protein interaction network leads to a better understanding of the functions of unannotated proteins, and a number of studies have addressed functional prediction from such networks. The chi-square method is one of the existing methods for this task, but it does not consider the topology of the network. In this paper, we propose a novel method that can predict specific molecular functions for unannotated proteins from a protein-protein interaction network. To do this, we investigated all yeast protein interaction databases available on public sites such as MIPS, DIP, and SGD. To predict the functions of unannotated proteins, we employ a modified chi-square measure based on neighborhood counting, and we assess the accuracy of protein function prediction from a protein-protein interaction network.
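For orientation, the baseline chi-square scoring over network neighbors can be sketched as below. This shows only a common form of the existing neighborhood-counting method (score each function by (observed - expected)^2 / expected among the annotated neighbors); the modification proposed in the paper is not described in the abstract and is not reproduced. The function names, the restriction to over-represented functions when predicting, and the toy network are assumptions made for this example.

```python
from collections import Counter

def chi_square_function_scores(protein, neighbors, annotations):
    """Chi-square score of each function for `protein` from its annotated neighbors.

    neighbors: dict mapping a protein to the set of proteins it interacts with
    annotations: dict mapping each annotated protein to its set of functions
    """
    total = len(annotations)
    freq = Counter(f for fs in annotations.values() for f in fs)
    annotated_nbrs = [q for q in neighbors.get(protein, ()) if q in annotations]
    observed = Counter(f for q in annotated_nbrs for f in annotations[q])
    scores = {}
    for f, count in freq.items():
        e = len(annotated_nbrs) * count / total  # expected neighbors with function f
        if e > 0:
            scores[f] = (observed[f] - e) ** 2 / e
    return scores, observed

def predict_function(protein, neighbors, annotations):
    """Pick the highest-scoring function that is over-represented among neighbors."""
    scores, observed = chi_square_function_scores(protein, neighbors, annotations)
    total = len(annotations)
    freq = Counter(f for fs in annotations.values() for f in fs)
    n_nbrs = sum(1 for q in neighbors.get(protein, ()) if q in annotations)
    over = {f: s for f, s in scores.items() if observed[f] > n_nbrs * freq[f] / total}
    return max(over, key=over.get) if over else None

# Hypothetical toy network: X is unannotated and interacts with A, B, and C.
annotations = {"A": {"kinase"}, "B": {"kinase"}, "C": {"transport"},
               "D": {"transport"}, "E": {"transport"}, "F": {"transport"}}
neighbors = {"X": {"A", "B", "C"}}
print(predict_function("X", neighbors, annotations))
```

The score depends only on counts among direct neighbors, which is the limitation the abstract points to when it says the existing method ignores network topology.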

Text Categorization Features Automatic Extraction Method Using Chi-squared Statistic (카이제곱 통계량을 이용한 문서분류 자질 자동추출 방법)

  • Park, Jong-Hyun;Park, So-Young;Chang, Ju-No;Kihl, Tae-Suk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.695-697 / 2010
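  • Because the vocabulary of a document carries information about its category, analyzing documents to extract useful words is valuable and can be linked to a variety of services. In automatic document classification, classification accuracy can vary with the feature selection method, and the field a document is perceived to belong to can vary with the useful words extracted from it. This paper therefore proposes and develops a method that uses the chi-square statistic scores of the words in each document to evaluate each word as a classification feature and to automatically extract the useful words for each document category.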
