• Title/Summary/Keyword: 카이제곱 (chi-square)

A Spam Message Filter System for Mobile Environment (휴대폰의 스팸문자메시지 판별 시스템)

  • Lee, Songwook
    • Annual Conference on Human and Language Technology / 2010.10a / pp.194-196 / 2010
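  • With the widespread adoption of mobile phones, the use of text messages is increasing rapidly. At the same time, unwanted advertising spam messages are proliferating. This study develops a system that automatically identifies such spam text messages. We trained the system with a support vector machine (SVM), a machine learning method, and selected features using the chi-square statistic. In experiments, the system achieved about 95.5% on the F1 measure.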
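
The chi-square feature selection mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `chi_square_2x2` and `rank_terms`, the token-list input format, and the tie-breaking rule are assumptions made for the example, and the SVM training step is omitted.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 term/class table.

    a: spam messages containing the term
    b: non-spam messages containing the term
    c: spam messages without the term
    d: non-spam messages without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin means the term cannot discriminate
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(messages, labels, top_k):
    """Return the top_k terms by chi-square score.

    messages: list of token lists (hypothetical input format)
    labels:   list of booleans, True for spam
    """
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    spam_df, ham_df = {}, {}
    for tokens, is_spam in zip(messages, labels):
        counts = spam_df if is_spam else ham_df
        for term in set(tokens):  # document frequency, not raw counts
            counts[term] = counts.get(term, 0) + 1
    scores = {}
    for term in set(spam_df) | set(ham_df):
        a = spam_df.get(term, 0)
        b = ham_df.get(term, 0)
        scores[term] = chi_square_2x2(a, b, n_spam - a, n_ham - b)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

The selected terms would then serve as the feature vocabulary for whatever classifier is trained afterwards.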

On a robust analysis of variance based on winsorization (윈저화를 이용한 로버스트 분산분석)

  • 성내경
    • The Korean Journal of Applied Statistics / v.8 no.1 / pp.119-131 / 1995
  • Based on Monte Carlo simulation results, we propose a robust analysis of variance procedure that uses the trimmed mean and the Winsorized variance. We mainly deal with the one-way classification case. We evaluate the empirical distribution of a pseudo-F statistic based on the symmetrically Winsorized sum of squares when the population is normally distributed.
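
The two building blocks named in the abstract above, the trimmed mean and the symmetrically Winsorized sum of squares, can be sketched as follows. The function names, the parameter `g` (number of observations trimmed or Winsorized in each tail), and the `n - 1` divisor are assumptions for illustration; the paper's pseudo-F statistic itself is not reproduced here and its scaling may differ.

```python
def trimmed_mean(values, g):
    """Mean after dropping the g smallest and g largest observations."""
    xs = sorted(values)
    if 2 * g >= len(xs):
        raise ValueError("g is too large for this sample")
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)


def winsorized_sum_of_squares(values, g):
    """Sum of squared deviations after symmetric Winsorization.

    The g smallest values are replaced by the (g+1)-th smallest,
    and the g largest by the (g+1)-th largest.
    """
    xs = sorted(values)
    n = len(xs)
    if 2 * g >= n:
        raise ValueError("g is too large for this sample")
    w = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    m = sum(w) / n
    return sum((x - m) ** 2 for x in w)


def winsorized_variance(values, g):
    """Winsorized sum of squares divided by n - 1 (one common convention)."""
    return winsorized_sum_of_squares(values, g) / (len(values) - 1)
```

With `g = 0` both functions reduce to the ordinary mean and sum of squares, which makes the robust versions easy to compare against the classical ones.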

An Improved Bayesian Spam Mail Filter based on Chi-square Statistics (카이제곱 통계량을 이용한 개선된 베이지안 스팸메일 필터)

  • Kim Jin-Sang;Choe Sang-Yeol
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.04a / pp.403-414 / 2005
  • Most spam filters currently in use are based on Bayesian classification, which suffers from serious problems such as limited precision and recall and false positive errors. This paper addresses these problems with a modified Bayesian classifier based on chi-square statistics. The resulting spam filter is more accurate and flexible than traditional Bayesian spam filters, and it can be personalized by supplying parameters when the filter is learned from training data.

Chi-Squared Test of Independence in Case that Two Marginal Distributions are Given Exactly (모집단 부분정보가 주어진 상황에서의 분할표 독립성 검정)

  • 이광진
    • The Korean Journal of Applied Statistics / v.17 no.1 / pp.89-103 / 2004
  • Exact information, however little, is better used in an analysis than ignored. This article considers the problem of testing independence in a contingency table when two marginal distributions of the population are known exactly. For this case, a likelihood-ratio chi-squared test statistic and its Pearson-type chi-squared counterpart are derived. The traditional chi-square tests and the derived tests are compared by Monte Carlo simulation. Some related testing problems are also explained from a geometric viewpoint.
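
For reference, the two traditional statistics that the abstract above uses as a baseline, the Pearson chi-square and the likelihood-ratio chi-square with margins estimated from the sample, can be sketched as follows. The function names are hypothetical, and this sketch does not implement the tests derived in the paper for exactly known margins.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence, using the sample margins.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )


def likelihood_ratio_chi_square(table):
    """Likelihood-ratio statistic: 2 * sum of observed * log(observed / expected).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    exp = expected_counts(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )
```

Both statistics are referred to a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom in the usual large-sample test.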

Homogeneity Test for Noncentral Chi-Square Distributions (비중심 카이제곱분포의 동질성검정)

  • 황형태;오희정
    • Communications for Statistical Applications and Methods / v.5 no.1 / pp.217-223 / 1998
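  • To test the homogeneity of $k$ noncentral chi-square distributions with a common number of degrees of freedom, we first propose a test procedure of a suitable form. As is customary, to ensure that the proposed test attains a given significance level, we derive the least favorable configuration of the parameters, that is, the configuration that maximizes the probability of a type I error under the null hypothesis. We then tabulate the critical values that satisfy the given significance level.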

Properties of chi-square statistic and information gain for feature selection of imbalanced text data (불균형 텍스트 데이터의 변수 선택에 있어서의 카이제곱통계량과 정보이득의 특징)

  • Mun, Hye In;Son, Won
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.469-484 / 2022
  • Because a large text corpus contains hundreds of thousands of unique words, text data is a typical example of high-dimensional data, and various feature selection methods have been proposed for dimension reduction. Feature selection can improve prediction accuracy, and the reduced data size also improves computational efficiency. The chi-square statistic and the information gain are two of the most popular measures for identifying interesting terms in text data. In this paper, we investigate the theoretical properties of the chi-square statistic and the information gain. We show that the two filtering metrics share theoretical properties such as non-negativity and convexity. However, they differ in that the information gain is prone to select more negative features than the chi-square statistic in imbalanced text data.
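
The information gain of a binary term with respect to a binary class can be sketched as below, using the same 2x2 layout commonly used for the chi-square statistic. The function names and the choice of base-2 logarithms are assumptions for illustration; the paper's own notation may differ.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(a, b, c, d):
    """Information gain of a term for a two-class problem.

    a: positive documents containing the term
    b: negative documents containing the term
    c: positive documents without the term
    d: negative documents without the term
    """
    n = a + b + c + d
    h_class = entropy([a + c, b + d])            # H(class)
    h_given_term = (
        (a + b) / n * entropy([a, b])            # term present
        + (c + d) / n * entropy([c, d])          # term absent
    )
    return h_class - h_given_term
```

The result is zero when the term is independent of the class and never negative (up to floating-point rounding), which matches the non-negativity property noted in the abstract.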

Modified Chi-square Method for Prediction of Unannotated Proteins from Protein Interaction Network (단백질 상호작용 네트워크에서 단백질 기능 예측을 위한 Modified Chi-square 기법)

  • Tae-Ho Kang;Jae-Soo Yoo
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.785-787 / 2008
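  • Proteins are involved in the chemical reactions that govern life processes in organisms. Proteins assemble with one another in an orderly fashion and are functionally linked to form networks. The protein-protein interactions that make up these networks are closely related to protein function; that is, interacting proteins are likely to perform the same function. This makes it possible to predict the function of unannotated proteins from protein-protein interactions. Representative approaches include the neighborhood counting method, which uses the distribution of functions among neighboring nodes, and the chi-square method, which predicts function by computing how frequently a particular function is expected to occur. To improve the accuracy of protein function prediction, this paper proposes a modified chi-square method that combines the strengths of these two approaches. By comparing various protein interaction network datasets, we show that the modified chi-square method achieves higher prediction accuracy.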
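
The two baseline approaches named in the abstract above can be sketched as follows. This is an illustration of the general idea only, under the assumption that the chi-square score for a function is (observed - expected)^2 / expected over a protein's neighbors; the data structures and function names are hypothetical, and the modified method proposed in the paper is not reproduced.

```python
def neighbor_counts(graph, annotations, protein):
    """Neighborhood counting: how many neighbors carry each function.

    graph:       dict mapping a protein to a list of its interaction partners
    annotations: dict mapping an annotated protein to a set of functions
    """
    counts = {}
    for neighbor in graph[protein]:
        for func in annotations.get(neighbor, ()):
            counts[func] = counts.get(func, 0) + 1
    return counts


def chi_square_scores(graph, annotations, protein):
    """Chi-square score per function for one protein.

    The expected count of a function is the number of neighbors times the
    fraction of annotated proteins in the whole network with that function.
    """
    annotated = [p for p, funcs in annotations.items() if funcs]
    freq = {}
    for p in annotated:
        for func in annotations[p]:
            freq[func] = freq.get(func, 0) + 1
    k = len(graph[protein])
    observed = neighbor_counts(graph, annotations, protein)
    scores = {}
    for func, count in freq.items():
        expected = k * count / len(annotated)
        if expected > 0:
            scores[func] = (observed.get(func, 0) - expected) ** 2 / expected
    return scores
```

Note that the plain chi-square score is large for any deviation from the expected count, including functions that are under-represented among the neighbors, which is one reason to combine it with neighbor counts.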

A Comparative Study on the Infinite NHPP Software Reliability Model Following Chi-Square Distribution with Lifetime Distribution Dependent on Degrees of Freedom (수명분포가 자유도에 의존한 카이제곱분포를 따르는 무한고장 NHPP 소프트웨어 신뢰성 모형에 관한 비교연구)

  • Kim, Hee-Cheul;Kim, Jae-Wook
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.5 / pp.372-379 / 2017
  • Software reliability is a fundamental factor in the software development process. In infinite-failure NHPP models for identifying software failures, the occurrence rate per fault (hazard function) may be constant, increasing, or decreasing. In this paper, we propose a reliability model using the chi-square distribution, which depends on the degrees of freedom, a quantity that represents the application efficiency of software reliability. Parameters were estimated by maximum likelihood with the bisection method, and model selection was based on the mean square error (MSE) and the coefficient of determination ($R^2$). The proposed model was applied to a failure analysis of actual failure interval data, and the intensity functions for different degrees of freedom were compared. The Laplace trend test was used to check the reliability of the data. The study finds that the chi-square distribution model, which depends on the degrees of freedom, is also efficient in terms of reliability: its coefficient of determination is 90% or more, so it can be used as an applied model alongside the basic model. The results suggest that software developers should use prior knowledge of the software when choosing a lifetime distribution, in order to identify possible failure modes.

Small sample tests for two-way contingency tables (2원 분할표의 소표본 검증법)

  • 허명회
    • The Korean Journal of Applied Statistics / v.10 no.2 / pp.339-352 / 1997
  • The chi-square test based on large-sample theory is inappropriate for testing row homogeneity in a two-way contingency table with several sparse cells. For that case, exact testing methods have been developed in the literature and implemented in StatXact (1991). However, considerable computing time is inevitable for moderate-size tables, so Monte Carlo approximation is frequently recommended. In this study, we propose a simple algorithm for generating two-way random tables with fixed row and column margins for the small-sample chi-square test. We also develop a "Tukey-type" method for multiple between-row comparisons.
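
A Monte Carlo chi-square test with fixed row and column margins can be sketched as below. The generator shown here is the standard permutation approach (shuffle the column labels of all observations and deal them out to the rows); it is an assumption for illustration and not necessarily the algorithm proposed in the paper. All function names and the default `n_sim` and `seed` values are hypothetical.

```python
import random


def pearson_chi_square(table):
    """Pearson statistic; assumes all row and column totals are positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def random_table_fixed_margins(row_totals, col_totals, rng):
    """Draw one table with the given margins under independence."""
    labels = [j for j, c in enumerate(col_totals) for _ in range(c)]
    rng.shuffle(labels)  # random assignment of column labels to units
    table, start = [], 0
    for r in row_totals:
        row = [0] * len(col_totals)
        for j in labels[start:start + r]:
            row[j] += 1
        table.append(row)
        start += r
    return table


def monte_carlo_p_value(table, n_sim=2000, seed=0):
    """Share of simulated tables at least as extreme as the observed one."""
    rng = random.Random(seed)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = pearson_chi_square(table)
    hits = sum(
        1
        for _ in range(n_sim)
        if pearson_chi_square(
            random_table_fixed_margins(row_totals, col_totals, rng)
        ) >= observed - 1e-12  # small tolerance for floating-point ties
    )
    return (hits + 1) / (n_sim + 1)  # add-one correction keeps p above zero
```

Because every simulated table shares the observed margins, the reference distribution does not rely on the large-sample approximation, at the cost of `n_sim` evaluations of the statistic.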
