• Title/Summary/Keyword: statistical error (통계 오류)

An Improved Bayesian Spam Mail Filter based on Chi-square Statistics (카이제곱 통계량을 이용한 개선된 베이지안 스팸메일 필터)

  • Kim Jin-Sang;Choe Sang-Yeol
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.04a / pp.403-414 / 2005
  • Most spam filters in current use are based on Bayesian classification, which suffers from serious problems such as limited precision and recall and false positive errors. This paper addresses these problems with a modified Bayesian classifier based on chi-square statistics. The resulting spam filter is more accurate and flexible than traditional Bayesian spam filters, and it can be personalized by supplying parameters when the filter is learned from training data.
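
As a rough illustration of the chi-square statistic mentioned in the abstract above, the sketch below scores how strongly a single token is associated with the spam class using the standard Pearson chi-square formula for a 2x2 table. The function name and the example counts are hypothetical, and the abstract does not say how the authors combine this statistic with the Bayesian classifier, so this is not their filter.

```python
def chi_square_2x2(spam_with, spam_without, ham_with, ham_without):
    """Pearson chi-square statistic for a 2x2 token-by-class count table.

    spam_with / spam_without: spam messages that do / do not contain the token.
    ham_with / ham_without: legitimate messages that do / do not contain it.
    """
    n = spam_with + spam_without + ham_with + ham_without
    with_token = spam_with + ham_with
    without_token = spam_without + ham_without
    spam_total = spam_with + spam_without
    ham_total = ham_with + ham_without
    denom = with_token * without_token * spam_total * ham_total
    if denom == 0:
        # A marginal total is zero, so the table carries no association signal.
        return 0.0
    diff = spam_with * ham_without - spam_without * ham_with
    return n * diff * diff / denom


# Hypothetical counts: the token appears in 30 of 100 spam and 5 of 100 ham messages.
score = chi_square_2x2(30, 70, 5, 95)
print(round(score, 3))
```

A larger score means the token's presence depends more strongly on the class. A token that is equally frequent in both classes scores zero.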

Gamma Mixed Model to Improve Sib-Pair Linkage Analysis (감마 혼합 모형을 통한 반복 측정된 형제 쌍 연관 분석 사례연구)

  • Kim, Jeonghwan;Suh, Young Ju;Won, Sungho;Nah, Jeung Weon;Lee, Woojoo
    • The Korean Journal of Applied Statistics / v.28 no.2 / pp.221-230 / 2015
  • Traditionally, sib-pair linkage analysis with repeated measures has employed linear mixed models, but these lack the power to find genetic marker loci associated with a phenotype of interest. In this paper, we use a gamma mixed model to improve sib-pair linkage analysis and compare it with a linear mixed model in terms of power and Type I error. Using Genetic Analysis Workshop 13 data, we illustrate that a gamma mixed model can achieve higher power than a linear mixed model.

A parametric bootstrap test for comparing differentially private histograms (모수적 부트스트랩을 이용한 차등정보보호 히스토그램의 동질성 검정)

  • Son, Juhee;Park, Min-Jeong;Jung, Sungkyu
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.1-17 / 2022
  • We propose a test of consistency for two differentially private histograms using parametric bootstrap. The test can be applied when the original raw histograms are not available and only the differentially private histograms and the privacy level α are known. We also extend the test to the case where the privacy levels differ between histograms. The resident population data of Korea and the U.S. in 2020 are used to demonstrate the efficacy of the proposed test procedure. The proposed test controls the type I error rate at the nominal level and has high power, while a conventional test procedure fails. While the differential privacy framework formally controls the risk of privacy leakage, the utility of such a framework is questionable. This work also suggests that the power of a carefully designed test may be a viable measure of utility.

High-dimensional change point detection using MOSUM-based sparse projection (MOSUM 성근 프로젝션을 이용한 고차원 시계열의 변화점 추정)

  • Kim, Moonjung;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.63-75 / 2022
  • This paper proposes a MOSUM-based sparse projection method for change point detection in high-dimensional time series. Our method is inspired by Wang and Samworth (2018) but improves on theirs in two ways. First, it finds all change points at once, which minimizes sequential error. Second, it is localized, which makes it more robust to mean changes that offset each other. We also propose data-driven threshold selection using the block wild bootstrap. A comprehensive simulation study shows that our method performs reasonably well in finite samples. We also apply our method to the stock prices making up the S&P 500 index and find four change points in the most recent 6 years.
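
To make the moving-sum (MOSUM) idea behind the abstract above more concrete, here is a minimal univariate sketch. For each candidate time point it compares the sum of the next G observations with the sum of the previous G. The function name, the bandwidth `G`, and the toy series are illustrative assumptions. The sketch assumes unit noise variance and leaves out the sparse projection, the high-dimensional setting, and the block wild bootstrap threshold that the paper proposes.

```python
import math


def mosum_statistics(x, G):
    """Return {k: |sum(x[k:k+G]) - sum(x[k-G:k])| / sqrt(2G)} for every valid k.

    Assumes unit noise variance; a real analysis would also divide by an
    estimate of the noise standard deviation.
    """
    n = len(x)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)  # prefix[i] = sum of x[:i]
    stats = {}
    for k in range(G, n - G + 1):
        left = prefix[k] - prefix[k - G]    # sum of the G points before k
        right = prefix[k + G] - prefix[k]   # sum of the G points from k onward
        stats[k] = abs(right - left) / math.sqrt(2 * G)
    return stats


# Toy series (hypothetical): the mean shifts from 0 to 1 at index 20.
series = [0.0] * 20 + [1.0] * 20
stats = mosum_statistics(series, G=5)
estimated_change_point = max(stats, key=stats.get)
print(estimated_change_point)
```

Because each statistic only uses the 2G observations around k, a mean shift elsewhere in the series does not affect it, which is the localization property the abstract refers to.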

Korean Polysemy Word-Sense-Disambiguation using MoDu-Corpus (모두의 말뭉치를 이용한 한국어 다의어 분별)

  • Shin, Joon-Choul;Lee, Ju-Sang;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology / 2020.10a / pp.205-210 / 2020
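  • As Korean natural language processing advances, polysemy disambiguation, which goes one step beyond homograph disambiguation, is becoming increasingly important. The recently released "MoDu Corpus" (모두의 말뭉치) is the first public corpus tagged with polysemous senses, which means that research on polysemy can now proceed in earnest. This paper presents an initial polysemy disambiguation model trained on this corpus, and its experimental results can serve as a baseline for future research. The model is statistical and does not use deep learning. Morphological analysis and homograph disambiguation are handled by the existing UTagger, and UWordMap is used in addition to the corpus to assist polysemy disambiguation. The model's accuracy is about 87%, a figure that includes errors made in the morphological analysis or homograph disambiguation stages that precede polysemy disambiguation. Because the corpus released so far annotates polysemous senses only for nouns, accuracy was measured on nouns only. This study confirms the difficulty of polysemy disambiguation and shows that it requires methods different from those used for homograph disambiguation.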

The Relationships among Students' Mapping Understanding, Mapping Errors and Cognitive/Affective Variables in Learning with Analogy (비유를 사용한 수업에서 학생들의 인지적.정의적 특성과 대응 이해 및 대응 오류 유형과의 관계)

  • Kim, Kyung-Sun;Hwang, Sun-Young;Noh, Tae-Hee
    • Journal of the Korean Chemical Society / v.54 no.1 / pp.150-157 / 2010
  • In this study, we investigated differences in mapping understanding and in the types of mapping errors across levels of students' cognitive/affective variables, as well as the relationships between mapping understanding and these variables, in learning 'concentration and reaction rate' with analogy. After tests of logical thinking ability, visual imagery ability, analogical reasoning ability, self-efficacy, and need for cognition were administered as pretests, students learned with analogy. Then, students' familiarity and mapping understanding were examined. Analyses of the results revealed that the mapping understanding scores of students with higher levels of all cognitive/affective variables except visual imagery ability and familiarity were significantly higher than those of students with lower levels. Differences in the types of mapping errors, such as overmapping, failure to map, impossible mapping, artificial mapping, mismapping, rash mapping, and retention of a base feature, were also found across levels of students' cognitive and affective variables. The scores of students' mapping understanding were positively correlated with those of all cognitive and affective variables. The results of multiple regression analysis indicated that students' science achievement, logical thinking ability, and familiarity were significant predictors of mapping understanding. Educational implications of these findings are discussed.

Is the t-test insensitive than the bootstrap method in the P300-based concealed information test? (P300 숨긴정보검사에서 t 검증이 부트스트랩 방법보다 덜 민감한가?)

  • Eom, Jin-sup;Sohn, Jin-Hun;Park, Mi-Sook
    • Korean Journal of Forensic Psychology / v.11 no.1 / pp.21-36 / 2020
  • The P300-based concealed information test (P300 CIT) evaluates whether the P300 amplitude for the probe is significantly greater than that for the irrelevant stimuli to determine whether the suspect is lying. An independent-samples t-test or a bootstrap method can be used as the statistical test for that decision. Rosenfeld et al. (2004) used the bootstrap method, claiming that "t tests on single sweeps are too insensitive to use to compare mean probe and irrelevant P300s within individuals," and their method has been accepted to date. The purpose of this study is to evaluate whether the power of the t-test is lower than that of the bootstrap method in the P300 CIT. A Monte Carlo study was conducted using EEG collected from 39 participants. The results showed that the type I error rates of the t-test and the percentile bootstrap method were similar, and the power of the percentile bootstrap method was slightly higher than that of the t-test. The type I error rates of both methods were slightly lower than the significance level, and the powers of the two tests were also slightly lower than that of the theoretical t-test. In contrast, the type I error rate and power of the standard error bootstrap method were the same as those of the theoretical t-test, and its power was .012 to .081 higher than that of the t-test depending on experimental conditions.
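
The two decision rules compared in the abstract above can be sketched as follows, assuming each sweep yields one amplitude value per stimulus type. The function names, the number of resamples, the one-sided 5% level, and the toy amplitudes are all assumptions made for illustration. The study's actual EEG processing and Monte Carlo design are not reproduced here.

```python
import random
import statistics


def percentile_bootstrap_lower(probe, irrelevant, n_boot=2000, alpha=0.05, seed=0):
    """One-sided percentile bootstrap lower bound for mean(probe) - mean(irrelevant)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the original group sizes.
        p = rng.choices(probe, k=len(probe))
        i = rng.choices(irrelevant, k=len(irrelevant))
        diffs.append(statistics.fmean(p) - statistics.fmean(i))
    diffs.sort()
    return diffs[int(alpha * n_boot)]


def pooled_t_statistic(a, b):
    """Independent-samples t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


# Toy single-sweep amplitudes (hypothetical values, arbitrary units).
probe = [5.0 + 0.1 * j for j in range(30)]
irrelevant = [1.0 + 0.1 * j for j in range(30)]

lower = percentile_bootstrap_lower(probe, irrelevant)
t_stat = pooled_t_statistic(probe, irrelevant)
print(lower > 0, t_stat)
```

Under the bootstrap rule, the probe amplitude is judged larger when the lower bound exceeds zero. Under the t-test rule, the statistic is compared with a critical value from the t distribution with `len(a) + len(b) - 2` degrees of freedom.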

Effect on Turnover Intention in Hotel Employees with Musculoskeletal Pains by Working Environment (호텔종사원의 근무환경에 따른 근골격계 통증이 이직의도에 미치는 영향)

  • Kim, Seong-Yeol
    • The Journal of the Korea Contents Association / v.12 no.8 / pp.256-263 / 2012
  • The purpose of this study is to investigate how hotel employees' musculoskeletal pain, which is likely caused by their poor working environment, affects their intention to change jobs. The participants of this study were 200 hotel employees who were experiencing musculoskeletal pain. They were asked four questions through face-to-face interviews and questionnaires. These four questions were about musculoskeletal pain, the employees' working environment, their turnover intention, and their general personalities. This study finds that musculoskeletal pain is related to employees' turnover intention. Based on its findings, this study claims that an adequate working environment and various preventative programs are necessary to decrease the number of employees resigning and to prevent musculoskeletal pain.

Comparison of the Family Based Association Test and Sib Transmission Disequilibrium Test for Dichotomous Trait (이산형 형질에 대한 가족자료 연관성 검정법 FBAT와 형제 전달 불균형 연관성 검정법 S-TDT의 비교)

  • Kim, Han-Sang;Oh, Young-Sin;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.23 no.6 / pp.1103-1113 / 2010
  • A widely used approach, the family-based association test (FBAT), is compared with the sib transmission/disequilibrium test (S-TDT). In particular, the adjusted S-TDT, which takes the covariance among related siblings into consideration, can provide a more sensitive test statistic for association. A simulation study comparing the three test statistics demonstrates that the type I error rates of all three tests are larger than the prespecified significance level and that the power of the FBAT is lower than those of the other two tests. More detailed studies are required to assess the influence of the conditions assumed in the FBAT on the efficiency of the test.

Automatic Product Feature Extraction for Efficient Analysis of Product Reviews Using Term Statistics (효율적인 상품평 분석을 위한 어휘 통계 정보 기반 평가 항목 추출 시스템)

  • Lee, Woo-Chul;Lee, Hyun-Ah;Lee, Kong-Joo
    • The KIPS Transactions:PartB / v.16B no.6 / pp.497-502 / 2009
  • In this paper, we introduce an automatic product feature extraction system that improves the efficiency of product review analysis. Our system consists of two parts: a review collection and correction part and a product feature extraction part. The former collects reviews from internet shopping malls and revises spoken-style or ungrammatical sentences. In the latter, product features, meaning items that can serve as evaluation criteria such as 'size' and 'style' for a skirt, are automatically extracted by utilizing term statistics in reviews and web documents on the Internet. We choose nouns in reviews as candidates for product features and calculate the degree of association between candidate nouns and products by combining an inner association degree and an outer association degree. The inner association degree is calculated from noun frequency in reviews, and the outer association degree is calculated from the co-occurrence frequency of a candidate noun and a product name in web documents. In the evaluation, our extraction method showed an average recall of 90%, which is better than the results of previous approaches.