Review on Problems with Null Hypothesis Significance Testing in Dental Research and Its Alternatives

  • Lee, Kwang-Hee (이광희), Department of Pediatric Dentistry, College of Dentistry, Wonkwang University
  • Received : 2013.07.10
  • Accepted : 2013.07.24
  • Published : 2013.08.30

Abstract

Evaluating study results by the p value of null hypothesis significance testing raises many problems in dental research. It is a logical fallacy to conclude that the null hypothesis is true when it is not rejected. There are many serious misunderstandings about the p value, and researchers should interpret it cautiously when writing papers. Effect size, confidence intervals, and Bayesian statistics are introduced as alternatives that can complement or replace null hypothesis significance testing.
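
As a rough illustration of two of the alternatives named in the abstract, the sketch below computes a standardized effect size (Cohen's d) and an approximate 95% confidence interval for a difference in means. It is not taken from the paper: the function names `cohens_d` and `mean_diff_ci`, the plaque-index scores, and the use of a normal critical value (rather than a t critical value, which would suit small samples better) are all assumptions made for this example.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def mean_diff_ci(a, b, level=0.95):
    """Approximate CI for mean(a) - mean(b), assuming a normal critical value."""
    diff = mean(a) - mean(b)
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Hypothetical plaque-index scores, invented for illustration only.
control = [2.1, 1.8, 2.4, 2.0, 2.3, 1.9, 2.2, 2.5]
treated = [1.7, 1.5, 2.0, 1.6, 1.9, 1.4, 1.8, 2.1]

d = cohens_d(control, treated)
low, high = mean_diff_ci(control, treated)
print(f"Cohen's d = {d:.2f}")
print(f"95% CI for the mean difference: ({low:.2f}, {high:.2f})")
```

Reporting the size of the difference together with an interval around it tells the reader how large the effect might plausibly be, which a p value alone does not convey.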
