• Title/Summary/Keyword: Kappa statistic

A Study on Comparison of Generalized Kappa Statistics in Agreement Analysis

  • Kim, Min-Seon;Song, Ki-Jun;Nam, Chung-Mo;Jung, In-Kyung
    • The Korean Journal of Applied Statistics, v.25 no.5, pp.719-731, 2012
  • Agreement analysis is conducted to assess the reliability of ratings made repeatedly on the same subjects by one or more raters. The kappa statistic is commonly used when rating scales are categorical. Simple and weighted kappa statistics measure the degree of agreement between two raters, and generalized kappa statistics measure the degree of agreement among more than two raters. In this paper, we compare the performance of four generalized kappa statistics proposed by Fleiss (1971), Conger (1980), Randolph (2005), and Gwet (2008a). We also examine how sensitive each of the four statistics is to the marginal probability distribution, that is, to whether marginal balancedness and/or homogeneity hold. The performance of the four methods is compared in terms of relative bias and coverage rate through simulation studies under various scenarios with different numbers of raters, subjects, and categories. A real data example is also presented to illustrate the four methods.
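The generalized statistics share one chance-corrected form, (observed agreement - chance agreement) / (1 - chance agreement), and differ mainly in how chance agreement is modelled. As a minimal sketch, assuming complete data in which every subject is rated by the same number of raters, the function below computes three of them from a subjects x categories count matrix: Fleiss' kappa (chance from the pooled category proportions), Randolph's free-marginal kappa (chance fixed at 1/q) and Gwet's AC1. Conger's kappa needs each rater's own marginals and is left out. The function name and input layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def multirater_kappas(counts):
    """counts[i, j] = number of raters who placed subject i in category j.

    Assumes every subject has the same number of ratings n >= 2 and q >= 2 categories.
    """
    counts = np.asarray(counts, dtype=float)
    n_subjects, q = counts.shape
    n = counts[0].sum()
    if not np.allclose(counts.sum(axis=1), n):
        raise ValueError("each subject must have the same number of ratings")
    # Observed agreement: share of agreeing rater pairs per subject, then averaged
    p_i = np.sum(counts * (counts - 1), axis=1) / (n * (n - 1))
    p_a = p_i.mean()
    # Overall share of all ratings that fall in each category
    pi = counts.sum(axis=0) / (n_subjects * n)
    chance = {
        "fleiss": np.sum(pi ** 2),                    # Fleiss (1971)
        "randolph": 1.0 / q,                          # Randolph (2005), free marginals
        "gwet_ac1": np.sum(pi * (1 - pi)) / (q - 1),  # Gwet's AC1
    }
    return {name: (p_a - p_e) / (1 - p_e) for name, p_e in chance.items()}
```

For the toy matrix `[[3, 0], [0, 3], [3, 0], [2, 1]]` (four subjects, three raters, two categories) the observed agreement is 5/6 in every case, yet the three values are 0.625, about 0.667 and 0.7, which shows how the chance model alone moves the estimate.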

Inter-rater Reliability Study on Pattern Identification Using Nasal Endoscopy for Rhinitis (비내시경 활용 비염 변증 지표의 평가자 간 신뢰도 연구)

  • Min, Kyung-Jin;Son, Mi-Ju;Kim, Young-Eun;Kim, Jeong-Hun;Lee, Dong-Hyo
    • The Journal of Korean Medicine Ophthalmology and Otolaryngology and Dermatology, v.30 no.4, pp.97-103, 2017
  • Objectives : To determine whether pattern identification using nasal endoscopy for rhinitis can be applied as a tool for evaluating rhinitis in routine care settings, we performed an inter-rater reliability study of this pattern identification. Methods : Two Korean medicine doctors assessed 290 left/right nasal endoscopy photograph cases of rhinitis patients using this pattern identification, which consists of four assessment items: nasal membrane color (pale/hyperemia), nasal membrane humidity (dryness/dampness), rhinorrhea (watery/yellow), and turbinate membrane edema (atrophic/edematous). Cohen's kappa statistic and percentage agreement were used to evaluate inter-rater reliability. Results : For the left nasal endoscopy photograph cases, inter-rater percentage agreement and kappa coefficients ranged from 'slight' to 'moderate' (% agreement: 40.00-67.59%; kappa: 0.06-0.407). Only the 'rhinorrhea (watery/yellow)' item showed moderate agreement (% agreement: 67.59%; kappa: 0.407). For the right nasal endoscopy photograph cases, agreement also ranged from 'slight' to 'moderate' (% agreement: 42.41-68.97%; kappa: 0.109-0.465). Again, only the 'rhinorrhea (watery/yellow)' item showed moderate agreement (% agreement: 68.97%; kappa: 0.465). Conclusions : Problems such as cut-off value setting, the bipolar evaluation values (pale/hyperemia, dryness/dampness, watery/yellow, atrophic/edematous), and item weighting need to be resolved. Further rigorous studies that overcome the limitations of the current research are warranted.
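Both statistics in this study compare two raters item by item. A minimal sketch of how percentage agreement and Cohen's kappa are obtained for one bipolar item follows; the function name and the "watery"/"yellow" ratings are hypothetical illustrations, not study data.

```python
from collections import Counter

def percent_agreement_and_kappa(rater_a, rater_b):
    """Percentage agreement and Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases given the same category by both raters
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's own marginal distribution
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    if p_e == 1:
        # Both raters used one and the same category throughout: kappa is undefined
        return 100 * p_o, float("nan")
    return 100 * p_o, (p_o - p_e) / (1 - p_e)

pct, kappa = percent_agreement_and_kappa(
    ["watery", "watery", "yellow", "yellow", "watery", "yellow"],
    ["watery", "yellow", "yellow", "yellow", "watery", "watery"],
)
```

Here the raters agree on 4 of 6 cases (about 66.7%), but because each rater splits the cases evenly, half of that agreement is expected by chance, so kappa is only 1/3. This gap is why the abstract reports kappa alongside raw agreement.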

The Validity and Reliability of a Screening Questionnaire for Parkinson's Disease in a Community

  • Kim, Jong-Hun;Cheong, Hae-Kwan;Lee, Chong-Sik;Yi, Sung-Eun;Park, Kun-Woo
    • Journal of Preventive Medicine and Public Health, v.43 no.1, pp.9-17, 2010
  • Objectives: Parkinson's disease (PD) is one of the most common neurodegenerative diseases in the elderly population. To estimate the prevalence of PD in the community, a good screening tool is essential. We evaluated the validity and reliability of a PD screening questionnaire and propose an alternative measure to improve its validity for use in community surveys. Methods: The study used a three-phase approach consisting of a screening questionnaire, a neurologic examination, and a confirmatory examination. A repeated survey was administered to patients with the disease detected in the community and to 150 subjects. We examined internal consistency using Cronbach's alpha, test-retest reliability using the kappa statistic, and validity using sensitivity, specificity, and ROC curves. Unadjusted odds ratios were used to estimate the weight of each questionnaire item. Results: The Cronbach's alpha of the questionnaire was 0.708. The kappa statistic for test-retest reliability was fair to good for most items. With the newly proposed weighting scores, the optimum cut-off value was 7/8. With a cut-off value of 5/6 for surveying prevalence in a community, the sensitivity was 0.98 and the specificity was 0.61, with a simultaneous improvement in reliability. Conclusions: We recommend 5/6 as the ideal cut-off value for surveys of PD prevalence in the community. This questionnaire, designed for the Korean community, could help future epidemiologic studies of PD.
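Two of the quantities reported above can be sketched in a few lines: Cronbach's alpha, k/(k-1) x (1 - sum of item variances / variance of the total score), and sensitivity/specificity at a cut-off. The reading of a "5/6" cut-off as "score of 5 or less screens negative, 6 or more screens positive" is an assumption based on the usual convention, and all names and data here are illustrative.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha; rows are respondents, columns are questionnaire items."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_variance = x.sum(axis=1).var(ddof=1)    # variance of respondents' total scores
    return k / (k - 1) * (1 - item_variances / total_variance)

def sensitivity_specificity(scores, has_disease, positive_from):
    """Screen positive when score >= positive_from (assumed: a '5/6' cut-off -> 6)."""
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_disease):
        positive = score >= positive_from
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering the cut-off (for example from 7/8 to 5/6) moves more people into the screen-positive group, which raises sensitivity at the cost of specificity; that is the trade-off behind recommending 5/6 for a prevalence survey, where missing true cases is costly.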

Interobserver and Intraobserver Variability in Interpretation of Lumbar Disc Abnormalities on Magnetic Resonance Images (자기공명 촬영상 요추 추간반 병변의 판독자내 및 판독자간 해석의 다양성)

  • Jeon, Een-Ho;Song, Jun-Hyeok;Park, Hyang-Kwon;Shin, Kyu-Man;Kim, Sung-Hak;Park, Dong-Been
    • Journal of Korean Neurosurgical Society, v.30 no.sup2, pp.254-258, 2001
  • Objective : The terminology of degenerative disc disease lacks official standardization, and this lack of standardization may provoke clinical and litigation problems. The authors investigated interobserver and intraobserver variability in the interpretation of lumbar disc abnormalities. Methods : Magnetic resonance imaging studies of the lumbar spine, performed prospectively in 50 patients, were read blindly by three doctors who treat spinal disorders, using two nomenclatures. Nomenclature I comprised normal, bulging, protrusion, and extrusion. Nomenclature II comprised normal, bulging, herniation without neural compression, and herniation with neural compression. Intraobserver and interobserver variation were measured statistically. Results : Interobserver agreement was 70.4-80.8% for nomenclature I and 76.2-80.2% for nomenclature II. Intraobserver agreement was 84.0-88.0% for nomenclature I and 79.2-86.8% for nomenclature II. The interobserver kappa statistic was 0.53-0.56 for nomenclature I and 0.54-0.57 for nomenclature II. The intraobserver kappa statistic was 0.60-0.85 for nomenclature I and 0.53-0.72 for nomenclature II. Conclusion : Experienced doctors showed only moderate interobserver agreement when interpreting disc status on lumbar magnetic resonance imaging. Intraobserver agreement was superior to interobserver agreement. Standardized nomenclature for lumbar disc extension beyond the interspace is needed.
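With three readers, figures such as "interobserver kappa 0.53-0.56" are presumably the spread of Cohen's kappa over the three reader pairs; the abstract does not say so explicitly, so treat that as an assumption. A minimal sketch of computing such ranges follows; for intraobserver values, the same reader's two reading sessions would be passed as the pair. Reader names and labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(a, b):
    """Observed agreement and Cohen's kappa for two equally long label lists."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return p_o, float("nan") if p_e == 1 else (p_o - p_e) / (1 - p_e)

def pairwise_ranges(readings):
    """readings: reader -> labels for the same discs in the same order.

    Returns (min, max) percentage agreement and (min, max) kappa over reader pairs.
    """
    agreements, kappas = [], []
    for r1, r2 in combinations(sorted(readings), 2):
        p_o, k = cohen_kappa(readings[r1], readings[r2])
        agreements.append(100 * p_o)
        kappas.append(k)
    return (min(agreements), max(agreements)), (min(kappas), max(kappas))

agree_range, kappa_range = pairwise_ranges({
    "A": ["normal", "bulging", "protrusion", "extrusion"],
    "B": ["normal", "bulging", "protrusion", "protrusion"],
    "C": ["normal", "normal", "protrusion", "extrusion"],
})
```

In this toy example agreement runs from 50% to 75% and kappa from 1/3 to 2/3 across the pairs.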

A new measure of tracking in repeated measurement data (반복측정된 자료에 대한 새로운 지속성 지수)

  • 강형곤;김병수
    • The Korean Journal of Applied Statistics, v.10 no.1, pp.189-201, 1997
  • The primary purpose of this study is to develop a measure of tracking using a modified kappa statistic. Understanding tracking phenomena is important in epidemiologic studies because precautionary measures can then be taken at an early stage of the outcome event. Several authors have proposed measures of tracking; we compared ours with McMahan's in a simulation study and then applied both procedures to real data. We conclude that our statistic is adequate for explaining and detecting the tracking phenomenon.

Comparison of Standardized Patient and Faculty Agreement in Evaluating Nursing Students' Assessment and Communication Skills (시뮬레이션기반 실습 시 간호학생의 간호사정 및 의사소통 기술에 대한 표준화 환자와 교수자 간의 평가 일치도)

  • Kim, Young Ju
    • Journal of Korean Academy of Fundamentals of Nursing, v.24 no.3, pp.189-199, 2017
  • Purpose: This study was conducted to examine the level of agreement between a standardized patient (SP) and a faculty member in evaluating nursing students' assessment and communication skills. Methods: Participants were 51 third-year nursing students in a simulation practice of 'nursing care for a patient admitted with chest pain'. Using a 30-item checklist and a 16-item communication tool, an SP and a faculty member evaluated the students' assessment and communication skills during the simulation. Results: The average percent agreement and kappa statistic for nursing assessment between the two evaluators were 85.3% and .48, respectively. Twenty of the thirty items evaluating assessment skills showed moderate or higher agreement (kappa $\geq .41$) between the evaluators. Seven of the sixteen items evaluating communication and interpersonal skills showed fair or higher agreement ($\geq .40$) between the two evaluators, as measured by the intraclass correlation coefficient. Conclusion: The findings show that the SP's evaluations were consistent with those of the faculty member to a moderate degree. Clear guidelines on evaluation criteria, and adequate time and effort for SP training, are necessary to increase the reliability of standardized patients as evaluators in simulation-based nursing education.
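The verbal labels used in these abstracts ("slight", "fair", "moderate") match the widely used Landis and Koch (1977) bands; the abstracts do not name their scale, so that mapping is an assumption here. A minimal sketch of labelling a kappa value and summarizing per-item kappas, as in "twenty of thirty items had kappa $\geq .41$", follows; the function names and toy values are hypothetical. Note that an average raw agreement of 85.3% can coexist with a kappa of only .48: when most students perform most checklist items, both evaluators mark "done" so often that much of their agreement is expected by chance.

```python
def landis_koch_label(kappa):
    """Descriptive label for a kappa value using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

def summarize_items(item_kappas, cutoff=0.41):
    """Mean per-item kappa and the number of items with kappa >= cutoff."""
    mean_kappa = sum(item_kappas) / len(item_kappas)
    n_at_least = sum(k >= cutoff for k in item_kappas)
    return mean_kappa, n_at_least
```

Under these bands, the values 0.06 and 0.407 reported earlier in this listing fall in "slight" and "moderate", consistent with how that study described them.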

Development of Algorithms for Extracting Thermocline Parameters in the South Sea of Korea (한국 남부해역의 수온약층 추출 알고리즘 개발)

  • Yoon, Dong-Young;Choi, Hyun-Woo
    • Ocean and Polar Research, v.34 no.2, pp.265-273, 2012
  • A new algorithm was developed not only to detect the existence of a thermocline but also to extract thermocline parameters (such as thermocline thickness, mixed layer thickness, maximum temperature gradient, and the temperature difference across the thermocline) from the vertical profile of water temperature. A kappa analysis was used to find adequate threshold values for the vertical water temperature gradient $\Delta T$ ($^{\circ}$C/m): agreement and reliability (kappa) were 87% and 0.74, respectively, under the conditions of maximum $\Delta T \geq 0.5$ and $|\Delta T| < 0.2$ in the surface and bottom layers. Three methods, namely (1) the gradient method, (2) the hyperbolic tangent method, and (3) the differential hyperbolic tangent method, were also tested for extracting the key parameters of a thermocline. Comparing the results of the three methods, the differential hyperbolic tangent method was the most appropriate for extracting the start and end points of a thermocline curve.
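A minimal sketch of the detection rule and a simple gradient-method extraction follows. It assumes that $\Delta T$ is the temperature decrease per metre between successive depths, that the "surface" and "bottom" gradients are the first and last of these, and that the thermocline layer is the contiguous run around the maximum gradient where $\Delta T$ stays at or above a bounding threshold. That bounding threshold (`layer_min`) and all names are hypothetical; the paper's own gradient, hyperbolic tangent and differential hyperbolic tangent procedures are not reproduced here.

```python
import numpy as np

def thermocline_parameters(depth_m, temp_c, detect_max=0.5, edge_max=0.2, layer_min=0.2):
    """Gradient-method sketch; returns None when no thermocline is detected.

    detect_max and edge_max follow the thresholds quoted in the abstract;
    layer_min, the gradient that bounds the layer, is a hypothetical choice.
    """
    z = np.asarray(depth_m, dtype=float)
    t = np.asarray(temp_c, dtype=float)
    g = -np.diff(t) / np.diff(z)  # g[i]: cooling in °C per metre from z[i] to z[i+1]
    # Detection rule: strong maximum gradient, weak gradients at surface and bottom
    if not (g.max() >= detect_max and abs(g[0]) < edge_max and abs(g[-1]) < edge_max):
        return None
    k = int(np.argmax(g))
    lo = hi = k
    # Grow the layer upward and downward while the gradient stays >= layer_min
    while lo > 0 and g[lo - 1] >= layer_min:
        lo -= 1
    while hi < len(g) - 1 and g[hi + 1] >= layer_min:
        hi += 1
    top, bottom = z[lo], z[hi + 1]  # segment i spans z[i]..z[i+1]
    return {
        "mixed_layer_thickness": top - z[0],
        "thermocline_thickness": bottom - top,
        "max_gradient": g[k],
        "temperature_difference": t[lo] - t[hi + 1],
    }

profile = thermocline_parameters([0, 5, 10, 15, 20, 25, 30],
                                 [20, 19.9, 19.8, 17, 14, 13.9, 13.85])
```

For this toy profile the sketch finds a 10 m mixed layer, a 10 m thermocline between 10 m and 20 m, a maximum gradient of 0.6 °C/m and a 5.8 °C drop across the layer. A step-wise rule like this is sensitive to sampling resolution, which is one motivation for fitting a smooth curve such as a hyperbolic tangent instead.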

Neonatal Intracranial Ischemia and Hemorrhage : Role of Cranial Sonography and CT Scanning

  • Khan, Imran Ahmad;Wahab, Shagufta;Khan, Rizwan Ahmad;Ullah, Kkram;Ali, Manazir
    • Journal of Korean Neurosurgical Society, v.47 no.2, pp.89-94, 2010
  • Objective : To evaluate the role of cranial sonography and computed tomography in the diagnosis of neonatal intracranial hemorrhage and hypoxic-ischemic injury in an Indian setting. Methods : The study included 100 neonates who underwent cranial sonography and computed tomography (CT) in the first month of life for suspected intracranial ischemia and hemorrhage. Two observers rated the images for possible intracranial lesions, and a kappa statistic for interobserver agreement was calculated. Results : There was no significant difference between the kappa values of CT and ultrasonography (USG) for the diagnosis of germinal matrix hemorrhage/intraventricular hemorrhage (GMH/IVH) and periventricular leucomalacia (PVL), and both showed good interobserver agreement. USG, however, detected more cases of GMH/IVH (24 cases) and PVL (19 cases) than CT (22 and 16 cases, respectively). CT had significantly better interobserver agreement for the diagnosis of hypoxic-ischemic injury (HII) in term infants and also detected more cases (33, compared with 18 by USG). CT also detected 6 cases of extra-axial hemorrhage, compared with 1 detected by USG. Conclusion : USG is the better modality for imaging preterm neonates with suspected IVH or PVL. However, USG is unreliable for imaging term newborns with suspected HII, for whom CT or magnetic resonance imaging is a better modality.

Detection of Hidden Proximal Caries using Q-ray view in Primary Molars (Q-ray view를 이용한 유구치의 숨은 인접면 우식증 탐지)

  • Jeong, Younwook;Lee, Hyoseol;Choi, Hyungjun;Lee, Jaeho;Choi, Byungjai;Kim, Seongoh
    • Journal of the Korean Academy of Pediatric Dentistry, v.42 no.3, pp.209-217, 2015
  • The purpose of this study was to evaluate the ability of Q-ray view (All-in-one Bio, Seoul, Korea) to detect proximal caries in primary molars with sound marginal ridges. Thirty-two children aged 3-9 years (mean $5.6 \pm 1.3$ years) were enrolled, and two examiners evaluated 100 proximal surfaces of primary molars with sound marginal ridges. The teeth were examined with (a) visual examination, (b) Q-ray view, (c) DIAGNOdent (KaVo, Biberach, Germany), and (d) digital periapical radiography. The kappa statistic was used to assess the agreement between each examination method and the degree of caries progression. The kappa values for enamel caries were 0.15 (visual examination), 0.10 (Q-ray view), 0.25 (DIAGNOdent), and 0.68 (digital periapical radiography). The kappa values for dentinal caries were 0.34 (visual examination), 0.56 (Q-ray view), 0.44 (DIAGNOdent), and 0.70 (digital periapical radiography). Although Q-ray view showed low diagnostic ability for enamel caries, it was effective in detecting hidden proximal caries extending into dentin. Q-ray view may be a useful and simple device to aid pediatric dentists in detecting hidden proximal caries in primary molars, especially when examining uncooperative children or persons with disabilities.
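The abstract does not state how separate kappas for enamel and dentinal caries were derived from an ordinal scale of caries progression. One plausible approach, shown here only as a hypothetical sketch, is to code each surface as 0 (sound), 1 (enamel caries) or 2 (dentinal caries) and compute Cohen's kappa after dichotomizing at each depth: "at least enamel" versus sound, and "dentin" versus not. The coding, function names and scores are all assumptions.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two equally long label lists."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return float("nan") if p_e == 1 else (p_o - p_e) / (1 - p_e)

def kappa_at_depth(method_scores, reference_scores, depth):
    """Hypothetical coding: 0 = sound, 1 = enamel caries, 2 = dentinal caries.

    A surface counts as positive at `depth` if its score is >= depth.
    """
    a = [int(s >= depth) for s in method_scores]
    b = [int(s >= depth) for s in reference_scores]
    return cohen_kappa(a, b)

method = [0, 1, 2, 2, 0, 1]      # hypothetical scores from one examination method
reference = [0, 2, 2, 1, 0, 0]   # hypothetical reference degree of progression
k_enamel = kappa_at_depth(method, reference, 1)
k_dentin = kappa_at_depth(method, reference, 2)
```

With these toy scores the kappa is 2/3 at the enamel threshold and 1/4 at the dentin threshold, illustrating how one method can agree well at one depth and poorly at another, as the reported Q-ray view results do in the opposite direction.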

k-Sample Rank Tests for Umbrella Location-Scale Alternatives (k-표본 우산형 위치-척도 대립가설에 대한 순위검정법의 연구)

  • Hee Moon Park
    • The Korean Journal of Applied Statistics, v.7 no.2, pp.159-171, 1994
  • Several rank score tests are proposed for testing the equality of all sampling distribution functions against umbrella location-scale alternatives in the k-sample problem. Only the case of a known peak $\ell$ is considered. The asymptotic properties of the proposed test statistics are investigated under the null hypothesis and under a contiguous sequence of umbrella location-scale alternatives, and their asymptotic local powers are compared with each other. The results show that the tests based on the Chen-Wolfe rank analogue statistic are robust and more powerful than the others for unequally spaced umbrella location-scale alternatives.
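To make "umbrella alternative with known peak" concrete, the sketch below computes the classical Mack-Wolfe statistic $A_p$ for umbrella *location* alternatives. This is a swapped-in, simpler statistic, not the location-scale rank score tests or the Chen-Wolfe statistic studied in the paper. It adds Mann-Whitney counts that favour increasing distributions up to the peak sample and decreasing distributions after it; large values support the umbrella. With the peak at the last sample it reduces to the Jonckheere-Terpstra statistic. The function names are illustrative.

```python
def mann_whitney_count(x, y):
    """Number of pairs (a, b), a from x and b from y, with a < b; ties count 1/2."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)

def mack_wolfe(samples, peak):
    """Mack-Wolfe statistic A_p for an umbrella with known peak (1-based index)."""
    k = len(samples)
    if not 1 <= peak <= k:
        raise ValueError("peak must be between 1 and the number of samples")
    p = peak - 1  # 0-based index of the peak sample
    a_p = 0.0
    # Rising side: each earlier sample should tend to be smaller than later ones
    for i in range(p + 1):
        for j in range(i + 1, p + 1):
            a_p += mann_whitney_count(samples[i], samples[j])
    # Falling side: each later sample should tend to be smaller than earlier ones
    for i in range(p, k):
        for j in range(i + 1, k):
            a_p += mann_whitney_count(samples[j], samples[i])
    return a_p
```

For five samples `[[1, 2], [3, 4], [5, 6], [3, 4], [1, 2]]` with the peak at sample 3, every counted pair is ordered as the umbrella predicts, giving the maximum value of 24; in practice $A_p$ is standardized using its null mean and variance or compared with a permutation distribution.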
