• Title/Summary/Keyword: Cohen's Kappa

Assessing Classification Accuracy using Cohen's kappa in Data Mining (데이터 마이닝에서 Cohen의 kappa를 이용한 분류정확도 측정)

  • Um, Yonghwan
    • Journal of the Korea Society of Computer and Information / v.18 no.1 / pp.177-183 / 2013
  • In this paper, Cohen's kappa and weighted kappa are applied to measure classification accuracy in data mining. Cohen's kappa corrects for classifications that may be due to chance and is used for data on nominal or ordinal scales. For ordinal data in particular, weighted kappa is used, which measures classification accuracy by quantifying classification errors as weights. We used two weighting schemes (linear and quadratic) to calculate weighted kappa, and we used a real data set, the fat-liver data, to calculate and compare kappa and weighted kappa.
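
The weighting idea in this abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's code: the function name `weighted_kappa` and the 3x3 table are hypothetical, and the weights are the standard linear and quadratic disagreement weights for ordered categories.

```python
import numpy as np

def weighted_kappa(confusion, weights=None):
    """Cohen's kappa from a square confusion matrix.

    weights: None (unweighted), "linear", or "quadratic".
    """
    m = np.asarray(confusion, dtype=float)
    k = m.shape[0]
    p = m / m.sum()                                    # observed joint proportions
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))  # proportions expected by chance

    i, j = np.indices((k, k))
    if weights is None:
        w = (i != j).astype(float)      # every disagreement costs 1
    elif weights == "linear":
        w = np.abs(i - j) / (k - 1)     # cost grows with category distance
    elif weights == "quadratic":
        w = ((i - j) / (k - 1)) ** 2    # distant errors are penalized more heavily
    else:
        raise ValueError("weights must be None, 'linear', or 'quadratic'")

    # 1 - (weighted observed disagreement) / (weighted chance disagreement)
    return 1.0 - (w * p).sum() / (w * expected).sum()

# Hypothetical 3-class ordinal table (rows: true class, columns: predicted class).
table = [[20, 5, 0],
         [5, 20, 5],
         [0, 5, 20]]

for scheme in (None, "linear", "quadratic"):
    print(scheme, round(weighted_kappa(table, scheme), 4))
```

Because every error in this made-up table falls in an adjacent category, the weighted variants come out higher than the unweighted kappa; a table with many distant errors would show the opposite pattern.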

A New Measure of Agreement to Resolve the Two Paradoxes of Cohen's Kappa (COHEN의 합치도의 두 가지 역설을 해결하기 위한 새로운 합치도의 제안)

  • Park, Mi-Hee;Park, Yong-Gyu
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.117-132 / 2007
  • In a $2\times2$ table showing binary agreement between two raters, Cohen's $\kappa$, a chance-corrected measure of agreement, is known to exhibit two paradoxes. $\kappa$ is highly sensitive to the raters' classification probabilities (marginal probabilities) and does not satisfy the conditions for a chance-corrected measure of agreement. However, $\kappa$ and other established measures take reasonable and similar values when each marginal distribution is close to 0.5. The objectives of this paper are to present a new measure of agreement, H, which resolves the paradoxes of $\kappa$ by adjusting for unbalanced marginal distributions, and to compare the proposed measure with established measures through examples.
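
The sensitivity to marginal probabilities is easy to reproduce numerically. The sketch below uses two illustrative $2\times2$ tables that are not taken from the paper: both have 85% observed agreement, yet $\kappa$ drops from about 0.70 to about 0.32 once the marginals become unbalanced. The helper name `cohen_kappa_2x2` is hypothetical, and the proposed measure H is not implemented because the abstract does not give its formula.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Return (observed agreement, kappa) for the 2x2 table [[a, b], [c, d]].

    a and d are the counts on which both raters agree.
    """
    n = a + b + c + d
    po = (a + d) / n
    # Chance agreement from the product of the raters' marginal proportions.
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return po, (po - pe) / (1 - pe)

# Illustrative counts: marginals near 0.5 versus strongly unbalanced marginals.
balanced = cohen_kappa_2x2(40, 9, 6, 45)
skewed = cohen_kappa_2x2(80, 10, 5, 5)

print("balanced: po=%.2f kappa=%.3f" % balanced)
print("skewed:   po=%.2f kappa=%.3f" % skewed)
```

The observed agreement is identical in both tables; only the chance-agreement term changes, which is what drives the difference in $\kappa$.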

A simulation study of rater agreement measures (모의 실험을 이용한 여러 합치도들의 비교)

  • Han, Kyung-Do;Park, Yong-Gyu
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.25-37 / 2012
  • Many statistics, such as Cohen's (1960) ${\kappa}$, Scott's (1955) ${\pi}$, and Park and Park's (2007) H, have been proposed as measures of agreement to represent inter-rater reliability. This study compared the bias, SE, MSE, and CV of these measures for nominal and ordinal categories with balanced marginal distributions, and for nominal categories in the two paradoxical situations. In all cases, AC1 and H had smaller SE and CV.
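
The measures named here differ mainly in how they estimate chance agreement. The sketch below computes three of them from one contingency table using their standard definitions. It assumes that "AC1" refers to Gwet's AC1, which the abstract does not spell out, and it omits H because its formula is not given here. The function name and the example table are hypothetical.

```python
import numpy as np

def agreement_measures(table):
    """Cohen's kappa, Scott's pi, and Gwet's AC1 for a square rater-by-rater table."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    q = p.shape[0]                       # number of categories
    po = np.trace(p)                     # observed agreement
    row, col = p.sum(axis=1), p.sum(axis=0)
    mean_marg = (row + col) / 2          # average of the two raters' marginals

    def chance_corrected(pe):
        return (po - pe) / (1 - pe)

    return {
        "kappa": chance_corrected((row * col).sum()),
        "pi": chance_corrected((mean_marg ** 2).sum()),
        "ac1": chance_corrected((mean_marg * (1 - mean_marg)).sum() / (q - 1)),
    }

# Hypothetical table with strongly unbalanced marginals.
for name, value in agreement_measures([[80, 10], [5, 5]]).items():
    print(name, round(value, 3))
```

On unbalanced tables such as this one, AC1 stays close to the observed agreement while kappa and pi fall, which is one reason the measures are compared under the paradoxical situations.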

Comparison of the performance of classification algorithms using cytotoxicity data (세포독성 자료를 이용한 분류 알고리즘 성능 비교)

  • Yoon, Yeochang;Jeung, Eui Bae;Jo, Na Rae;Ju, Su In;Lee, Sung Duck
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.417-426 / 2018
  • An alternative developmental toxicity test using embryoid bodies derived from mouse embryonic stem cells has been developed. This alternative method does not administer chemicals to animals; instead, cells are treated with the chemicals. This study suggests the use of discriminant analysis, support vector machines, artificial neural networks, and k-nearest neighbors. Algorithm performance was compared using accuracy and a weighted Cohen's kappa coefficient. In the application, these classification techniques were applied to cytotoxicity data to classify drug toxicity, and the results were compared.

Interrater Reliability in the Content Analysis of Preparatory Information for Mechanically Ventilated Patients (인공호흡기 사용 환자들에게 제공된 예비적 정보에 대한 내용분석의 측정자간 신뢰도)

  • Kim Hwa-Soon
    • Journal of Korean Academy of Fundamentals of Nursing / v.5 no.2 / pp.269-279 / 1998
  • In nursing research in which data are collected through clinical observation, analysis of clinical records, or coding of interpersonal interactions in clinical areas, testing and reporting interrater reliability is very important for ensuring reliable results. Procedures for interrater reliability in these studies should follow two steps. The first step is to determine unitizing reliability, defined as consistency in the identification of the same data elements in a record by two or more raters reviewing the same record. Unitizing reliability has rarely been reported in previous studies; it should be tested as a precondition before progressing to the next step. The next step is to determine interpretive reliability. Cohen's kappa is a preferable method of calculating the extent of agreement between observers or judges because it measures agreement beyond chance. Despite its usefulness, kappa can sometimes lead to paradoxical conclusions and can be difficult to interpret. These difficulties arise because kappa is affected in complex ways by the presence of bias between observers and by the true prevalence of certain categories. Therefore, percentage agreement should be reported alongside kappa so that kappa can be interpreted adequately. The presence of bias should be assessed using the bias index, and the effect of prevalence should be assessed using the prevalence index. Researchers have reported only global reliability, which reflects the extent to which coders can consistently use the whole coding system across all categories. Category-by-category reliability also needs to be reported, to indicate whether some categories are harder to use than others.
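
The reporting recommendation above (percentage agreement, kappa, bias index, and prevalence index together) can be sketched for a two-rater, two-category table. This is an illustration under stated assumptions: the indices are taken in their commonly used form, as absolute differences of cell proportions, which the abstract does not define, and the function name and counts are hypothetical.

```python
def kappa_report(a, b, c, d):
    """Summary for the 2x2 table [[a, b], [c, d]].

    a: both raters say "yes", d: both say "no",
    b and c: the two kinds of disagreement.
    """
    n = a + b + c + d
    po = (a + d) / n
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "percent_agreement": 100 * po,
        "kappa": (po - pe) / (1 - pe),
        # Assumed definition: imbalance between the two agreement cells.
        "prevalence_index": abs(a - d) / n,
        # Assumed definition: imbalance between the two disagreement cells.
        "bias_index": abs(b - c) / n,
    }

# Hypothetical coding results for 100 units rated by two coders.
for name, value in kappa_report(80, 10, 5, 5).items():
    print(name, round(value, 3))
```

A high prevalence index next to a low kappa signals that the low value may reflect a rare category rather than poor coder agreement.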

Comparison between denture wearer's evaluation and clinician's rating for complete denture (총의치 사용에 대한 환자와 술자간 평가 비교)

  • Byun, Jin-Soo;Huh, Yoon-Hyuk;Cho, Lee-La;Park, Chan-Jin
    • The Journal of Korean Academy of Prosthodontics / v.54 no.4 / pp.364-369 / 2016
  • Purpose: The aim of this study was to compare denture wearers' evaluations with the clinician's technical ratings of complete dentures used by edentulous patients. Materials and methods: A total of 43 edentulous patients whose complete dentures had been fabricated more than one year earlier were recalled. A questionnaire based on the literature was modified and given to the patients for subjective assessment. Functional aspects related to retention, stability, occlusion, and denture condition were included in the operator's evaluation. In addition, correlations between the patients' subjective assessments and the operator's objective assessments were evaluated. The Friedman test and Cohen's kappa were used for statistical analysis. Results: Denture wearers' evaluations showed slight or fair agreement with the clinician's ratings of the complete dentures. More differences were found for maxillary dentures than for mandibular dentures, and moderate differences were found in the esthetic and occlusion aspects. Conclusion: There was slight or fair agreement between the subjective and objective evaluations.

A Study on Classifying Sea Ice of the Summer Arctic Ocean Using Sentinel-1 A/B SAR Data and Deep Learning Models (Sentinel-1 A/B 위성 SAR 자료와 딥러닝 모델을 이용한 여름철 북극해 해빙 분류 연구)

  • Jeon, Hyungyun;Kim, Junwoo;Vadivel, Suresh Krishnan Palanisamy;Kim, Duk-jin
    • Korean Journal of Remote Sensing / v.35 no.6_1 / pp.999-1009 / 2019
  • The importance of high-resolution sea ice maps of the Arctic Ocean is increasing due to the possibility of pioneering North Pole Routes and the need for precise climate prediction models. In this study, sea ice classification algorithms for two deep learning models were examined using Sentinel-1 A/B SAR data to generate high-resolution sea ice classification maps. Based on current ice charts, training data sets for three classes (Open Water, First Year Ice, Multi Year Ice) were generated by Arctic sea ice and remote sensing experts. Ten sea ice classification algorithms were generated by combining two deep learning models (i.e., Simple CNN and Resnet50) with five cases of input bands, including incidence angles and thermal-noise-corrected HV bands. The ten algorithms were analyzed by comparing their classification results with ground truth points. A confusion matrix and Cohen's kappa coefficient were produced for the case that showed the best result. Furthermore, the classification result was compared with that of the Maximum Likelihood Classifier, which has traditionally been employed to classify sea ice. In conclusion, the Convolutional Neural Network case with two convolution layers and two max pooling layers, using HV and incidence angle input bands, shows a classification accuracy of 96.66% and a Cohen's kappa coefficient of 0.9499. All deep learning cases show better classification accuracy than the Maximum Likelihood Classifier.

Reliability of Q-Ray View for Assessing Retention Status of Pit and Fissure Sealant (Q-Ray View를 이용한 치면열구전색재의 유지상태 평가)

  • Nam, Sang-Mi;Ku, Hye-Min;Lee, Eun-Song;Kim, Baek-Il
    • The Journal of the Korean Dental Association / v.58 no.3 / pp.140-151 / 2020
  • Purpose: To evaluate the reliability of Q-ray view (Aiobio Inc., Seoul, Korea) for assessing the retention status of pit and fissure sealants. Methods: Pit and fissure sealants of 58 permanent molars from 15 third-grade students were examined. Posterior teeth with ≥1 pit and fissure sealants applied to the occlusal surface for >6 months were examined. The teeth were examined using traditional visual-tactile assessments and combined Q-ray view. Pit and fissure sealants were evaluated by assessing marginal plaque, marginal discoloration, marginal integrity, retention, and presence of caries. Fleiss' kappa and Cohen's kappa values were calculated to compare inter- and intrarater agreement between visual-tactile and combined Q-ray view assessments. Results: Regarding interrater agreement in visual-tactile assessments, Cohen's kappa values for marginal plaque, marginal discoloration, and presence of caries were 0.22-0.57, 0.36-0.57, and 0.43-0.61, respectively, and agreement ranged from slight to moderate. When combined with Q-ray view, the values were 0.81-0.89, 0.69-0.88, and 0.80-0.90, respectively, and agreement ranged from substantial to nearly perfect, indicating statistical significance. Marginal plaque (0.81-0.83), marginal discoloration (0.57-0.89), and presence of caries (0.69-0.91) showed higher agreement in combined Q-ray view than in visual-tactile assessments, and kappa values for marginal plaque were significantly higher in combined Q-ray view than in visual-tactile assessments. Conclusion: Evaluating the retention status of pit and fissure sealants using Q-ray view showed higher reliability than visual-tactile assessment for marginal plaque, marginal discoloration, and presence of caries. Therefore, Q-ray view may be used to assess the retention status of pit and fissure sealants.

Reliability of Modified Ashworth Scale Using a Haptic Robot Finger Simulating Finger Spasticity (손가락 경직을 모사하는 로봇 시뮬레이터를 이용한 경직도 검진의 신뢰도 평가)

  • Ha, Dokyeong;Park, Hyung-Soon
    • Transactions of the Korean Society of Mechanical Engineers B / v.41 no.2 / pp.125-133 / 2017
  • This paper presents the inter-rater reliability of finger spasticity assessment, tested using a finger simulator that mimics the finger spasticity of patients after a stroke. To control the simulator torque, finger spasticity was modeled, and the model parameters were obtained by measuring quantitative data while grading based on the Modified Ashworth Scale (MAS). A robotic finger simulator was designed to mimic finger spasticity. Evaluation of this simulator with the help of seven rehabilitation doctors showed that the simulator had a Cohen's kappa value of 0.619 for the metacarpophalangeal joint and 0.514 for the proximal interphalangeal joint. Fleiss' kappa between raters was 0.513 for the metacarpophalangeal joint and 0.486 for the proximal interphalangeal joint. Therefore, spasticity assessment using the MAS grading system is not reliable, owing to the subjectivity of the assessment. The proposed robotic simulator can be used as a training tool for improving the reliability of spasticity assessment.

Agreement between Smartphone Addiction and Perceived Smartphone Addiction among Adolescents (청소년의 스마트폰 중독수준과 중독인식간의 일치도)

  • Kim, Sohyun;Jeong, Ihn Sook
    • The Journal of Korean Society for School & Community Health Education / v.15 no.2 / pp.91-101 / 2014
  • Objectives: This study aimed to identify the agreement between smartphone addiction (SA) and perceived SA among adolescents. Methods: The survey covered 394 subjects conveniently sampled from elementary school (ES), middle school (MS), and academic (AHS) and vocational (VHS) high schools. The data were collected from June 20 to July 20, 2013 with a self-administered questionnaire and analyzed using descriptive statistics, the chi-square test, and Cohen's kappa (${\kappa}$). Results: The proportion at high risk of SA was 2.8% in total, 1.0% in ES, 2.1% in MS, 4.1% in AHS, and 4.0% in VHS, differing significantly by type of school (p<0.001). Perceived SA was 22.6% in total, 9.0% in ES, 21.9% in MS, 33.7% in AHS, and 26.0% in VHS, differing significantly by type of school (p=0.003). The agreement between SA and perceived SA was 82.0% (${\kappa}$=0.54) in total, 91.8% (${\kappa}$=0.53) in ES, 75.0% (${\kappa}$=0.33) in MS, 77.5% (${\kappa}$=0.53) in AHS, and 84.8% (${\kappa}$=0.65) in VHS. Conclusion: The agreement between SA and perceived SA was moderate overall but fair in MS. It is suggested that step-by-step activities be developed to reduce the gap between SA and perceived SA among adolescents, in particular middle school students.
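
The verbal labels in the conclusion ("moderate", "fair") match the widely used Landis and Koch (1977) benchmarks. The abstract does not name the scale it used, so the cut-points below are an assumption, and the function name `interpret_kappa` is hypothetical.

```python
def interpret_kappa(kappa):
    """Map a kappa value to a verbal label (assumed Landis and Koch cut-points)."""
    if kappa < 0:
        return "poor"
    # Upper bounds of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

# Kappa values reported in the abstract, by school type.
reported = {"total": 0.54, "ES": 0.53, "MS": 0.33, "AHS": 0.53, "VHS": 0.65}
for group, kappa in reported.items():
    print(group, kappa, interpret_kappa(kappa))
```

Under these assumed cut-points, the overall value of 0.54 falls in the "moderate" band and the middle school value of 0.33 in the "fair" band, consistent with the stated conclusion.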
