• Title/Summary/Keyword: 카파 계수 (kappa coefficient)


Sentiment Categorization of Korean Customer Reviews using CRFs (CRFs를 이용한 한국어 상품평의 감정 분류)

  • Shin, Junsoo;Lee, Juhoo;Kim, Harksoo
    • Annual Conference on Human and Language Technology / 2008.10a / pp.58-62 / 2008
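  • Customer reviews are one of the factors shoppers consider when buying products online, but checking each review individually takes considerable time. To reduce this burden, this paper proposes a system that classifies the opinions in online customer reviews as positive, negative, or neutral. The proposed system is based on the CRFs machine learning model and uses connective endings, morpheme unigrams, and sliding-window morpheme bigrams as features. For the experiments, 561 customer reviews were collected from the monitor category of a price comparison site; 465 were used as training documents and 96 as test documents. The proposed system achieved an accuracy of about 79%. As an additional experiment, a kappa test was conducted to see how closely the system's performance matches that of humans. The kappa coefficient between humans was 0.6415, and the average kappa coefficient between the proposed system and humans was 0.5976. In conclusion, the proposed system performs somewhat below humans but at a similar level.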


Named Entity Recognition for Patent Documents Based on Conditional Random Fields (조건부 랜덤 필드를 이용한 특허 문서의 개체명 인식)

  • Lee, Tae Seok;Shin, Su Mi;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.5 no.9 / pp.419-424 / 2016
  • Named entity recognition is needed to improve retrieval accuracy when searching patent documents, or similar patents, by their claims and descriptions. This paper proposes automatic named entity recognition for patents using a conditional random field, one of the best methods in machine learning research. The system was built from a tagged training corpus of 660,000 words, and 70,000 words were used as a test set for evaluation. In the experiment, the accuracy was 93.6% and the kappa coefficient between manual tagging and the automatic tagging system was 0.67. This exceeds the kappa coefficient of 0.6 between manually tagged results, showing that the automatic named entity tagging system can serve as a practical replacement for manual tagging of patent documents.

A simulation study of rater agreement measures (모의 실험을 이용한 여러 합치도들의 비교)

  • Han, Kyung-Do;Park, Yong-Gyu
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.25-37 / 2012
  • Many statistics, such as Cohen's (1960) $\kappa$, Scott's (1955) $\pi$, and Park and Park's (2007) H, have been proposed as measures of agreement to represent inter-rater reliability. This study compared the bias, SE, MSE, and CV of these measures of agreement for nominal and ordinal categories under balanced marginal distributions, and for nominal categories in two paradoxical situations. In all cases, AC1 and H had smaller SE and CV.
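
As a minimal sketch of two of the measures named above, the following computes Cohen's $\kappa$ and Scott's $\pi$ for two raters. Both use the form (p_o - p_e) / (1 - p_e) and differ only in how the chance agreement p_e is estimated. The function name `agreement_measures` and the ratings are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def agreement_measures(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa, Scott's pi) for two raters.

    Assumes both lists have the same length and that chance agreement
    is below 1 (otherwise the denominators would be zero).
    """
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Cohen: chance agreement from each rater's own marginal proportions.
    p_e_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    # Scott: chance agreement from the pooled marginal proportions.
    p_e_pi = sum(((count_a[c] + count_b[c]) / (2 * n)) ** 2 for c in categories)
    kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)
    pi = (p_o - p_e_pi) / (1 - p_e_pi)
    return p_o, kappa, pi


# Hypothetical ratings of ten items by two raters.
rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
rater_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
p_o, kappa, pi = agreement_measures(rater_a, rater_b)
print(f"p_o={p_o:.3f}  kappa={kappa:.3f}  pi={pi:.3f}")
```

Because the two hypothetical raters have different marginal distributions, the two statistics differ slightly; with identical marginals they coincide.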

Detection of Burned Forest Areas Using Landsat TM Images (Landsat TM 위성영상을 이용한 산불 발생지역의 탐지)

  • 김철민;이승호;노대균
    • Proceedings of the KSRS Conference / 2001.03a / pp.77-81 / 2001
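  • Damage from the large forest fire that broke out around Samcheok, Gangwon-do, in April 2000 was surveyed and analyzed using Landsat TM satellite imagery. Image differencing, a change detection technique, was applied to two satellite images acquired before and after the fire. The analysis showed that deriving NDVI and using its difference was the most effective approach for detecting the burned area. When the threshold separating burned areas was set to 0.9 times the standard deviation, the overall accuracy against the field survey results was 93.8% and the kappa coefficient was 0.82, both very high.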
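
The NDVI differencing step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure: NDVI is taken as (NIR - Red) / (NIR + Red), the difference is pre-fire minus post-fire, and a pixel is flagged when its difference exceeds the mean difference plus 0.9 standard deviations. The abstract gives only the 0.9 multiplier, so the exact form of the threshold is an assumption, and the function names and reflectance values are hypothetical.

```python
from statistics import mean, pstdev


def ndvi(nir, red):
    """Normalized difference vegetation index; assumes nir + red != 0."""
    return (nir - red) / (nir + red)


def detect_burned(pre, post, k=0.9):
    """Flag pixels whose NDVI dropped sharply between two dates.

    pre, post: lists of (nir, red) reflectance pairs, one per pixel.
    Assumed threshold: mean(diff) + k * std(diff), with k = 0.9.
    """
    # Positive difference means vegetation was lost after the fire.
    diff = [ndvi(*before) - ndvi(*after) for before, after in zip(pre, post)]
    threshold = mean(diff) + k * pstdev(diff)
    return [d > threshold for d in diff]


# Hypothetical reflectances for five pixels before and after the fire.
pre = [(0.50, 0.10)] * 5
post = [(0.50, 0.10), (0.48, 0.10), (0.20, 0.15), (0.50, 0.11), (0.18, 0.16)]
burned = detect_burned(pre, post)
print(burned)
```

In this toy input only the third and fifth pixels show a large NDVI drop, so only they are flagged.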


Inter-Rater Reliability of Carotid Intima-Media Thickness Measurements in a Multicenter Cohort Study (다기관 코호트 연구에서 경동맥 내막-중막 두께 측정의 측정자간 신뢰도 평가)

  • Lee, Jung Hyun;Choi, Dong Phil;Shim, Jee-Seon;Kim, Dae Jung;Park, Sung-Ha;Kim, Hyeon Chang
    • Journal of health informatics and statistics / v.41 no.1 / pp.49-56 / 2016
  • Objectives: Carotid intima-media thickness (CIMT) and the presence of carotid artery plaque are widely used as preclinical markers of atherosclerosis. Because CIMT measurement is operator dependent, it is important to evaluate the reliability of CIMT and plaque measurements between centers in a multicenter study. The purpose of this study is to evaluate the inter-rater reliability of CIMT and plaque presence among three clinical centers of the Cardiovascular and Metabolic Disease Etiology Research Center (CMERC). Methods: Twenty people without known cardiovascular disease (age 37-64) were enrolled during 2014-2015, and their left and right carotid arteries were examined repeatedly with ultrasonography for CIMT measurements at three clinical centers according to a predetermined protocol. Maximum and mean values of CIMT at the distal common carotid artery were recorded. Plaque presence at a carotid artery was checked by an operator. The reliability of CIMT and carotid plaque presence was assessed using an intraclass correlation coefficient (ICC) and kappa statistics, respectively. Results: The calculated ICC was 0.647 (95% CI: 0.487-0.779) for maximum CIMT and 0.758 (95% CI: 0.632-0.854) for mean CIMT. In the Bland-Altman plot, most observed values were distributed within the mean difference $\pm 1.96$ SD range. Kappa statistics of plaque presence between two centers were 0.304 (center 1 and 2), 0.507 (center 1 and 3), and 0.606 (center 2 and 3), respectively, while Fleiss' kappa for overall agreement was 0.445. Conclusions: The inter-rater reliability of CIMT measurements among three clinical centers turned out to be high, and the agreement of measuring carotid plaque presence was fair.
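
For agreement among more than two raters, such as the three centers here, Fleiss' kappa can be sketched as below. The function name `fleiss_kappa` and the count table are hypothetical and are not the study's data; the sketch assumes every subject is rated by the same number of raters.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters n >= 2.
    """
    num_subjects = len(counts)
    n = sum(counts[0])
    num_categories = len(counts[0])
    # Overall proportion of ratings falling in each category.
    p_j = [
        sum(row[j] for row in counts) / (num_subjects * n)
        for j in range(num_categories)
    ]
    # Per-subject agreement: share of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_i) / num_subjects
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical table: six arteries, three centers, columns are
# (plaque present, plaque absent).
table = [[3, 0], [0, 3], [2, 1], [1, 2], [3, 0], [0, 3]]
overall = fleiss_kappa(table)
print(f"Fleiss kappa = {overall:.3f}")
```

Unlike pairwise kappa between two centers, this yields a single overall agreement value across all raters.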

Early Estimation of Rice Cultivation in Gimje-si Using Sentinel-1 and UAV Imagery (Sentinel-1 및 UAV 영상을 활용한 김제시 벼 재배 조기 추정)

  • Lee, Kyung-do;Kim, Sook-gyeong;Ahn, Ho-yong;So, Kyu-ho;Na, Sang-il
    • Korean Journal of Remote Sensing / v.37 no.3 / pp.503-514 / 2021
  • Rice production from an adequate cultivation area is important for decision making on rice supply and demand policy. Estimating the year's rice production requires identifying rice cultivation areas in advance. This study classified paddy rice cultivation in Gimje-si using Sentinel-1 SAR (synthetic aperture radar) and UAV imagery in early July. Time-series Sentinel-1A and 1B images acquired from early May to early July were converted into sigma naught (dB) images using the SNAP (SeNtinel Application Platform, version 8.0) toolbox provided by the European Space Agency. A farm map and a parcel map, both vector polygon spatial data, were used to stratify the paddy field population for classifying rice paddy cultivation. To distinguish paddy rice from other crops grown in the paddy fields, we used a decision tree method with threshold levels and a random forest model. The random forest model, trained mainly on the rice cultivation area and the rice and soybean cultivation area within the UAV image area, showed the best performance, with an overall accuracy of 89.9% and a kappa coefficient of 0.774. These results confirm the possibility of early estimation of the rice cultivation area in Gimje-si using UAV imagery.

Comparison and Evaluation of Classification Accuracy for Pinus koraiensis and Larix kaempferi based on LiDAR Platforms and Deep Learning Models (라이다 플랫폼과 딥러닝 모델에 따른 잣나무와 낙엽송의 분류정확도 비교 및 평가)

  • Yong-Kyu Lee;Sang-Jin Lee;Jung-Soo Lee
    • Journal of Korean Society of Forest Science / v.112 no.2 / pp.195-208 / 2023
  • This study aimed to use three-dimensional point cloud data (PCD) obtained from Terrestrial Laser Scanning (TLS) and Mobile Laser Scanning (MLS) to evaluate a deep learning-based species classification model for two tree species: Pinus koraiensis and Larix kaempferi. Sixteen models were constructed based on three conditions: LiDAR platform (TLS and MLS), down-sampling intensity (1024, 2048, 4096, 8192), and deep learning model (PointNet, PointNet++). According to the classification accuracy evaluation, the highest kappa coefficients were 93.7% for TLS and 96.9% for MLS, both obtained with the PointNet++ model, at down-sampling intensities of 8192 and 2048, respectively. Furthermore, PointNet++ was consistently more accurate than PointNet in all scenarios sharing the same platform and down-sampling intensity. Misclassification occurred among individuals of different species with structurally similar characteristics, among individual trees that exhibited eccentric growth due to their location on slopes or around trails, and among some individual trees in which the crown was vertically divided during tree segmentation.

The Automated Scoring of Kinematics Graph Answers through the Design and Application of a Convolutional Neural Network-Based Scoring Model (합성곱 신경망 기반 채점 모델 설계 및 적용을 통한 운동학 그래프 답안 자동 채점)

  • Jae-Sang Han;Hyun-Joo Kim
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.237-251 / 2023
  • This study explores the possibility of automated scoring for scientific graph answers by designing an automated scoring model using convolutional neural networks and applying it to students' kinematics graph answers. The researchers prepared 2,200 answers, which were divided into 2,000 training data and 200 validation data. Additionally, 202 student answers were divided into 100 training data and 102 test data. First, in the process of designing an automated scoring model and validating its performance, the automated scoring model was optimized for graph image classification using the answer dataset prepared by the researchers. Next, the automated scoring model was trained using various types of training datasets, and it was used to score the student test dataset. The performance of the automated scoring model improved as the training data increased in amount and diversity. Finally, compared to human scoring, the accuracy was 97.06%, the kappa coefficient was 0.957, and the weighted kappa coefficient was 0.968. On the other hand, for answer types that were not included in the training data, the scoring was almost identical among human scorers, whereas the automated scoring model performed inaccurately.
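
A weighted kappa, as reported above alongside the plain kappa, gives partial credit when two ordinal scores are close. The sketch below is a generic illustration, not the study's implementation: the abstract does not say whether linear or quadratic weights were used, so the `power` parameter (1 for linear, 2 for quadratic) is an assumption, and the function name and scores are hypothetical.

```python
def weighted_kappa(scores_a, scores_b, num_levels, power=1):
    """Weighted kappa for two scorers using integer levels 0..num_levels-1.

    Disagreement weight: (|i - j| / (num_levels - 1)) ** power, so
    power=1 gives linear weights and power=2 gives quadratic weights.
    Assumes num_levels >= 2 and that expected disagreement is nonzero.
    """
    n = len(scores_a)
    # Joint proportion table of (scorer A level, scorer B level).
    observed = [[0.0] * num_levels for _ in range(num_levels)]
    for a, b in zip(scores_a, scores_b):
        observed[a][b] += 1 / n
    marg_a = [sum(row) for row in observed]
    marg_b = [
        sum(observed[i][j] for i in range(num_levels))
        for j in range(num_levels)
    ]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_levels):
        for j in range(num_levels):
            w = (abs(i - j) / (num_levels - 1)) ** power
            weighted_observed += w * observed[i][j]
            weighted_expected += w * marg_a[i] * marg_b[j]
    return 1 - weighted_observed / weighted_expected


# Hypothetical scores (0, 1, or 2 points) from a human and a model.
human = [0, 1, 2, 2, 1, 0, 2, 1]
model = [0, 1, 2, 1, 1, 0, 2, 2]
linear = weighted_kappa(human, model, num_levels=3)
print(f"linear weighted kappa = {linear:.3f}")
```

Here both disagreements are off by a single level, so the weighted value comes out higher than the unweighted kappa for the same scores.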

A New Approach to Digital Change Detection: Euclidean Distance Analysis (수치변화탐지의 새로운 접근 - 기하거리분석법 -)

  • Jeong, Seong-Hak
    • 한국지형공간정보학회:학술대회논문집 / 1993.10a / pp.141-145 / 1993
  • A new digital change detection algorithm, Euclidean Distance Analysis, was developed in an attempt to utilize the multi-band information in a selected band combination, as an alternative to the conventional single-band analysis methods. To evaluate the relative performance of this new method, image differencing was also applied. Of the two algorithms investigated, Euclidean distance analysis performed better in change detection. The new technique holds promise for change detection, since it summarizes the multiple-band information on cover-type changes and reduces the data dimensionality. Further quantitative evaluation of this new method in different environments is suggested. The use of different accuracy indices was also examined in determining the optimal threshold level for each change image. As the standard measure of classification accuracy, the Kappa coefficient of agreement was used for evaluation.


Concept-based Automatic Scoring System for Korean Free-text or Constructed Answers (개념 기반 한국어 서답형 답안의 자동채점 시스템)

  • Park, Il-Nam;Noh, Eun-Hee;Sim, Jae-Ho;Kim, Myung-Hwa;Kang, Seung-Shik
    • Annual Conference on Human and Language Technology / 2012.10a / pp.69-72 / 2012
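  • This paper analyzes the item types of Korean constructed-response questions (word and phrase level) and presents an automatic scoring method specialized for such Korean answers, in which answer templates are designed and concepts are defined so that a computer can recognize how human raters score against a scoring rubric. The system was applied to six science items from the 2011 academic achievement assessment, each with 1,000 student answer sheets and no more than 500 distinct answer types. After the scoring rubric was written as an answer template, 250 student answers were used as training data to update the template, and 750 student answers were scored automatically. The resulting average kappa coefficient of 0.84 indicates almost perfect agreement with actual human scoring.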
