• Title/Summary/Keyword: 오분류 (misclassification)

Classification Analysis for Unbalanced Data (불균형 자료에 대한 분류분석)

  • Kim, Dongah;Kang, Suyeon;Song, Jongwoo
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.3
    • /
    • pp.495-509
    • /
    • 2015
  • We study the classification problem in which the proportions of two groups differ significantly, known as the unbalanced classification problem. It is usually more difficult to classify classes accurately in unbalanced data than in balanced data. If we apply classification methods to unbalanced data, most observations are likely to be classified into the larger group because doing so minimizes the misclassification loss. However, misclassifying the smaller group as the larger group can cause a bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up and down sampling). We also check the total loss of different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC, and AUC (area under the curve) for the performance comparison.
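
The point about accuracy being misleading on unbalanced data can be illustrated with a minimal sketch. It is not the authors' code: the function names `g_mean` and `down_sample` and the toy labels are assumptions made for illustration, and G-mean is taken here in its usual sense as the geometric mean of sensitivity and specificity.

```python
import math
import random

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def down_sample(rows, labels, minority=1, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    kept = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    idx = sorted(minority_idx + kept)
    return [rows[i] for i in idx], [labels[i] for i in idx]

# Toy labels (made up): 2 minority cases among 10 observations.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10

# Predicting the larger group everywhere is 80% accurate, yet G-mean is 0.0
# because no minority case is detected.
print(g_mean(y_true, always_majority))
```

Down sampling with `down_sample(rows, y_true)` would leave two observations per class, so a classifier trained on the result can no longer lower its loss by ignoring the smaller group.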

Alternative Optimal Threshold Criteria: MFR (대안적인 분류기준: 오분류율곱)

  • Hong, Chong Sun;Kim, Hyomin Alex;Kim, Dong Kyu
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.5
    • /
    • pp.773-786
    • /
    • 2014
  • We propose the multiplication of false rates (MFR), a classification accuracy criterion that corresponds to the area of a rectangle derived from the ROC curve. The optimal threshold obtained using MFR is compared with those of other criteria in terms of classification performance. Optimal thresholds for various distribution functions are also found; some properties and advantages of MFR are then discussed by comparing the FNR and FPR corresponding to the optimal thresholds. Based on a general cost function, cost ratios of the optimal thresholds are computed using various classification criteria. The cost ratios for cost curves are examined to explore the advantages of MFR. Furthermore, the definition of MFR is extended to multi-dimensional ROC analysis, and the relations among classification criteria are also discussed.

Abnormality Detection of ECG Signal by Rule-based Rhythm Classification (규칙기반 리듬 분류에 의한 심전도 신호의 비정상 검출)

  • Ryu, Chun-Ha;Kim, Sung-Oan;Kim, Se-Yun;Kim, Tae-Hun;Choi, Byung-Jae;Park, Kil-Houm
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.22 no.4
    • /
    • pp.405-413
    • /
    • 2012
  • A low misclassification rate together with high classification accuracy is essential for reliable diagnosis of ECG signals; in particular, diagnosing an abnormal state as normal can cause a deadly problem for the person undergoing the ECG test. In this paper, we propose a method for detecting and classifying abnormal rhythms by rule-based rhythm classification that reflects clinical criteria for disease. Rule-based classification classifies rhythm types by applying a rule base to the features of each rhythm section, and the rule base derives decision results consistent with professional clinical and internal-medicine references. Experimental results on the MIT-BIH arrhythmia database confirm that the proposed method can classify normal sinus, paced, and various abnormal rhythms, and in particular that it detects abnormal rhythms without misclassification.

Exploring Middle School Students' Types of Misconceptions on Astronomy Terminologies (중학교 천문학 용어에 대한 학생의 오개념 유형 탐색)

  • Choi, Youngjin;Shin, Donghee
    • Journal of Science Education
    • /
    • v.44 no.3
    • /
    • pp.289-299
    • /
    • 2020
  • In this study, students' definitions of 113 astronomy terms from the 2009 revised middle school geoscience textbooks were examined, together with the perceived difficulty of the terms and the students' certainty of understanding. Through follow-up interviews, the types of students' misconceptions about astronomy terms, along with representative terms and examples of misconceptions, were analyzed. The definitions presented by the students were broadly classified as correct, low-level, or incorrect understanding. Low-level understanding was subdivided into high-level definition descriptions and undifferentiated concepts, and incorrect understanding was subdivided into interference by scientific misconception and lack of prior knowledge. Given that misconceptions caused by terminology can be distinguished from prior misconceptions, they can be effectively prevented by changing the term itself. In addition, students were aware of the advantages and disadvantages of metaphorical terms, and this awareness of their own level of understanding is expected to be a good starting point, since recognizing one's own misconceptions is the first step in correcting them. Terminology in science education is always an important subject of discussion; the field strives to select the right term for the times, and scientific terms may change. The results of this study are expected to provide a basis for discussions on the modification of terms.

Modulation classification for BPSK and QPSK signals over Rayleigh fading channel (Rayleigh 페이딩 채널에서 BPSK와 QPSK 신호의 변조 분류)

  • 윤동원;한영열
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.21 no.4
    • /
    • pp.1019-1026
    • /
    • 1996
  • A modulation type classifier based on statistical moments has been successfully employed to classify PSK signals. Previously developed classifiers were analyzed for the AWGN channel only. In this paper, a moments-based modulation type classifier for BPSK and QPSK signals over a Rayleigh fading channel is proposed and analyzed. The moments of the received signal are evaluated using its exact distribution, and a moments-based classifier is proposed. The performance of the proposed classifier is investigated in terms of the misclassification probability for BPSK and QPSK under a Rayleigh fading environment.

Classification of Advertising Spam Reviews (제품 리뷰문에서의 광고성 문구 분류 연구)

  • Park, Insuk;Kang, Hanhoon;Yoo, Seong Joon
    • Annual Conference on Human and Language Technology
    • /
    • 2010.10a
    • /
    • pp.186-190
    • /
    • 2010
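  • This paper proposes a method for identifying advertising reviews among the user reviews of online shopping malls. Here, an advertising review is one written mainly by a vendor and containing advertising content. Although a few studies abroad have addressed the classification of opinion spam documents, no study has yet classified advertising reviews in Korean product reviews. In this paper, we classify advertising reviews using a Naive Bayes classifier. The feature words used for the probability calculation were extracted using POS-Tagging+Bigram, POS-Tagging+Unigram, and Bigram. The experimental results show that the POS-Tagging+Bigram method was the most accurate, with an F-measure of 80.35% for advertising reviews.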
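
The combination of bigram features with a Naive Bayes classifier can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: it omits POS tagging, uses whitespace tokenization and add-one smoothing (both assumptions), and the four English training sentences are made-up toy data.

```python
import math
from collections import Counter

def bigrams(text):
    """Split on whitespace and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))

class NaiveBayes:
    """Multinomial Naive Bayes over word bigrams with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.feat_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.feat_counts[label].update(bigrams(doc))
        self.vocab = set()
        for counts in self.feat_counts.values():
            self.vocab.update(counts)
        return self

    def log_posterior(self, doc, c):
        """Unnormalized log P(c | doc): log prior plus log likelihoods."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / total_docs)
        denom = sum(self.feat_counts[c].values()) + len(self.vocab)
        for feat in bigrams(doc):
            if feat in self.vocab:  # bigrams unseen in training are skipped
                score += math.log((self.feat_counts[c][feat] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))

# Toy training set (made up for illustration).
train_docs = [
    "visit our store for free shipping today",
    "free shipping today on every order",
    "the shoes fit well and look nice",
    "the jacket fit well but runs small",
]
train_labels = ["ad", "ad", "review", "review"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("free shipping today only"))  # ad
print(model.predict("the coat fit well"))         # review
```

Swapping `bigrams` for a function that pairs each token with a POS tag before forming the pairs would give a feature set closer to the POS-Tagging+Bigram variant the abstract mentions; that step needs a Korean morphological analyzer and is left out here.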

A Method to resolve the Limit of Traffic Classification caused by Abnormal TCP Session (TCP 세션의 이상동작으로 인한 트래픽 분석 방법론의 한계와 해결 방안)

  • An, Hyeon-Min;Choe, Ji-Hyeok;Ham, Jae-Hyeon;Kim, Myeong-Seop
    • KNOM Review
    • /
    • v.15 no.1
    • /
    • pp.31-39
    • /
    • 2012
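  • In today's network environment, traffic is becoming increasingly complex and diverse as a wide range of applications emerge. In this situation, classifying traffic by application is becoming ever more important for accurately understanding the state of the network. Recently, methods that classify traffic by application using the statistical information of traffic flows have been actively studied. However, most of these studies do not consider abnormal behavior of TCP sessions, which can lead to misclassified or unclassified traffic. This paper therefore points out the problems caused by abnormal TCP session behavior and proposes a methodology to address them. We demonstrate its validity by applying the proposed methodology to a statistical application traffic classification method.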

An Opinion Document Clustering Technique for Product Characterization (제품 특징화를 위한 오피니언 문서의 클러스터링 기법)

  • Chang, Jae-Young
    • The Journal of Society for e-Business Studies
    • /
    • v.19 no.2
    • /
    • pp.95-108
    • /
    • 2014
  • Opinion mining is an application domain of text mining that extracts opinions from documents, and much research is currently underway. Most related research has focused on sentiment classification, which classifies documents into positive and negative opinions. However, there has been little interest in extracting the features that characterize an individual product. In this paper, we propose a technique that classifies opinion documents according to product features and selects the features characterizing each product. The proposed method uses document clustering together with a new algorithm that we develop for evaluating the similarity between documents. In addition, we demonstrate the usefulness of the proposed method through experiments.

A Log-linear Model for Randomized Response (확률화응답에 대한 대수선형모형)

  • 최경호
    • Communications for Statistical Applications and Methods
    • /
    • v.4 no.3
    • /
    • pp.725-734
    • /
    • 1997
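  • In many social science surveys, categorical data obtained in the form of contingency tables often contain errors due to misclassification. Randomized response for estimating a qualitative attribute is sometimes regarded as a special case of this misclassification problem. Categorical data obtained through randomized response can therefore be regarded as a mixed-up contingency table; in this paper, we set up a log-linear model for such a table and use the limit of the maximum likelihood estimator obtained by the iterative scaling procedure (ISP) of Chen and Fienberg (1976). As a result, for the Warner (1965)-type symmetric technique, the estimator is shown to coincide with the maximum likelihood estimator proposed by Singh (1976), which confirms that the estimator presented by Warner is not appropriate as a maximum likelihood estimator; for the unrelated question technique, we find that the estimator proposed by Greenberg et al. (1969) is likewise not appropriate as a maximum likelihood estimator from the viewpoint of estimation.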
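
For context, the Warner (1965) design can be sketched numerically. Each respondent answers the sensitive statement with probability p and its complement with probability 1 - p, so the probability of a "yes" is p·π + (1 - p)(1 - π), and solving for π gives the moment estimator below. The function names and the sample counts are assumptions for illustration; clipping to [0, 1] is shown only as one simple way to keep the estimate admissible, and whether it matches the estimator discussed in the paper is an assumption.

```python
def warner_estimate(n_yes, n, p):
    """Warner (1965) moment estimator of the sensitive proportion pi.

    n_yes: number of "yes" answers, n: sample size,
    p: probability that the device selects the sensitive statement.
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5, otherwise pi is not identifiable")
    yes_rate = n_yes / n
    return (yes_rate - (1 - p)) / (2 * p - 1)

def clipped_estimate(n_yes, n, p):
    """Restrict the estimate to the admissible range [0, 1] (illustrative)."""
    return min(1.0, max(0.0, warner_estimate(n_yes, n, p)))

# Hypothetical counts: 40 "yes" answers out of 100 with p = 0.7.
print(warner_estimate(40, 100, 0.7))   # approximately 0.25

# With only 20 "yes" answers the raw estimate is negative,
# which is not a valid proportion; the clipped version returns 0.0.
print(warner_estimate(20, 100, 0.7))
print(clipped_estimate(20, 100, 0.7))
```

The second call shows why the unrestricted estimator can be problematic: it can fall outside [0, 1], which a likelihood-based estimate of a proportion cannot.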

A Study on the Relationship between Class Similarity and the Performance of Hierarchical Classification Method in a Text Document Classification Problem (텍스트 문서 분류에서 범주간 유사도와 계층적 분류 방법의 성과 관계 연구)

  • Jang, Soojung;Min, Daiki
    • The Journal of Society for e-Business Studies
    • /
    • v.25 no.3
    • /
    • pp.77-93
    • /
    • 2020
  • The literature has reported that hierarchical classification methods generally outperform flat classification methods for multi-class document classification problems. Unlike prior studies that constructed a class hierarchy, this paper evaluates the performance of hierarchical and flat classification methods in a setting where the class hierarchy is predefined. We conducted numerical evaluations on two data sets: research papers on climate change adaptation technologies in the water sector and the 20NewsGroup open data set. The evaluation results show that the hierarchical classification method outperforms the flat classification methods under a certain condition, which differs from the literature. The performance advantage of the hierarchical method over the flat method depends on the class similarities at each level of the class structure. More importantly, the hierarchical classification method works better when the upper-level similarity is less than the lower-level similarity.