• Title/Summary/Keyword: 통계 오류 (statistical error)

A study on data processing of electric vehicle charging archives (전기자동차 충전기록 데이터 처리에 관한 연구)

  • Hwang, Yunweong;Jin, Hyojeong;Kim, Soyeon;Lee, Junghoon
    • Annual Conference of KIPS / 2022.11a / pp.337-338 / 2022
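  • This paper studies how to periodically collect open data, store it in a local database, and clean errors in the data, as part of analyzing the operation records of electric vehicle chargers provided by the Korea Environment Corporation (환경공단). Only the operation records of fast chargers, which directly affect power system load, are extracted. Transactions whose date fields contain errors or reversals are removed, and charging duration is then analyzed with a histogram as a first step. Most charging sessions completed within 20 minutes, but in 23% of cases the plug appears to have been left in the charger after charging finished.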
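
The cleaning steps in this abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the field names `charger_type`, `start`, and `end`, the timestamp format, and the 10-minute bin width are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format, not taken from the paper


def parse(ts):
    """Return a datetime, or None when the date field is missing or malformed."""
    try:
        return datetime.strptime(ts, FMT)
    except (TypeError, ValueError):
        return None


def clean_durations(records):
    """Keep fast-charger sessions with valid, non-reversed dates; return minutes."""
    durations = []
    for r in records:
        if r.get("charger_type") != "fast":  # hypothetical field name
            continue
        start, end = parse(r.get("start")), parse(r.get("end"))
        if start is None or end is None:  # error in a date field
            continue
        if end < start:  # reversal: session ends before it starts
            continue
        durations.append((end - start).total_seconds() / 60)
    return durations


def histogram(durations, bin_minutes=10):
    """Count sessions per bin; each key is the lower edge of the bin in minutes."""
    counts = Counter(int(d // bin_minutes) * bin_minutes for d in durations)
    return dict(sorted(counts.items()))


records = [
    {"charger_type": "fast", "start": "2022-01-01 10:00", "end": "2022-01-01 10:15"},
    {"charger_type": "fast", "start": "2022-01-01 10:00", "end": "2022-01-01 09:50"},
    {"charger_type": "fast", "start": "2022-13-01 10:00", "end": "2022-01-01 10:30"},
    {"charger_type": "slow", "start": "2022-01-01 11:00", "end": "2022-01-01 15:00"},
    {"charger_type": "fast", "start": "2022-01-01 12:00", "end": "2022-01-01 12:42"},
]
print(histogram(clean_durations(records)))
```

In this toy input, the reversed session, the session with an impossible month, and the slow-charger session are dropped before the histogram is built.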

Statistical Issues in the Articles Published in the Journal of Veterinary Clinics (한국임상수의학회지에 발표된 논문의 통계분석 검토)

  • Pak, Son-Il;Oh, Tae-Ho
    • Journal of Veterinary Clinics / v.27 no.2 / pp.170-174 / 2010
  • With the easy availability of statistical software and powerful computers, the application of statistical methods in domestic veterinary journals is on the increase. Alongside this benefit, statistical errors are not uncommon even in renowned scientific and medical journals. These errors may lead to misinterpretation of the data and thereby to faulty conclusions. A systematic review of articles published in 8 issues of the Journal of Veterinary Clinics during 2006-2007 was performed to assess statistical methodology and reporting. Of the 129 original articles screened, 94 (72.9%) included some inferential statistical analysis, most often comparison of 3 or more groups (53, 56.4%), followed by comparison of 2 independent groups (40, 42.6%) and the paired t-test (9, 9.6%). Of the 94 articles in which statistical analysis was done, 62 (66.0%) had at least 1 statistical error. Errors included failing to apply, or incorrectly applying, the independent Student's t-test to paired data or vice versa, inappropriate use of the t-test for more than 3 groups, and failure to consider continuity correction in the chi-square test for small expected frequencies. The common errors in ANOVA were failure to validate the assumptions of the test, inappropriate post-hoc multiple comparisons, and an incorrect assumption of independence of data in repeated-measures designs. Reporting errors included failure to state the statistical methods and failure to state the specific test when more than 1 test was done. The authors suggest that an editorial effort, such as publishing statistical guidelines for authors, is necessary to improve the use of appropriate statistical procedures.
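
To illustrate why confusing the paired and independent t-tests matters, the sketch below computes both statistics on the same made-up before/after measurements. The data and function names are hypothetical and are not drawn from the reviewed articles. Only the test statistics are computed; p-values are omitted to keep the example within the standard library.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic for paired data, based on within-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def independent_t(x, y):
    """Student's t statistic for two independent groups (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / sqrt(pooled * (1 / nx + 1 / ny))


# Made-up measurements on the same five subjects before and after treatment.
before = [10, 12, 14, 16, 18]
after = [11, 13, 15, 17, 20]

t_paired = paired_t(before, after)         # uses the pairing
t_unpaired = independent_t(before, after)  # wrongly ignores the pairing
print(t_paired, t_unpaired)
```

Here the paired statistic is about 6.0, while the independent-groups statistic is about 0.57. The between-subject spread swamps a consistent within-subject change, so the wrong test can hide a real effect.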

Goodness of Fit Tests for the Exponential Distribution based on Multiply Progressive Censored Data (다중 점진적 중도절단에서 지수분포의 적합도 검정)

  • Yun, Hyejeong;Lee, Kyeongjun
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2813-2827 / 2018
  • Progressive censoring schemes have become quite popular in reliability studies. Under progressive censoring, however, some units may fail between two points of observation, so that their exact failure times are unobserved. For example, losses may arise in life-testing experiments when the failure times of some units are not observed because of mechanical or experimental difficulties. The multiply progressive censoring scheme was introduced for this reason. We derive a maximum likelihood estimator of the parameter of the exponential distribution and introduce goodness-of-fit test statistics based on order statistics and the Lorenz curve. We carry out Monte Carlo simulations to compare the proposed test statistics, and a real data set is also analysed. For Weibull and chi-squared distributions, the test statistics based on the Lorenz curve are more powerful than those based on order statistics.
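
As background only, the sketch below shows the well-known maximum likelihood estimator of the exponential mean under ordinary progressive Type-II censoring, where `removals[i]` surviving units are withdrawn at the i-th observed failure. It is not the multiply progressive estimator derived in the paper, whose form the abstract does not give. The function name and sample values are hypothetical.

```python
def exponential_mle_progressive(failure_times, removals):
    """MLE of the exponential mean under ordinary progressive Type-II censoring.

    Each observed failure time x_i is weighted by (1 + R_i): the failed unit
    plus the R_i units withdrawn at that time. The estimate is the total time
    on test divided by the number of observed failures.
    """
    if not failure_times or len(failure_times) != len(removals):
        raise ValueError("need one removal count per observed failure time")
    total_time_on_test = sum((1 + r) * x for x, r in zip(failure_times, removals))
    return total_time_on_test / len(failure_times)


# Hypothetical test: 6 units on test, 3 observed failures, 3 withdrawals.
theta_hat = exponential_mle_progressive([1.0, 2.0, 3.0], [1, 0, 2])
print(theta_hat)
```

With no withdrawals at all, the estimator reduces to the ordinary sample mean of the failure times.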

A Spelling Correction System Based on Statistical Data of Spelling Errors (철자오류의 통계자료에 근거한 철자오류 교정시스템)

  • Lim, Han-Kyu;Kim, Ung-Mo
    • The Transactions of the Korea Information Processing Society / v.2 no.6 / pp.839-846 / 1995
  • In this paper, spelling errors made by people using real word processors are collected and analyzed. Based on these data, we build a prototype that provides a spelling aid function by presenting candidate words. The number of candidate characters is minimized using the frequencies of Jaso and characters, so that the number of candidate words is also minimized. The average number of candidate words presented ranges from 3.2 to 8, and the correct word appears among the candidates in 62.1% to 84.1% of cases.

A Study on the Characteristics of Errors Type for Wellness of Alzheimer's Dementia Patients in the Naming Task (알츠하이머성 치매환자의 웰니스를 위한 명명하기 과제에서의 오류유형 특성 연구)

  • Kang, Min-Gu
    • Journal of Korea Entertainment Industry Association / v.14 no.8 / pp.213-219 / 2020
  • The purpose of this study was to investigate the characteristics of error types in a naming task for 8 questionable dementia groups, 9 definite dementia groups, and 10 normal groups. Naming errors were classified as visual perception errors, semantic association errors, semantic non-correlation errors, phoneme errors, Don't Know, and No Response. Descriptive statistics, analysis of variance, and multivariate analysis of variance were conducted using SPSS 21.0. The error rate differed significantly between groups according to error type. The errors that differed significantly between the normal group and the other two groups were visual perception errors and semantic non-correlation errors. No Response errors differed significantly between the normal group and the definite dementia group, but not between the normal group and the questionable dementia group. These results show that Alzheimer's patients have a deficit in confrontation naming ability. They also suggest that it is appropriate to provide other cues when the deficits caused by degeneration of a specific step in information processing become severe.

History of Probability and Statistics (확률과 통계의 역사)

  • Lee Kyung Hwa
    • Journal of Elementary Mathematics Education in Korea / v.1 no.1 / pp.53-65 / 1997
  • We make many mistakes when estimating the probability of an event. For example, we often omit some possible outcomes, assign too large or too small a probability to a particular case, fail to relate the current case to ones considered before, or apply a conclusion to other cases after discussing it only insufficiently. The history of probability and statistics shows that many scientists and mathematicians made essentially the same mistakes as we do. In this paper, we treat the theorization of probability and statistics as a process of correcting the mistakes made in estimating the possibility of an event. From that point of view, we look at the historical background of probability and statistics.

A Trial of Data Editing Using Fellegi-Holt Techniques and Its Analysis (Fellegi-Holt 기법을 이용한 에디팅의 시도 및 분석)

  • Lee, Eui-Kyoo;Shim, Kyu-Ho
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.697-707 / 2009
  • In actual statistical surveys, inconsistencies within a record often occur because of incorrect responses. In such cases users may be confused, and statistical agencies may face doubts about the reliability of their statistical data. Records that are implausible without any special reason need to be detected and corrected. However, it is not simple to determine which item should be corrected in every failed record. In this paper we briefly introduce the Fellegi-Holt method, apply it to a business survey, and then discuss the problems found in this trial editing.
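
The Fellegi-Holt approach is generally described as changing the fewest possible fields so that a failed record satisfies every edit rule. The sketch below illustrates that idea with a brute-force search over a toy record. The fields, value domains, and edit rules are hypothetical and are not taken from the paper's business survey. Practical implementations use implied edits or integer programming instead of exhaustive search.

```python
from itertools import combinations, product


def violated(record, edits):
    """Return the names of the edit rules that the record fails."""
    return [name for name, fails in edits.items() if fails(record)]


def localize_errors(record, domains, edits):
    """Return every smallest set of fields that can be changed to pass all edits."""
    fields = list(domains)
    for size in range(len(fields) + 1):
        solutions = []
        for subset in combinations(fields, size):
            # Try every combination of allowed values for the chosen fields.
            for values in product(*(domains[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if not violated(candidate, edits):
                    solutions.append(subset)
                    break
        if solutions:
            return solutions
    return []


# Hypothetical categorical fields and edit rules (True means the record fails).
domains = {
    "employees": ["0", "1-9", "10+"],
    "payroll": ["zero", "positive"],
    "status": ["active", "closed"],
}
edits = {
    "staff_without_payroll": lambda r: r["employees"] != "0" and r["payroll"] == "zero",
    "closed_with_staff": lambda r: r["status"] == "closed" and r["employees"] != "0",
}

record = {"employees": "10+", "payroll": "zero", "status": "closed"}
print(violated(record, edits))
print(localize_errors(record, domains, edits))
```

In this toy record both rules fail, and `employees` is the only single field whose change can satisfy both, so it is flagged as the item to correct.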

Improving The Performance of Triple Generation Based on Distant Supervision By Using Semantic Similarity (의미 유사도를 활용한 Distant Supervision 기반의 트리플 생성 성능 향상)

  • Yoon, Hee-Geun;Choi, Su Jeong;Park, Seong-Bae;Park, Se-Young
    • Annual Conference on Human and Language Technology / 2015.10a / pp.23-28 / 2015
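  • This paper proposes a distant-supervision-based confidence measure for improving the accuracy of a Korean triple generation system. Many existing pattern-based triple generation systems are prone to producing numerous erroneous patterns because of the basic assumption of distant supervision. To remove erroneous patterns, previous studies measured confidence indirectly from statistics such as occurrence frequency and co-occurrence counts. This paper proposes a more accurate confidence measure than the statistics-based methods by measuring the semantic similarity between Korean patterns and English properties. Word meanings are learned with word embedding, an unsupervised learning method, and the similarity between them is then measured. Canonical correlation analysis is used to resolve the vocabulary mismatch between Korean patterns and English properties. Experimental results show that the proposed pattern confidence measure generates a triple set whose precision is 9% higher than that of the statistics-based method. This confirms that confidence measurement reflecting semantic similarity is better suited to generating high-quality triples than the existing statistics-based measurement.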

Partial AUC and optimal thresholds (부분 AUC와 최적분류점들)

  • Hong, Chong Sun;Cho, Hyun Su
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.187-198 / 2019
  • Extensive literature exists on how to estimate optimal thresholds based on various accuracy measures using receiver operating characteristic (ROC) and cumulative accuracy profile (CAP) curves. This paper proposes an alternative measure to represent a specific partial area under the ROC and CAP curves. The relationship between the ROC and CAP functions is examined using differential equations of the newly defined partial areas under the curves. In addition, the relationship with the optimal thresholds under various accuracy measures for the ROC and CAP functions is derived. Before finding the optimal thresholds, we assume that the two distribution functions composing the mixture distribution are normal distributions with various parameters. The corresponding type 1 and type 2 errors are also explored and discussed under various conditions for the accuracy measures.
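
As a rough illustration of an optimal threshold and its type 1 and type 2 errors under two normal score distributions, the sketch below uses Youden's J as the accuracy measure. Youden's J is one common choice; the abstract does not say which measures the paper uses. The distribution parameters, the convention that higher scores indicate the positive class, and the grid search are assumptions made for the example.

```python
from statistics import NormalDist


def error_rates(threshold, negative, positive):
    """Return (type 1, type 2) error rates when scores above threshold are positive."""
    type1 = 1.0 - negative.cdf(threshold)  # negatives classified as positive
    type2 = positive.cdf(threshold)        # positives classified as negative
    return type1, type2


def youden_threshold(negative, positive, lo=-3.0, hi=5.0, steps=800):
    """Grid-search the threshold maximizing J = 1 - type1 - type2."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]

    def youden_j(t):
        type1, type2 = error_rates(t, negative, positive)
        return 1.0 - type1 - type2

    return max(grid, key=youden_j)


# Assumed score distributions: equal variances, means two units apart.
negative = NormalDist(mu=0.0, sigma=1.0)
positive = NormalDist(mu=2.0, sigma=1.0)

best = youden_threshold(negative, positive)
type1, type2 = error_rates(best, negative, positive)
print(best, type1, type2)
```

With equal variances the selected threshold falls midway between the two means, where the type 1 and type 2 error rates coincide. Unequal variances would shift the threshold and unbalance the two errors.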