• Title/Summary/Keyword: 통계학과 (Department of Statistics)

Odds curve and optimal threshold (오즈 곡선과 최적분류점)

  • Hong, Chong Sun;Oh, Tae Gyu;Oh, Se Hyeon
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.807-822 / 2021
  • Various accuracy measures that can be explained on the odds curve are discussed, and an alternative accuracy measure, the maximum square, is proposed based on the characteristics of the odds curve. Thresholds corresponding to these accuracy measures are obtained by considering various probability distribution functions and an illustrative example. Their characteristics are discussed while comparing many kinds of statistics measuring thresholds. Therefore, we can conclude that optimal thresholds could be explored from the odds curve, similar to the ROC curve, and that the maximum square measure can be used as a good accuracy measure that can improve the performance of the binary classification model.

Robust group independent component analysis (로버스트 그룹 독립성분분석)

  • Kim, Hyunsung;Li, XiongZhu;Lim, Yaeji
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.127-139 / 2021
  • Independent Component Analysis is a popular statistical method for separating independent signals from mixed data, and Group Independent Component Analysis is its multi-subject extension. It has been applied to Functional Magnetic Resonance Imaging data with promising results. However, classical Group Independent Component Analysis performs poorly when the data contain outliers, which occur frequently in Magnetic Resonance Imaging scanning. In this study, we propose a robust version of Group Independent Component Analysis based on ROBPCA. Through numerical studies, we compare the proposed method with the conventional method and verify its robustness.

A review on robust principal component analysis (강건 주성분분석에 대한 요약)

  • Lee, Eunju;Park, Mingyu;Kim, Choongrak
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.327-333 / 2022
  • Principal component analysis (PCA) is the most widely used technique for dimension reduction; however, it is very sensitive to outliers. A robust version of PCA, called robust PCA, was proposed in two seminal papers by Candès et al. (2011) and Chandrasekaran et al. (2011). Robust PCA is an essential tool in artificial intelligence applications such as background detection, face recognition, ranking, and collaborative filtering. It has also received a lot of attention in statistics as well as in computer science. In this paper, we introduce recent algorithms for robust PCA and give some illustrative examples.
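
For orientation, the robust PCA problem studied by Candès et al. (2011) is usually stated as principal component pursuit: a data matrix is split into a low-rank part and a sparse part. The block below is a sketch of that standard formulation, not necessarily the notation of the reviewed paper; the symbols M, L, S, n_1, and n_2 are assumptions introduced here.

```latex
% Principal component pursuit (standard formulation, symbols assumed here):
% M is the observed n_1 x n_2 data matrix, L the low-rank component,
% S the sparse component that absorbs the outliers.
\begin{aligned}
  \min_{L,\,S}\quad & \|L\|_{*} + \lambda \,\|S\|_{1} \\
  \text{subject to}\quad & L + S = M,
\end{aligned}
\qquad
\lambda = \frac{1}{\sqrt{\max(n_1, n_2)}}
% \|L\|_* is the nuclear norm (sum of singular values of L);
% \|S\|_1 is the entrywise l1 norm (sum of absolute values of the entries of S).
```

The nuclear norm encourages a low-rank L, while the entrywise l1 norm encourages a sparse S.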

Optimal threshold using the correlation coefficient for the confusion matrix (혼동행렬의 상관계수를 이용한 최적분류점)

  • Hong, Chong Sun;Oh, Se Hyeon;Choi, Ye Won
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.77-91 / 2022
  • Optimal threshold estimation is considered in order to discriminate a mixture distribution in the fields of biostatistics and credit evaluation. Various well-known accuracy measures exist for examining discriminant power. Recently, the Matthews correlation coefficient and the F1 statistic have been studied for estimating optimal thresholds. In this study, we explore whether these accuracy measures are appropriate for finding the optimal threshold to discriminate the mixture distribution. It is found that some accuracy measures that depend on the sample size are not appropriate when the two sample sizes differ greatly. Moreover, an alternative method for finding the optimal threshold is proposed using the correlation coefficient that defines the ratio of the confusion matrix, and the usefulness and utility of this method are also discussed.
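
As an illustration of the two measures named in this abstract, the following minimal sketch picks a threshold by maximizing either the Matthews correlation coefficient or the F1 statistic over candidate cut-offs. It is not the authors' proposed method; the function names, the rule that scores at or above the threshold are classified positive, and the toy data are all assumptions made for the example.

```python
import math

def confusion_counts(scores, labels, threshold):
    """Return (tp, fp, tn, fn), classifying score >= threshold as positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def f1(tp, fp, tn, fn):
    """F1 statistic; tn is accepted but unused, since F1 ignores true negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def best_threshold(scores, labels, measure):
    """Try every observed score as a cut-off and keep the one maximizing `measure`."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: measure(*confusion_counts(scores, labels, t)))

# Hypothetical toy data: two perfectly separated groups.
scores = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
print(best_threshold(scores, labels, mcc))
print(best_threshold(scores, labels, f1))
```

Because F1 does not use the true-negative count, its value shifts with the ratio of the two group sizes, which is one way an accuracy measure can depend on the sample sizes.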

Simulation for Power Efficiency Optimization of Air Compressor Using Machine Learning Ensemble (머신러닝 앙상블을 활용한 공압기의 전력 효율 최적화 시뮬레이션)

  • Juhyeon Kim;Moonsoo Jang;Jieun Choi;Yoseob Heo;Hyunsang Chung;Soyoung Park
    • Journal of the Korean Society of Industry Convergence / v.26 no.6_3 / pp.1205-1213 / 2023
  • This study delves into methods for enhancing the power efficiency of air compressor systems, with the primary objective of significantly impacting industrial energy consumption and environmental preservation. The paper scrutinizes Shinhan Airro Co., Ltd.'s power efficiency optimization technology and employs machine learning ensemble models to simulate power efficiency optimization. The results indicate that Shinhan Airro's optimization system led to a notable 23.5% increase in power efficiency. Nonetheless, the study's simulations, utilizing machine learning ensemble techniques, reveal the potential for a further 51.3% increase in power efficiency. By continually exploring and advancing these methodologies, this research introduces a practical approach for identifying optimization points through data-driven simulations using machine learning ensembles.

Korean speech recognition using deep learning (딥러닝 모형을 사용한 한국어 음성인식)

  • Lee, Suji;Han, Seokjin;Park, Sewon;Lee, Kyeongwon;Lee, Jaeyong
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.213-227 / 2019
  • In this paper, we propose an end-to-end deep learning model for Korean speech recognition that incorporates a Bayesian neural network. In the past, Korean speech recognition was a complicated task because of the large number of parameters in its many intermediate steps and the need for expert knowledge of Korean. Fortunately, Korean speech recognition has become manageable thanks to recent breakthroughs in "end-to-end" models. An end-to-end model decodes mel-frequency cepstral coefficients directly into text without any intermediate processes. In particular, models based on the Connectionist Temporal Classification loss and attention-based models are kinds of end-to-end models. In addition, we combine a Bayesian neural network with the end-to-end model and obtain Monte Carlo estimates. Finally, we carry out experiments on the "WorimalSam" online dictionary dataset. We obtain a Word Error Rate of 4.58%, an improvement over the Google and Naver APIs.
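
The Word Error Rate reported in this abstract is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that conventional definition only; it is not the authors' evaluation code, and the function name and the whitespace tokenization are assumptions (tokenization choices for Korean can change the figure).

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Assumes whitespace tokenization; substitutions, insertions, and
    deletions each cost 1.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            ))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)

# Hypothetical example: one substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, the ratio can exceed 1 when the hypothesis is much longer than the reference.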

A comparison of synthetic data approaches using utility and disclosure risk measures (유용성과 노출 위험성 지표를 이용한 재현자료 기법 비교 연구)

  • Seongbin An;Trang Doan;Juhee Lee;Jiwoo Kim;Yong Jae Kim;Yunji Kim;Changwon Yoon;Sungkyu Jung;Dongha Kim;Sunghoon Kwon;Hang J Kim;Jeongyoun Ahn;Cheolwoo Park
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.141-166 / 2023
  • This paper investigates synthetic data generation methods and their evaluation measures. There have been increasing demands for releasing various types of data to the public for different purposes. At the same time, there are also unavoidable concerns about leaking critical or sensitive information. Many synthetic data generation methods have been proposed over the years in order to address these concerns and implemented in some countries, including Korea. The current study aims to introduce and compare three representative synthetic data generation approaches: Sequential regression, nonparametric Bayesian multiple imputations, and deep generative models. Several evaluation metrics that measure the utility and disclosure risk of synthetic data are also reviewed. We provide empirical comparisons of the three synthetic data generation approaches with respect to various evaluation measures. The findings of this work will help practitioners to have a better understanding of the advantages and disadvantages of those synthetic data methods.

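Trends and prospects of spatial information analysis techniques for the environmental field, with a focus on geostatistics (환경분야를 위한 공간정보 분석 기술의 동향과 전망 - 지구통계학을 중심으로)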

  • Park, No-Uk
    • Proceedings of the Korean Association of Geographic Information Studies Conference / 2010.06a / pp.187-187 / 2010
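  • The general process of handling spatial data may vary with the researcher's definition, but it is broadly similar to the usual scientific and engineering analysis procedure of data collection, data construction, analysis, and derivation of results. From an industry perspective, from the start of the national GIS project in the early 1990s until now, the emphasis has been on building authoritative data, and therefore on general data construction and display: digitizing existing analog data, processing data, building databases, and visualizing data. In addition, as multi-source data such as remote sensing data of various spatial resolutions are used more frequently, updating spatial data has also become an important task. However, if the series of steps for handling spatial data is regarded as ultimately aiming to provide decision-support material in a specific field, then developing and applying appropriate analysis techniques to produce the intermediate or final outputs of "from data to information to knowledge" is also important. Regardless of whether spatial analysis is treated as a separate discipline, over the past 20 years it has been regarded as an important technical area for understanding spatial data and producing value-added information, not only in GIS and remote sensing but also in many application fields that fundamentally deal with spatial data. Among the many application areas of spatial analysis, research on applications to the environmental field has become established as an important methodology not only in environmental science as a separate field but also in subfields of existing disciplines such as geography, ecology, earth science, sociology, economics, and urban planning. This technical seminar introduces trends in spatial information analysis techniques that can be used directly or indirectly in the environmental field, with a focus on geostatistics. Geostatistics, which in Korea has been represented by kriging, is also referred to by the broader term spatial statistics depending on the discipline in which it is applied, but in a more academic and technical sense it can be regarded as a specialized branch of spatial analysis. Its basic concepts were introduced in the 1950s to estimate the locations of hidden ore deposits from the locations of known deposits, and its theory was established mathematically in the 1960s; since then, geostatistics has advanced considerably and is now applied in a variety of fields. Unlike abroad, however, in Korea kriging is regarded merely as an advanced interpolation technique and is used only in a limited way, to produce simple thematic maps. This technical seminar first introduces the basic concepts of geostatistics that apply generally rather than to a specific discipline, and then introduces the major application areas related to environmental thematic mapping in domestic and international academia. It then presents areas where geostatistics can be applied and that may become issues from a multidisciplinary perspective.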

A Study on the Future Development of Statistics Departments : Installing Teacher-training Course (통계학과 발전방향에 대한 고찰 : 교직과정을 중심으로)

  • Chung Sung Suck;Sohn Joong-Kweon;Lee Sang Bock
    • The Korean Journal of Applied Statistics / v.18 no.1 / pp.211-227 / 2005
  • Statistics departments are currently in crisis, as the number of departments is decreasing and it is difficult to attract high-quality high school graduates. In this paper, we study various ways of developing statistics departments. In particular, since the preference for high school teaching jobs has increased drastically after the foreign exchange crisis in 1997, installing a teacher-training course is regarded as one of several crucial ways to attract good high school graduates and, at the same time, to compete with other majors.