• Title/Summary/Keyword: Overfit

A Study on Development of Economic Instability Index

  • Do, Jong-Doo;Song, Gyu-Moon;Kim, Tae-Yoon
    • Journal of the Korean Data and Information Science Society / v.15 no.2 / pp.355-365 / 2004
  • Kim et al. (2003) developed an Economic Instability Index (EII) using the mean squared error (MSE) of a neural network (NN) trained on the 1995 KOSPI. In this paper we study the validity of the NN by comparing it with the well-known Box-Jenkins linear autoregressive processes. Our conclusion is that the NN provides a quite effective EII because it tends to overfit.

A Study on the Features of Writing Rater in TOPIK Writing Assessment (한국어능력시험(TOPIK) 쓰기 평가의 채점 특성 연구)

  • Ahn, Su-hyun;Kim, Chung-sook
    • Journal of Korean language education / v.28 no.1 / pp.173-196 / 2017
  • Writing is a subjective and performative activity, and writing ability is multifaceted and complex. To understand examinees' writing ability accurately and provide effective writing scores, raters must first be competent in assessment. This study is therefore significant as fundamental research on rater characteristics in the TOPIK writing assessment. A total of 150 scripts from examinees of the 47th TOPIK were randomly selected and rated independently by 20 raters. The many-facet Rasch model was used to generate individualized feedback reports on each rater's relative severity and consistency with respect to particular categories of the rating scale. The analysis was carried out with the FACETS program, version 3.71.4. Overfit and misfit raters had considerable difficulty distinguishing between assessment factors and interpreting the criteria. Writing raters appear to be quite confused when interpreting the assessment criteria, and overfit and misfit raters in particular interpret the criteria arbitrarily. The main cause of overfit and misfit is confusion about the assessment factors and criteria when establishing a basis for scoring. Therefore, more rater training and further research grounded in these writing assessment characteristics are needed. This study is significant in that it comprehensively examined the assessment characteristics of writing raters and visually confirmed patterns of error in writing assessment.

Optimal EEG Channel Selection using BPSO with Channel Impact Factor (Channel Impact Factor 접목한 BPSO 기반 최적의 EEG 채널 선택 기법)

  • Kim, Jun-Yeup;Park, Seung-Min;Ko, Kwang-Eun;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.6 / pp.774-779 / 2012
  • A brain-computer interface based on motor imagery is a system that transforms a subject's intention into a control signal by classifying EEG signals obtained while the subject imagines moving a limb. For a new paradigm, it is not known in advance which positions are activated. A simple approach is to use as many channels as possible, but using many channels causes other problems. When the common spatial pattern (CSP), an EEG feature extraction method, is applied, many channels cause overfitting, and the technique also becomes difficult to use for medical analysis. To overcome these problems, we propose binary particle swarm optimization (BPSO) with a channel impact factor as a channel selection method that selects channels close to the most important ones. This paper examines whether the channel impact factor can improve classification accuracy with a support vector machine (SVM).

Split Effect in Ensemble

  • Chung, Dong-Jun;Kim, Hyun-Joong
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.193-197 / 2005
  • The classification tree is one of the most suitable base learners for ensembles. Over the past decade, it has been found that bagging gives the most accurate predictions when used with unpruned trees, and boosting when used with stumps. Researchers have tried to understand the relationship between the size of the trees and the accuracy of the ensemble. Experiments show that large trees make boosting overfit the dataset and that stumps help avoid this. This means that the accuracy of each classifier needs to be sacrificed for better weighting at each iteration. Hence, the split effect in boosting can be explained by the trade-off between the accuracy of each classifier and better weighting of the misclassified points. In bagging, combining larger trees gives more accurate predictions because bagging has no such trade-off; it is therefore advisable to make each classifier as accurate as possible.

A Study on Customer Segmentation Prediction Model using Support Vector Machine (Support Vector Machine을 이용한 고객이탈 예측모형에 관한 연구)

  • Seo Kwang Kyu
    • Journal of the Korea Safety Management & Science / v.7 no.1 / pp.199-210 / 2005
  • Customer segmentation prediction has attracted considerable research interest, and recent studies have shown that artificial neural network (ANN) methods achieve better performance than traditional statistical ones. However, ANN approaches have difficulty generalizing and can produce models that overfit the data. This paper applies a relatively new machine learning technique, the support vector machine (SVM), to the customer segmentation prediction problem in an attempt to provide a model with better explanatory power. To evaluate the prediction accuracy of SVM, we compare its performance with that of logistic regression analysis and ANN. Experimental results on real data from an insurance company show that SVM outperforms both.

A Pansharpening Algorithm of KOMPSAT-3A Satellite Imagery by Using Dilated Residual Convolutional Neural Network (팽창된 잔차 합성곱신경망을 이용한 KOMPSAT-3A 위성영상의 융합 기법)

  • Choi, Hoseong;Seo, Doochun;Choi, Jaewan
    • Korean Journal of Remote Sensing / v.36 no.5_2 / pp.961-973 / 2020
  • In this manuscript, a new pansharpening model based on a convolutional neural network (CNN) was developed. Dilated convolution, one of the representative convolution techniques in CNNs, was applied to make the model deeper and more complex and thereby improve the performance of the deep learning architecture. On top of the dilated convolution, a residual network is used to make training more efficient. In addition, the loss function combines the spatial correlation coefficient with the traditional L1 norm. We experimented with two variants of the Dilated Residual Network (DRNet): one using only a panchromatic (PAN) image and one using both PAN and multispectral (MS) images. In the experiments on KOMPSAT-3A, DRNet using both PAN and MS images tended to overfit the spectral characteristics, while DRNet using only a PAN image showed improved spatial resolution over existing CNN-based models.

The Study on the BTS's Fashion Style (방탄소년단의 패션 스타일에 관한 연구)

  • Kim, Jang-Hyeon
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.9 / pp.310-320 / 2020
  • BTS has established itself as a leading Korean popular music group in the global mainstream music market. In addition, BTS's fashion style is receiving so much attention that it is featured on the social networks of foreign media and the public. The purpose of this study is to examine BTS's fashion trends through an analysis of the group's fashion styles. The study combines a theoretical review of BTS and its fashion influence with a content analysis of images of BTS over the last five years. The results are as follows. First, BTS pursues classic and modern styles through suits that emphasize straight silhouettes, the use of achromatic colors, and simplified decorations. Second, BTS prefers a casual style with a free-spirited and comfortable sensibility, matching jackets with a round shoulder line, toned-down skinny jeans, hooded T-shirts, lettering patterns, and vivid colors. Third, BTS pursues a dynamic and active sporty style through sleeveless basketball shirts, round-neckline baseball shirts and shorts, training pants, and overfit sweatshirts with an emphasis on lettering patterns.

Comparison Study of Kernel Density Estimation according to Various Bandwidth Selectors (다양한 대역폭 선택법에 따른 커널밀도추정의 비교 연구)

  • Kang, Young-Jin;Noh, Yoojeong
    • Journal of the Computational Structural Engineering Institute of Korea / v.32 no.3 / pp.173-181 / 2019
  • Kernel density estimation (KDE) is commonly used to estimate a probability distribution function from experimental data when the data are insufficient. The distribution estimated by KDE depends on the bandwidth selector, which can oversmooth the kernel estimator or overfit it to the experimental data. In this study, various bandwidth selectors, such as Silverman's rule of thumb, a rule using adaptive estimates, and the oversmoothing rule, were compared for accuracy and conservativeness. For this purpose, statistical simulations were carried out using assumed true models, including unimodal and multimodal distributions, and the accuracy and conservativeness of the estimated distribution functions were compared across various data sets. In addition, simple reliability examples were used to verify how distributions estimated by KDE with different bandwidth selectors affect reliability analysis results.
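
The effect of bandwidth described in this abstract can be illustrated with a minimal sketch. It uses the standard form of Silverman's rule of thumb, h = 0.9 · min(sd, IQR/1.34) · n^(-1/5), and a Gaussian kernel. The sample values, the deliberately small bandwidth `h_tiny`, and the helper names are assumptions made for illustration and are not taken from the paper.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)


def gaussian_kde(data, h):
    """Return a function x -> estimated density (Gaussian kernel, bandwidth h)."""
    scale = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


def count_modes(density, lo, hi, steps=2000):
    """Count strict local maxima of the density on an evenly spaced grid."""
    ys = [density(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


# Hypothetical sample, made up for illustration only.
sample = [0.1, 0.4, 0.9, 1.3, 1.35, 2.0, 2.2, 2.9, 3.1, 3.6, 4.4, 5.0]

h_rule = silverman_bandwidth(sample)
h_tiny = 0.05  # assumed value, deliberately too small: the estimate chases single points

modes_rule = count_modes(gaussian_kde(sample, h_rule), -1.0, 6.0)
modes_tiny = count_modes(gaussian_kde(sample, h_tiny), -1.0, 6.0)
print(h_rule, modes_rule, modes_tiny)
```

Counting local maxima is only a rough proxy for overfitting: a very small bandwidth produces a spike near almost every observation, while the rule-of-thumb bandwidth yields a much smoother curve.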

Segment unit shuffling layer in deep neural networks for text-independent speaker verification (문장 독립 화자 인증을 위한 세그멘트 단위 혼합 계층 심층신경망)

  • Heo, Jungwoo;Shim, Hye-jin;Kim, Ju-ho;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.148-154 / 2021
  • Text-independent speaker verification needs to extract text-independent speaker embeddings to improve generalization performance. However, deep neural networks, which depend on their training data, can overfit to text information instead of learning speaker information when they repeatedly learn from identical time series. In this paper, to prevent this overfitting, we propose a segment unit shuffling layer that divides and rearranges the input layer or a hidden layer along the time axis, thereby mixing the time series information. Since the segment unit shuffling layer can be applied not only to the input layer but also to the hidden layers, it can be used as a generalization technique in the hidden layers, which is known to be more effective than generalization in the input layer, and it can be applied together with data augmentation. In addition, the degree of distortion can be controlled by adjusting the unit size of the segment. We observe that text-independent speaker verification performance improves over the baseline when the proposed segment unit shuffling layer is applied.
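
The core idea of dividing a sequence into segments along the time axis and rearranging them can be sketched as follows. This is only an illustration of the general operation, not the authors' layer: the function name `segment_shuffle`, the list-of-frames representation, and the handling of a short trailing segment are all assumptions.

```python
import random


def segment_shuffle(frames, segment_size, rng=None):
    """Split frames into consecutive segments and return them in shuffled order.

    The order of frames inside each segment is preserved. If the length is not
    a multiple of segment_size, the short trailing segment is treated as its
    own segment (an assumption of this sketch).
    """
    if segment_size < 1:
        raise ValueError("segment_size must be at least 1")
    if rng is None:
        rng = random.Random()
    segments = [frames[i:i + segment_size]
                for i in range(0, len(frames), segment_size)]
    rng.shuffle(segments)
    return [frame for segment in segments for frame in segment]


# Twelve time steps, shuffled in units of three frames.
frames = list(range(12))
shuffled = segment_shuffle(frames, segment_size=3, rng=random.Random(0))
print(shuffled)
```

A larger `segment_size` keeps longer stretches of the original time series intact, which corresponds to the abstract's remark that the degree of distortion depends on the segment unit size.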

gMLP-based Self-Supervised Learning Anomaly Detection using a Simple Synthetic Data Generation Method (단순한 합성데이터 생성 방식을 활용한 gMLP 기반 자기 지도 학습 이상탐지 기법)

  • Ju-Hyo, Hwang;Kyo-Hong, Jin
    • Journal of the Korea Institute of Information and Communication Engineering / v.27 no.1 / pp.8-14 / 2023
  • The existing self-supervised method CutPaste generates synthetic data by cutting specific patches from normal images and pasting them elsewhere, and then performs anomaly detection. However, this method produces a clearly visible difference at the patch boundary. NSA, which addresses this problem, achieves higher anomaly detection performance by generating natural-looking synthetic data through Poisson blending. However, NSA has the disadvantage of requiring many hyperparameters to be tuned for each class. In this paper, synthetic data similar to normal data were generated by the simple method of making the synthetic patch very small. Because the patches are synthesized so locally, models that learn local features can easily overfit the synthetic data. Therefore, we performed anomaly detection using gMLP, which learns global features, and even with this simple synthesis method we achieved higher performance than conventional self-supervised learning techniques.
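
The cut-and-paste synthesis with a very small patch can be sketched roughly as below. The abstract does not give the patch size, the sampling scheme, or the image format, so the 8x8 nested-list image, the 2x2 patch, and the function name `paste_small_patch` are hypothetical choices for illustration.

```python
import random


def paste_small_patch(image, patch_size, rng):
    """Copy a small square patch from one random location to another.

    `image` is a list of rows of pixel values. Returns the synthetic image and
    the top-left corner of the pasted region; the input image is not modified.
    """
    height, width = len(image), len(image[0])
    src_r = rng.randrange(height - patch_size + 1)
    src_c = rng.randrange(width - patch_size + 1)
    dst_r = rng.randrange(height - patch_size + 1)
    dst_c = rng.randrange(width - patch_size + 1)
    out = [row[:] for row in image]
    for dr in range(patch_size):
        for dc in range(patch_size):
            out[dst_r + dr][dst_c + dc] = image[src_r + dr][src_c + dc]
    return out, (dst_r, dst_c)


# Hypothetical 8x8 "normal" image in which every pixel has a unique value.
normal = [[r * 8 + c for c in range(8)] for r in range(8)]
synthetic, where = paste_small_patch(normal, patch_size=2, rng=random.Random(0))
changed = sum(normal[r][c] != synthetic[r][c]
              for r in range(8) for c in range(8))
print(where, changed)
```

With a 2x2 patch at most four pixels differ from the normal image, which shows why the synthetic anomaly is so local.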