• Title/Summary/Keyword: random model (랜덤모형)


Bike Insurance Fraud Detection Model Using Balanced Randomforest Algorithm (균형 랜덤 포레스트를 이용한 이륜차 보험사기 적발 모형 개발)

  • Kim, Seunghoon;Lee, Soo Il;Kim, Tae ho
    • Journal of Digital Convergence / v.20 no.2 / pp.241-250 / 2022
  • Due to the COVID-19 pandemic, with the growth of 'untact' (contactless) services and an unstable household economy, bike insurance fraud is expected to surge. Moreover, fraud methods are becoming more complicated. However, no fraud detection model for bike insurance exists. We deal with the issue of skewed class distribution and reflect the criteria of fraud detection experts. We use a balanced random-forest algorithm to develop an efficient bike insurance fraud detection model. As a result, the predictive performance of the balanced random-forest model is superior to that of the non-balanced model, while there is no significant difference between the variables used by the experts and those of the confirmatory models. The important variables for detecting fraud turn out to be the age and gender of the driver, the correspondence between the insured and the driver, the amount of the self-repairing claim, and the amount of bodily injury liability.
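The abstract does not describe how the balancing is done. In the usual balanced random forest, each tree is grown on a bootstrap sample that draws the same number of rows from every class, which counters the skewed class distribution. The following is a minimal sketch of that sampling step only, not the authors' implementation; the function name `balanced_bootstrap` and the toy labels are hypothetical.

```python
import random
from collections import Counter

def balanced_bootstrap(labels, rng):
    """Return row indices for one balanced bootstrap sample.

    Draws, with replacement, as many rows from each class as the
    smallest class contains, so every tree sees an equal class mix.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        sample.extend(rng.choices(rows, k=n_min))
    return sample

# Hypothetical toy labels: 1 = fraud (rare), 0 = legitimate claim.
labels = [1] * 5 + [0] * 95
rng = random.Random(0)
sample = balanced_bootstrap(labels, rng)
class_counts = Counter(labels[i] for i in sample)
print(class_counts)
```

Each tree of the forest would be trained on a fresh sample of this kind, so the rare fraud class is not swamped by legitimate claims.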

Nonparametric multiple comparison method using aligned method and joint placement in randomized block design with replications (반복이 있는 랜덤화 블록 모형에서 정렬방법과 결합위치를 이용한 비모수 다중비교법)

  • Hwang, Juwon;Kim, Dongjae
    • The Korean Journal of Applied Statistics / v.31 no.5 / pp.599-610 / 2018
  • The method of Mack and Skillings (Technometrics, 23, 171-177, 1981) is a nonparametric multiple comparison method for a randomized block design with replications. This method is likely to lose information because each block is ranked using the average of the observations instead of the repeated observations themselves. In this paper, we propose a new nonparametric multiple comparison method for the randomized block model with replications that uses the alignment method proposed by Hodges and Lehmann (The Annals of Mathematical Statistics, 33, 482-497, 1962) and extends the joint placement method proposed by Chung and Kim (Communications for Statistical Applications and Methods, 14, 551-560, 2007). In addition, a Monte Carlo simulation compares the family-wise error rate and power of the proposed method with those of the parametric method and the nonparametric method.

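Simulation Analysis of Multiple Comparisons Using the Rank Transformation under Completely Randomized and Randomized Block Models (완전확률화모형 및 랜덤화블럭모형하에서 순위변환을 이용한 다중비교의 시뮬레이션 분석)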

  • 최영훈
    • Communications for Statistical Applications and Methods / v.5 no.1 / pp.85-97 / 1998
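  • This study uses simulation to examine the main multiple comparison procedures under the completely randomized model and the randomized block model. The simulation results show that the multiple comparison procedure based on the rank transformation and the least significant difference test outperforms, in terms of experimentwise error rate, experimentwise power, and per-pair power, the procedures based on the parametric ANOVA F test with Fisher's significant difference test, the nonparametric Kruskal-Wallis test with the least significant difference test, and the Friedman test with the least significant difference test. That is, the experimentwise error rate of the rank-transformed ANOVA F test holds the nominal significance level well, and its experimentwise power and per-pair power are generally superior to those of the parametric ANOVA F test, the Kruskal-Wallis test, and the Friedman test.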
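The rank transformation mentioned in the abstract replaces each observation by its rank in the pooled sample and then applies the ordinary ANOVA F test to the ranks. The sketch below illustrates this for the completely randomized model only; it omits the block term and the least significant difference follow-up, and the function names and toy data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions in the tie group
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks

def anova_f(groups):
    """One-way ANOVA F statistic for a list of treatment groups."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_transform_f(groups):
    """Pool the data, rank it, and compute the ANOVA F statistic on the ranks."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    ranked_groups, start = [], 0
    for g in groups:
        ranked_groups.append(ranks[start:start + len(g)])
        start += len(g)
    return anova_f(ranked_groups)

# Hypothetical toy data: three treatments, three observations each.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_ranked = rank_transform_f(groups)
print(f_ranked)
```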


Nonparametric procedures using aligned method and joint placement in randomized block design (랜덤화 블록 계획법에서 정렬방법과 결합 위치를 이용한 비모수 검정법)

  • Jo, Sungdong;Kim, Dongjae
    • Journal of the Korean Data and Information Science Society / v.24 no.1 / pp.95-103 / 2013
  • A nonparametric procedure for the randomized block design (RBD) was proposed by Friedman (1937) for general alternatives, and Page (1963) suggested a test for ordered alternatives in the RBD. In this paper, we propose a new nonparametric method for the randomized block design using the aligned method suggested by Hodges and Lehmann (1962) and the joint placement described in Chung and Kim (2007). A Monte Carlo simulation study is also conducted to compare the power of the proposed procedure with that of the previous procedures.
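The aligned method of Hodges and Lehmann removes the block effect by subtracting a block location estimate from every observation in the block, after which the aligned values can be compared across blocks. A minimal sketch, assuming the block mean as the location estimate (the median is another common choice) and no ties; the joint placement statistic itself is not reproduced here, and the names and data are hypothetical.

```python
def align_blocks(data):
    """Subtract each block's mean from its observations.

    data[i][j] is the response for treatment j in block i.
    """
    aligned = []
    for block in data:
        center = sum(block) / len(block)
        aligned.append([x - center for x in block])
    return aligned

def rank_all(aligned):
    """Rank the aligned observations jointly across all blocks (no ties assumed)."""
    flat = [x for block in aligned for x in block]
    order = sorted(range(len(flat)), key=lambda i: flat[i])
    ranks = [0] * len(flat)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    k = len(aligned[0])
    return [ranks[b * k:(b + 1) * k] for b in range(len(aligned))]

# Hypothetical toy data: 3 blocks (rows) by 3 treatments (columns).
data = [[10, 12, 17], [19, 24, 29], [30, 32, 40]]
aligned = align_blocks(data)
aligned_ranks = rank_all(aligned)
print(aligned)
print(aligned_ranks)
```

After alignment, the large level differences between blocks no longer dominate the joint ranking.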

Nonparametric procedures based on aligned method and placement for ordered alternatives in randomized block design (랜덤화 블록 모형에서 정렬방법과 위치를 이용한 순서형 대립가설에 대한 비모수 검정법)

  • Kim, Hyosook;Kim, Dongjae
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.707-717 / 2016
  • For the randomized block design, Friedman (1937) proposed a nonparametric procedure for general alternatives and Page (1963) suggested a test for ordered alternatives. These methods use the ranks of the treatments within each block. In this paper, we propose nonparametric procedures for the randomized block design that use the aligned method proposed by Hodges and Lehmann (1962) to reduce among-block information and that are based on the placement suggested by Kim (1999). We also perform a Monte Carlo study to compare the empirical powers of the proposed procedures and the established methods.
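For reference, the two established statistics named in the abstract can be computed from within-block ranks as follows. This is a sketch of the classical Friedman and Page statistics only, assuming no ties within a block; it does not implement the proposed placement-based procedures, and the function names and data are hypothetical.

```python
def within_block_ranks(block):
    """Rank the treatments inside one block (1 = smallest; no ties assumed)."""
    order = sorted(range(len(block)), key=lambda j: block[j])
    ranks = [0] * len(block)
    for r, j in enumerate(order, start=1):
        ranks[j] = r
    return ranks

def rank_sums(data):
    """Sum the within-block ranks of each treatment over all blocks."""
    sums = [0] * len(data[0])
    for block in data:
        for j, r in enumerate(within_block_ranks(block)):
            sums[j] += r
    return sums

def friedman_statistic(data):
    """Friedman chi-square statistic for n blocks and k treatments."""
    n, k = len(data), len(data[0])
    sums = rank_sums(data)
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in sums) - 3.0 * n * (k + 1)

def page_l(data):
    """Page's L; columns must be listed in the hypothesized increasing order."""
    return sum(j * s for j, s in enumerate(rank_sums(data), start=1))

# Hypothetical toy data: 3 blocks (rows) by 3 treatments (columns).
data = [[10, 12, 17], [19, 24, 29], [30, 32, 40]]
print(rank_sums(data), friedman_statistic(data), page_l(data))
```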

Predicting Snow Damage and Suggesting Improvement Plans Using Deep Learning (딥러닝을 이용한 대설피해액 예측 및 개선방안 제안)

  • Lee, HyeongJoo;Chung, Gunhui
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.485-485 / 2021
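  • With recent abnormal weather worldwide, natural disasters are not only occurring more frequently, but the resulting damage is also becoming increasingly diverse and large in scale. Disaster damage affects not only the area where it occurs but also the national economy as a whole. Among natural disasters in Korea, heavy snow occurs less often than others but causes damage over wide areas, and the amount of damage is large relative to the damaged area. In addition, although vulnerability analysis shows that the Gangwon region is currently the most vulnerable, the vulnerable area is projected to expand in the future along an axis connecting the Gangwon, Chungcheong, and Honam regions. This study aimed to develop a heavy-snow damage prediction model that predicts the amount of heavy-snow damage in Korea using machine learning techniques, which are now widely used across society. Random forest, support vector machine, and artificial neural network techniques were used, and the models were built with variables such as meteorological observation data and socio-economic factors. The predictive power of the prediction model developed with multiple regression in a previous study was then compared with that of the models developed here with the three machine learning techniques, and the model with the highest predictive power was presented. If the model and the data quality are improved on the basis of these results, rough preparation for future heavy-snow damage is expected to become possible.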


Nonparametric method using aligned method and linear placement statistics in randomized block design with replications (반복이 있는 랜덤화블록 모형에서 정렬방법과 선형위치통계량을 이용한 비모수 검정법)

  • Jeon, Soyoung;Kim, Dongjae
    • The Korean Journal of Applied Statistics / v.30 no.2 / pp.281-290 / 2017
  • Mack and Skillings (1980) proposed a nonparametric method for a randomized block design with replications. This method uses the mean of the observations instead of each observation, and therefore has the inherent disadvantage that information may be lost. In this paper, we propose a nonparametric method that employs an aligned method and linear placement statistics to address this weakness. A Monte Carlo study is performed to compare the power of the proposed method with that of previous methods.

Machine learning model for residual chlorine prediction in sediment basin to control pre-chlorination in water treatment plant (정수장 전염소 공정제어를 위한 침전지 잔류염소농도 예측 머신러닝 모형)

  • Kim, Juhwan;Lee, Kyunghyuk;Kim, Soojun;Kim, Kyunghun
    • Journal of Korea Water Resources Association / v.55 no.spc1 / pp.1283-1293 / 2022
  • The purpose of this study is to predict residual chlorine with artificial intelligence algorithms so that a stable residual chlorine concentration can be maintained in the sedimentation basin of a water treatment process that employs pre-chlorination. Available water quantity and quality data are collected and analyzed statistically, then applied to a multiple regression model and to artificial intelligence models including a multi-layer perceptron neural network, random forest, and long short-term memory (LSTM). Water temperature, turbidity, pH, conductivity, flow rate, alkalinity, and pre-chlorination dosage are used as the input parameters of the prediction models. The results show that the random forest algorithm gives the most moderate prediction result among the four cases: long short-term memory, multi-layer perceptron, multiple regression, and random forest. In particular, the multiple regression model cannot represent the residual chlorine from input parameters that vary independently with seasonal change and that differ in numerical scale and dimension between quantity and quality. For this reason, the random forest model, which is a decision-tree-type algorithm, is more appropriate for predicting water quality than the other algorithms. It is also expected that real-time prediction by artificial intelligence models can contribute to the stable control of residual chlorine in water treatment plants that include a pre-chlorination process.

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve the prediction accuracy of the ensemble model. The proposed model was applied to a bankruptcy prediction problem by using a real dataset from Korean companies. The research data included 1800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected by using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for avoiding overfitting. The prediction accuracy against the latter portion was used to determine the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, the classification accuracy of the proposed model was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated. 
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
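The random subspace KNN ensemble described in the abstract can be sketched as follows: every member is a KNN classifier restricted to its own feature subset, and the members' predictions are combined by majority vote. This sketch draws the subsets at random and uses one fixed k for all members; the genetic-algorithm optimization of k and the subsets proposed in the paper is not implemented, and all names and data are hypothetical.

```python
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k, features):
    """Majority vote among the k nearest training rows, using only `features`."""
    def dist(row):
        return sum((row[f] - query[f]) ** 2 for f in features)
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def build_ensemble(n_features, n_members, subspace_size, k, rng):
    """Each member is a (feature subset, k) pair; here every member shares one k."""
    return [(rng.sample(range(n_features), subspace_size), k)
            for _ in range(n_members)]

def ensemble_predict(ensemble, train_x, train_y, query):
    """Combine the members' predictions by majority vote."""
    votes = [knn_predict(train_x, train_y, query, k, feats)
             for feats, k in ensemble]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: 4 features, class 0 near the origin, class 1 near 10.
train_x = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1],
           [10, 9, 10, 9], [9, 10, 9, 10], [10, 10, 9, 9]]
train_y = [0, 0, 0, 1, 1, 1]
rng = random.Random(0)
ensemble = build_ensemble(n_features=4, n_members=5, subspace_size=2, k=3, rng=rng)
pred_high = ensemble_predict(ensemble, train_x, train_y, [9, 9, 9, 9])
pred_low = ensemble_predict(ensemble, train_x, train_y, [1, 1, 1, 1])
print(pred_high, pred_low)
```

An optimizer such as a genetic algorithm would search over the `(feature subset, k)` pairs instead of sampling them once at random.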

A Comparison of Estimation in an Unbalanced Linear Mixed Model (불균형 선형혼합모형에서 추정량)

  • 송석헌;정병철
    • The Korean Journal of Applied Statistics / v.15 no.2 / pp.337-354 / 2002
  • This paper derives three estimation methods for the between-group variance component in a serially correlated random model. To compare their estimation capability, three designs with different degrees of unbalancedness are considered. The so-called empirical quantile dispersion graphs (EQDGs) are used to compare the estimation methods as well as the designs. The proposed conditional ANOVA estimation is robust to design unbalancedness; however, ML estimation is preferred to the conditional ANOVA and REML estimation regardless of design unbalancedness and correlation coefficient.