• Title/Summary/Keyword: statistics based method

Genetic Programming Based Compensation Technique for Short-range Temperature Prediction (유전 프로그래밍 기반 단기 기온 예보의 보정 기법)

  • Hyeon, Byeong-Yong;Hyun, Soo-Hwan;Lee, Yong-Hee;Seo, Ki-Sung
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.11 / pp.1682-1688 / 2012
  • This paper introduces a robust technique based on GP (Genetic Programming) for temperature compensation in short-range prediction. Because forecast models do not determine weather conditions reliably, an efficient MOS (Model Output Statistics) is needed to correct their systematic errors. Most MOS methods compensate a prediction model with linear regression, which makes it hard to handle the irregular nature of prediction errors. To solve this problem, a nonlinear symbolic regression method using GP is proposed. The purpose of this study is to evaluate the accuracy of 3-day temperature estimates produced by a GP-based nonlinear MOS for regions of Korea. The method is compared with the UM model and shows superior results. Data from the summers of 2007-2009 are used for training, and data from the summer of 2010 are used for verification.
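A GP-based MOS searches over correction formulas instead of fitting fixed linear coefficients. The sketch below illustrates that idea as a toy symbolic regression: each candidate correction is an expression tree over the raw model forecast `f`, and evolution keeps the trees with the lowest mean squared error against observations. It is a minimal sketch, not the authors' implementation; the operator set, subtree mutation, elitist selection, population size, and the synthetic data are all assumptions.

```python
import math
import random

# Hypothetical function set; the paper's actual GP primitives are not given in the abstract.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def random_tree(depth, rng):
    """Grow a random expression over the raw forecast 'f' and numeric constants."""
    if depth == 0 or rng.random() < 0.3:
        return "f" if rng.random() < 0.5 else round(rng.uniform(-2.0, 2.0), 2)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, f):
    """Evaluate an expression tree for one raw forecast value f."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, f), evaluate(right, f))
    return f if tree == "f" else tree


def depth_of(tree):
    if isinstance(tree, tuple):
        return 1 + max(depth_of(tree[1]), depth_of(tree[2]))
    return 0


def mse(tree, data):
    """Mean squared error of the corrected forecast against observations."""
    total = 0.0
    for f, obs in data:
        err = evaluate(tree, f) - obs
        total += err * err
    total /= len(data)
    return total if math.isfinite(total) else math.inf


def mutate(tree, rng):
    """Subtree mutation: replace a randomly chosen subtree with a fresh random one."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))


def evolve(data, pop_size=40, generations=30, max_depth=5, seed=0):
    rng = random.Random(seed)
    # Include the uncorrected forecast so elitism never returns anything worse.
    population = ["f"] + [random_tree(3, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: mse(t, data))
        elite = ranked[: max(3, pop_size // 4)]
        population = list(elite)
        while len(population) < pop_size:
            # Tournament of 3 among the elite: a lower rank index means lower error.
            parent = elite[min(rng.sample(range(len(elite)), 3))]
            child = mutate(parent, rng)
            if depth_of(child) <= max_depth:
                population.append(child)
    return min(population, key=lambda t: mse(t, data))


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Made-up (forecast, observed) temperature pairs with a nonlinear forecast bias.
    train = []
    for _ in range(30):
        f = data_rng.uniform(20.0, 35.0)
        train.append((f, f - 0.02 * (f - 27.0) ** 2 + 1.0 + data_rng.gauss(0.0, 0.3)))
    best = evolve(train)
    print("best correction:", best)
    print("raw MSE:", round(mse("f", train), 3), "GP-MOS MSE:", round(mse(best, train), 3))
```

Seeding the population with the uncorrected forecast `f` and carrying the elite forward guarantees that the returned formula is never worse than no correction on the training data. A real evaluation, like the paper's split between training and verification seasons, would score the evolved formula on held-out data.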

Pruning the Boosting Ensemble of Decision Trees

  • Yoon, Young-Joo;Song, Moon-Sup
    • Communications for Statistical Applications and Methods / v.13 no.2 / pp.449-466 / 2006
  • We propose to use variable selection methods based on penalized regression for pruning decision tree ensembles. Pruning methods based on LASSO and SCAD are compared with the cluster pruning method on several artificial and real datasets. The comparative studies show that the proposed penalized-regression methods reduce the size of boosting ensembles without significantly decreasing accuracy, and that they perform better than the cluster pruning method. With respect to classification noise, the proposed pruning methods can partly mitigate the weakness of AdaBoost.
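Pruning with penalized regression can be sketched as a LASSO fit of the target on the ensemble members' outputs, keeping only the members whose coefficients do not shrink to exactly zero. This minimal sketch assumes ±1 member votes as regressors, no intercept, and plain cyclic coordinate descent; the names `lasso_weights` and `prune`, the penalty `lam`, and the toy data are hypothetical, and the SCAD variant is not shown.

```python
def soft_threshold(z, lam):
    """Proximal operator of the L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_weights(H, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)*||y - H w||^2 + lam*||w||_1 (no intercept).

    H holds one row per training example and one column per ensemble member
    (e.g. each tree's +1/-1 vote); y holds the +1/-1 labels.
    """
    n, m = len(H), len(H[0])
    w = [0.0] * m
    residual = list(y)  # y - H w, with w starting at zero
    col_sq = [sum(row[j] ** 2 for row in H) / n for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            # Correlation of member j with the partial residual (its own term added back).
            rho = sum(row[j] * r for row, r in zip(H, residual)) / n + col_sq[j] * w[j]
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - row[j] * delta for row, r in zip(H, residual)]
                w[j] = new_wj
    return w


def prune(H, y, lam, tol=1e-8):
    """Indices of ensemble members kept (non-zero LASSO weight)."""
    return [j for j, wj in enumerate(lasso_weights(H, y, lam)) if abs(wj) > tol]


if __name__ == "__main__":
    y = [1.0, -1.0, 1.0, -1.0]
    # Rows = examples; columns = votes of two hypothetical trees.
    H = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
    print("weights:", lasso_weights(H, y, lam=0.1))
    print("kept members:", prune(H, y, lam=0.1))
```

In the toy data the first member reproduces the labels and the second is uncorrelated with them, so the L1 penalty zeroes out the second member; larger values of `lam` prune more aggressively.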

Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.149-161 / 2019
  • In this article, we propose the following approach to simultaneous variable selection and outlier detection. First, we identify candidate outliers using properties of the intercept estimator in a difference-based regression model, and we incorporate this information into the multiple regression model by adding mean-shift parameters. Second, using stochastic search variable selection, we select the best model from the model that includes the outlier candidates as predictors. Finally, we evaluate the method through simulations and a real data analysis, which yield promising results. Further work is needed to make the method produce robust estimates, and we also plan to extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.
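The mean-shift step can be sketched by appending one indicator column per candidate outlier to the design matrix and fitting by least squares. This is a minimal sketch assuming the candidate indices are already known (here passed in by hand, hypothetically); the difference-based candidate search and the stochastic search variable selection that would then treat these indicators as extra candidate predictors are not shown.

```python
import numpy as np


def mean_shift_design(X, candidates):
    """Append one indicator column per candidate outlier (the mean-shift parameters)."""
    n = X.shape[0]
    D = np.zeros((n, len(candidates)))
    for k, i in enumerate(candidates):
        D[i, k] = 1.0
    return np.hstack([X, D])


def fit_mean_shift(X, y, candidates):
    """Least-squares fit of y = X beta + D gamma + error; returns (beta, gamma)."""
    Z = mean_shift_design(X, candidates)
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    p = X.shape[1]
    return coef[:p], coef[p:]


if __name__ == "__main__":
    x = np.arange(8.0)
    X = np.column_stack([np.ones_like(x), x])  # intercept + one predictor
    y = 1.0 + 2.0 * x + np.random.default_rng(0).normal(0.0, 0.1, size=8)
    y[5] += 4.0  # injected outlier (made-up data)
    beta, gamma = fit_mean_shift(X, y, candidates=[5])  # hypothetical candidate list
    print("beta:", beta.round(2), "shift for obs 5:", gamma.round(2))
```

Each indicator absorbs its observation completely, so the estimated shift equals that point's residual from a fit on the remaining observations; a shift far from zero is evidence that the candidate really is an outlier.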

Intelligent User Pattern Recognition based on Vision, Audio and Activity for Abnormal Event Detections of Single Households

  • Jung, Ju-Ho;Ahn, Jun-Ho
    • Journal of the Korea Society of Computer and Information / v.24 no.5 / pp.59-66 / 2019
  • According to KT telecommunication statistics, people stay inside their houses for an average of 11.9 hours a day. In addition, according to NSC statistics in the United States, people of all ages are injured at home for a variety of reasons. In this research, we investigated an abnormal event detection algorithm that classifies infrequently occurring behaviors in daily life, such as accidents and health emergencies. We propose a fusion method that combines three classification algorithms, based on vision, audio, and activity patterns, to detect unusual user events. The vision pattern algorithm identifies people and objects in video data collected through home CCTV. The audio and activity pattern algorithms classify the user's audio and activity behaviors using data collected from the built-in sensors of smartphones in the home. We evaluated the individual pattern algorithms and the proposed fusion method on multiple scenarios.
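The abstract does not state how the three classifiers are combined, so the following is only a sketch of one common late-fusion rule: each modality outputs per-class scores, and the fused decision is the class with the highest weighted average score. The class labels, scores, and weights are hypothetical.

```python
from typing import Dict, Sequence


def fuse(scores: Sequence[Dict[str, float]], weights: Sequence[float]) -> str:
    """Weighted late fusion: average each class score across modalities, return the best class."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per modality")
    total = float(sum(weights))
    fused: Dict[str, float] = {}
    for modality_scores, weight in zip(scores, weights):
        for label, p in modality_scores.items():
            fused[label] = fused.get(label, 0.0) + weight * p / total
    return max(fused, key=lambda label: fused[label])


if __name__ == "__main__":
    vision = {"normal": 0.6, "fall": 0.4}  # hypothetical classifier outputs
    audio = {"normal": 0.3, "fall": 0.7}
    activity = {"normal": 0.2, "fall": 0.8}
    print(fuse([vision, audio, activity], weights=[1.0, 1.0, 1.0]))  # prints "fall"
```

With equal weights, the audio and activity evidence outweighs the vision classifier; raising the vision weight enough would flip the decision back to "normal", which is why the weights would normally be tuned on validation scenarios.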

An Object-Based Verification Method for Microscale Weather Analysis Module: Application to a Wind Speed Forecasting Model for the Korean Peninsula (미기상해석모듈 출력물의 정확성에 대한 객체기반 검증법: 한반도 풍속예측모형의 정확성 검증에의 응용)

  • Kim, Hea-Jung;Kwak, Hwa-Ryun;Kim, Sang-il;Choi, Young-Jean
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1275-1288 / 2015
  • A microscale weather analysis module (about 1 km or less) is a microscale numerical weather prediction model designed for operational forecasting and for atmospheric research needs involving quantities such as radiant energy, thermal energy, and humidity. The accuracy of the module directly determines the usefulness and quality of real-time microscale weather information services in metropolitan areas. This paper proposes an object-based verification method for evaluating the spatio-temporal accuracy of the microscale weather analysis module. The method is graphical and consists of three steps: constructing a lattice field of evaluation statistics, merging and identifying objects, and evaluating the accuracy of the module. We construct lattice fields from various spatio-temporal evaluation statistics and develop an efficient object identification algorithm that applies convolution, masking, and merging operations to these fields. A real data application demonstrates the utility of the verification method.
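The convolution, masking, and merging steps can be sketched on a small lattice field. This minimal sketch assumes each cell holds an evaluation statistic (here a made-up absolute wind-speed error), uses a mean kernel for the convolution, a fixed threshold for the mask, and 4-connectivity for merging masked cells into objects; the paper's actual kernel, threshold, and merging rule may differ.

```python
from collections import deque


def smooth(field, radius=1):
    """Convolve with a (2r+1)x(2r+1) mean kernel, averaging only cells inside the grid."""
    rows, cols = len(field), len(field[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            vals = [field[a][b]
                    for a in range(max(0, i - radius), min(rows, i + radius + 1))
                    for b in range(max(0, j - radius), min(cols, j + radius + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def mask(field, threshold):
    """Keep only cells whose (smoothed) statistic reaches the threshold."""
    return [[v >= threshold for v in row] for row in field]


def label_objects(binary):
    """Merge 4-connected masked cells into objects; returns (label grid, object count)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if binary[i][j] and labels[i][j] == 0:
                count += 1
                labels[i][j] = count
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill of one object
                    a, b = queue.popleft()
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        x, y = a + da, b + db
                        if 0 <= x < rows and 0 <= y < cols and binary[x][y] and labels[x][y] == 0:
                            labels[x][y] = count
                            queue.append((x, y))
    return labels, count


if __name__ == "__main__":
    # Made-up lattice of per-cell absolute wind-speed errors (m/s).
    errors = [
        [0.2, 0.3, 0.2, 0.1, 0.1],
        [0.3, 2.5, 2.8, 0.2, 0.1],
        [0.2, 2.6, 3.0, 0.3, 0.2],
        [0.1, 0.2, 0.3, 0.2, 2.9],
    ]
    labels, n_objects = label_objects(mask(smooth(errors), threshold=1.0))
    print(n_objects, "object(s) of large error")
    for row in labels:
        print(row)
```

Each labeled object can then be summarized (area, centroid, mean error) to evaluate where and when the module is inaccurate.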

A Study on Performance and Prediction Factors in College and University Libraries using Statistical Analyses (대학도서관 통계분석을 통한 대학도서관 성과 및 영향요인에 대한 연구)

  • Kim, Giyeong;Choi, Yoonhee;Kang, Jaeyeon;Go, Pyeongjin
    • Journal of the Korean BIBLIA Society for library and Information Science / v.25 no.3 / pp.191-214 / 2014
  • The goal of this study is an exploratory statistical analysis, using sustainability-based performance measures, of the university and college library statistics in the Academic Information Statistics System (rinfo.kr) operated by the Korea Education and Research Information Service (KERIS). To this end, we adopt a preprocessing method that derives change-rate variables, taking into account preceding predictive elements and succeeding performance elements, and that controls for external factors such as size and socioeconomic factors. We then perform a series of factor analyses and multiple linear regression analyses. The factor analyses extract 13 factors, and the regression analyses identify several sets of significant variables affecting the performance measures. Based on the results, we discuss the problems of outliers and low correlations between variables. Given the low effect sizes of the resulting regression models, we also suggest developing new variables. We hope that this study will stimulate broader discussion of library statistics systems, evaluation, and sustainability-based library management.

Gene Selection Based on Support Vector Machine using Bootstrap (붓스트랩 방법을 활용한 SVM 기반 유전자 선택 기법)

  • Song, Seuck-Heun;Kim, Kyoung-Hee;Park, Chang-Yi;Koo, Ja-Yong
    • The Korean Journal of Applied Statistics / v.20 no.3 / pp.531-540 / 2007
  • Recursive feature elimination for support vector machines is known to be useful for selecting relevant genes. Because it ranks genes by the absolute values of their coefficients, recursive feature elimination may suffer from a scaling problem. We propose a modified recursive feature elimination algorithm that uses the bootstrap. In our method, the criterion for determining relevant genes is the absolute value of a coefficient divided by its standard error, which accounts for the statistical variability of the coefficient. Through numerical examples, we show that our method is effective for gene selection.
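The modified elimination criterion can be sketched as follows: fit a linear SVM, estimate each coefficient's standard error from bootstrap refits, and repeatedly drop the feature with the smallest |w| / SE(w). This is a minimal sketch, not the authors' code; the Pegasos-style subgradient solver (with no intercept), the regularization `lam`, the number of bootstrap replicates, and the synthetic data are assumptions.

```python
import random
import statistics


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM (no intercept)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (L2 regularizer)
            if margin < 1.0:  # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def bootstrap_rfe(X, y, n_boot=20, seed=0):
    """Rank features by repeatedly removing the one with the smallest |w| / SE(w).

    Returns feature indices from least to most relevant.
    """
    rng = random.Random(seed)
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        sub = [[row[j] for j in remaining] for row in X]
        w_full = train_linear_svm(sub, y, seed=seed)
        boot = []
        for b in range(n_boot):
            idx = [rng.randrange(len(sub)) for _ in range(len(sub))]
            boot.append(train_linear_svm([sub[i] for i in idx], [y[i] for i in idx], seed=b))
        scores = []
        for k in range(len(remaining)):
            se = statistics.stdev([wb[k] for wb in boot])  # bootstrap standard error
            scores.append(abs(w_full[k]) / se if se > 0 else float("inf"))
        worst = min(range(len(remaining)), key=scores.__getitem__)
        eliminated.append(remaining.pop(worst))
    return eliminated + remaining


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up data: feature 0 carries the class signal, features 1-4 are noise.
    X, y = [], []
    for _ in range(40):
        label = rng.choice([-1, 1])
        X.append([label + rng.gauss(0.0, 0.5)] + [rng.gauss(0.0, 1.0) for _ in range(4)])
        y.append(label)
    print("elimination order (least relevant first):", bootstrap_rfe(X, y))
```

Dividing by the bootstrap standard error makes the criterion invariant to rescaling a feature, which is the scaling problem the plain |w| criterion suffers from.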

Study Gene Interaction Effect Based on Expanded Multifactor Dimensionality Reduction Algorithm (확장된 다중인자 차원축소 (E-MDR) 알고리즘에 기반한 유전자 상호작용 효과 규명)

  • Lee, Jea-Young;Lee, Ho-Guen;Lee, Yong-Won
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1239-1247 / 2009
  • Studying genes related to human diseases and to the economic traits of domestic animals is a matter of great interest, and preserving and improving the genes of Korean cattle is a key subject. The genes of Korean cattle have been studied using an EST-based SNP map, but because this map is based on a statistical model, the statistical positions differ from the real positions. Lee et al. (2009b) resolved these problems by using both the EST-based SNP map and Gene on sequence. The multifactor dimensionality reduction (MDR) method has generally been used to study interaction effects in statistical models, but it cannot be applied in all cases: it applies only to case-control data. We therefore propose the E-MDR method, which uses the CART algorithm. Using E-MDR, we also identified interaction effects of single nucleotide polymorphisms (SNPs) responsible for average daily gain (ADG) and marbling score (MS).
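One plausible reading of "E-MDR using CART" for a continuous trait such as ADG or MS is sketched below: the genotype combinations of two SNPs form the MDR cells, and a CART-style split on the trait pools those cells into a low group and a high group, replacing the case-control ratio rule of ordinary MDR. This is an illustrative assumption, not necessarily the authors' exact algorithm; the genotype coding (0/1/2) and the data are hypothetical.

```python
from collections import defaultdict


def group_cells(snp_a, snp_b, trait):
    """Collect trait values per two-locus genotype combination (the MDR 'cells')."""
    cells = defaultdict(list)
    for ga, gb, value in zip(snp_a, snp_b, trait):
        cells[(ga, gb)].append(value)
    return dict(cells)


def sse(values):
    """Sum of squared deviations from the mean (CART's regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def cart_split_cells(cells):
    """Best binary split of genotype cells into low / high groups for a continuous trait.

    For squared-error loss and a categorical predictor, it suffices to sort the
    categories by mean response and try each contiguous cut (a classical CART result).
    Returns (set_of_high_cells, reduction_in_sse).
    """
    ordered = sorted(cells, key=lambda c: sum(cells[c]) / len(cells[c]))
    total = sse([v for c in ordered for v in cells[c]])
    best_cut, best_gain = len(ordered), 0.0
    for cut in range(1, len(ordered)):
        low = [v for c in ordered[:cut] for v in cells[c]]
        high = [v for c in ordered[cut:] for v in cells[c]]
        gain = total - sse(low) - sse(high)
        if gain > best_gain:
            best_cut, best_gain = cut, gain
    return set(ordered[best_cut:]), best_gain


if __name__ == "__main__":
    # Hypothetical genotypes (0/1/2) at two SNPs and a made-up continuous trait.
    snp_a = [0, 0, 1, 1, 0, 1, 2, 2]
    snp_b = [0, 0, 1, 1, 1, 0, 1, 0]
    trait = [1.0, 1.2, 5.0, 5.2, 1.3, 0.9, 4.8, 1.1]
    high, gain = cart_split_cells(group_cells(snp_a, snp_b, trait))
    print("high-trait genotype cells:", sorted(high), "SSE reduction:", round(gain, 2))
```

The SSE reduction plays the role of the interaction's explanatory strength; comparing it across SNP pairs (ideally with cross-validation, as in MDR) would rank candidate interactions.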

Detection of Hotspots for Geospatial Lattice Data

  • Moon, Sung-Ho;Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.131-139 / 2006
  • Statistical analysis of spatial data is important in many fields. Spatial data are taken at specific locations or within specific regions, and their relative positions are recorded. Lattice data are synoptic observations covering an entire spatial region, such as the cancer rate of each county in a state. The main purpose of this paper is to detect hotspots, that is, regions with significantly high or low rates. Kulldorff (1997) detected hotspots using circular spatial scan statistics. We propose a new method that finds hotspots of arbitrary shape by combining echelon analysis with spatial scan statistics.
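The scan statistic underlying this work can be sketched with Kulldorff's Poisson log-likelihood ratio. This minimal sketch assumes a 1-D chain of regions, so that scanning windows are contiguous runs, and it only looks for high-rate hotspots; the echelon-based search for arbitrary shapes and the Monte Carlo significance test are not shown, and the counts are made up.

```python
import math


def poisson_llr(c, e, total):
    """Kulldorff's Poisson log-likelihood ratio for a zone with c observed, e expected cases."""
    if c <= e:
        return 0.0  # only zones with an elevated rate count as (high) hotspots here
    llr = c * math.log(c / e)
    if total - c > 0:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def best_window(cases, population, max_len):
    """Scan contiguous windows of a 1-D lattice; return (start, end, llr) of the top zone."""
    total_cases = sum(cases)
    total_pop = sum(population)
    best = (0, 0, 0.0)
    for start in range(len(cases)):
        c = p = 0
        for end in range(start, min(len(cases), start + max_len)):
            c += cases[end]
            p += population[end]
            e = total_cases * p / total_pop  # expected count under a uniform rate
            llr = poisson_llr(c, e, total_cases)
            if llr > best[2]:
                best = (start, end, llr)
    return best


if __name__ == "__main__":
    cases = [1, 1, 10, 9, 1, 1]  # made-up counts per region
    population = [100] * 6
    print(best_window(cases, population, max_len=3))
```

In practice, the maximum LLR is compared with its distribution over datasets simulated under the null hypothesis of a uniform rate to decide whether the zone is a significant hotspot.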

A new extension of Lindley distribution: modified validation test, characterizations and different methods of estimation

  • Ibrahim, Mohamed;Yadav, Abhimanyu Singh;Yousof, Haitham M.;Goual, Hafida;Hamedani, G.G.
    • Communications for Statistical Applications and Methods / v.26 no.5 / pp.473-495 / 2019
  • In this paper, a new extension of the Lindley distribution is introduced. Characterizations of the proposed distribution based on truncated moments, the hazard and reverse hazard functions, and conditional expectation are presented. Besides these characterizations, other statistical and mathematical properties of the proposed model are discussed. The parameters are estimated by several classical estimation methods, and Bayes estimates are computed under informative gamma priors and the squared error loss function. The performance of all estimation methods is compared via Monte Carlo simulations in terms of mean squared error. The potential of the proposed model is illustrated with two data sets. A modified goodness-of-fit test based on the Nikulin-Rao-Robson statistic is investigated through two examples, and the results suggest that the new extension can serve as an alternative lifetime model.
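The abstract does not give the form of the new extension, so the sketch below uses the base Lindley distribution it extends, with density θ²/(θ+1)·(1+x)·e^(−θx), to illustrate a classical estimator and a Monte Carlo mean-squared-error study. Setting the score to zero gives x̄θ² + (x̄ − 1)θ − 2 = 0, whose positive root is the MLE; sampling uses the standard mixture of Exp(θ) with weight θ/(θ+1) and Gamma(2, θ) with weight 1/(θ+1). The sample sizes, replicate count, and θ value are hypothetical.

```python
import math
import random


def sample_lindley(theta, rng):
    """Lindley(theta) draw as a mixture: Exp(theta) w.p. theta/(1+theta), else Gamma(2, theta)."""
    if rng.random() < theta / (1.0 + theta):
        return rng.expovariate(theta)  # rate theta
    return rng.gammavariate(2.0, 1.0 / theta)  # shape 2, scale 1/theta


def lindley_mle(xs):
    """Closed-form MLE: positive root of mean*theta^2 + (mean - 1)*theta - 2 = 0."""
    m = sum(xs) / len(xs)
    return (-(m - 1.0) + math.sqrt((m - 1.0) ** 2 + 8.0 * m)) / (2.0 * m)


def monte_carlo_mse(theta, n, reps=500, seed=0):
    """Average squared error of the MLE over simulated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        estimate = lindley_mle([sample_lindley(theta, rng) for _ in range(n)])
        total += (estimate - theta) ** 2
    return total / reps


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, round(monte_carlo_mse(theta=1.5, n=n), 4))
```

The same simulation loop extends to other estimators (for example, a Bayes estimate under a gamma prior) by swapping `lindley_mle` for the estimator being compared, which is how the MSE comparison described above is typically organized.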