• Title/Summary/Keyword: data selection


Bayesian Model Selection for Inverse Gaussian Populations with Heterogeneity

  • Kang, Sang-Gil; Kim, Dal-Ho; Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society, v.19 no.2, pp.621-634, 2008
  • This paper addresses the problem of testing whether the means of several inverse Gaussian populations with heterogeneity are equal. The analysis of reciprocals for the equality of inverse Gaussian means requires the assumption of equal scale parameters. We propose Bayesian model selection procedures for testing equality of the inverse Gaussian means under a noninformative prior without assuming equal scale parameters. The noninformative prior is usually improper, which raises a calibration problem: the Bayes factor is defined only up to a multiplicative constant. We therefore propose objective Bayesian model selection procedures based on the fractional Bayes factor and the intrinsic Bayes factor under the reference prior. A simulation study and a real data analysis are provided.

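The fractional Bayes factor (FBF) machinery referenced above can be sketched numerically. The toy example below, a hedged illustration only, tests equality of two normal means with unit variance under improper uniform priors (not the paper's inverse Gaussian setting); the grid bounds and the minimal training fraction `b` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_marginal(loglik_grid, cell_volume):
    """Numerically integrate exp(loglik) over a grid via log-sum-exp."""
    m = loglik_grid.max()
    return m + np.log(np.exp(loglik_grid - m).sum()) + np.log(cell_volume)

def fractional_bayes_factor(x, y, b):
    """FBF for H1 (separate normal means) vs H0 (common mean), unit variance,
    improper uniform priors on the means."""
    grid = np.linspace(-10, 10, 401)
    d = grid[1] - grid[0]
    # H0: one common mean mu for both samples
    ll0 = np.array([-0.5 * (((x - mu) ** 2).sum() + ((y - mu) ** 2).sum())
                    for mu in grid])
    # H1: two separate means; the likelihood factorizes over (mu1, mu2),
    # so the 2-D integral is a product of two 1-D integrals
    ll_x = np.array([-0.5 * ((x - mu) ** 2).sum() for mu in grid])
    ll_y = np.array([-0.5 * ((y - mu) ** 2).sum() for mu in grid])

    def log_bf(frac):
        lm0 = log_marginal(frac * ll0, d)
        lm1 = log_marginal(frac * ll_x, d) + log_marginal(frac * ll_y, d)
        return lm1 - lm0

    # B10^F = (m1(1)/m0(1)) / (m1(b)/m0(b))
    return np.exp(log_bf(1.0) - log_bf(b))

x = rng.normal(0.0, 1.0, 15)
y_far = rng.normal(2.0, 1.0, 15)   # clearly different mean
y_near = rng.normal(0.0, 1.0, 15)  # same mean
b = 2 / 30  # assumed minimal training fraction: one observation per free mean
print(fractional_bayes_factor(x, y_far, b))
print(fractional_bayes_factor(x, y_near, b))
```

Because the improper prior's arbitrary constant cancels in the ratio of the full and fractional marginals, the FBF is well defined even though each Bayes factor alone is not, which is exactly the calibration problem the abstract mentions.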

Validation Comparison of Credit Rating Models Using Box-Cox Transformation

  • Hong, Chong-Sun; Choi, Jeong-Min
    • Journal of the Korean Data and Information Science Society, v.19 no.3, pp.789-800, 2008
  • Current credit evaluation models based on financial data make use of smoothed estimated default ratios transformed from each financial variable. In this work, some problems of the credit evaluation models developed by financial experts are discussed, and we propose improved credit evaluation models based on the stepwise variable selection method and the Box-Cox transformation of data whose distributions are heavily skewed to the right. After comparing goodness-of-fit tests of these models, we explain the validation of the credit evaluation models using statistical methods such as the stepwise variable selection method and the Box-Cox transformation.

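As a rough illustration of the Box-Cox step, the sketch below picks the transformation parameter by maximizing the usual Box-Cox profile log-likelihood on a synthetic right-skewed variable; the lognormal data and the lambda grid are assumptions, not the paper's credit data.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic right-skewed "financial" variable (assumption for illustration)
income = rng.lognormal(mean=10, sigma=0.8, size=500)

def boxcox(x, lam):
    """Box-Cox transform: log(x) at lam = 0, else (x^lam - 1) / lam."""
    return np.log(x) if lam == 0 else (x ** lam - 1) / lam

def boxcox_loglik(x, lam):
    """Profile log-likelihood of the Box-Cox model (normality after transform)."""
    z = boxcox(x, lam)
    n = len(x)
    return -0.5 * n * np.log(z.var()) + (lam - 1) * np.log(x).sum()

lams = np.linspace(-1, 1, 201)
best = max(lams, key=lambda l: boxcox_loglik(income, l))

def skewness(x):
    x = np.asarray(x)
    return ((x - x.mean()) ** 3).mean() / x.std() ** 3

print(best, skewness(income), skewness(boxcox(income, best)))
```

For lognormal data the estimated lambda lands near 0 (the log transform), and the transformed variable is far less right-skewed than the original, which is the point of the preprocessing step.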

A Construction of Fuzzy Model for Data Mining (데이터 마이닝을 위한 퍼지 모델 동정)

  • Kim, Do-Wan; Park, Jin-Bae; Kim, Jung-Chan; Joo, Young-Hoon
    • Proceedings of the Korean Institute of Intelligent Systems Conference, 2002.12a, pp.191-194, 2002
  • In this paper, a new GA-based methodology using information granules is suggested for constructing a fuzzy classifier. We deal with the selection of the fuzzy region as well as two major classification problems: feature selection and pattern classification. The proposed method consists of three steps: selection of the fuzzy region, construction of the fuzzy sets, and tuning of the fuzzy rules. Genetic algorithms (GAs) are applied to the development of the information granules so as to determine satisfactory fuzzy regions. Finally, the GAs are also applied to the tuning of the fuzzy rules in terms of the management of misclassified data (e.g., data with strange patterns or on the boundaries of the classes). To show the effectiveness of the proposed method, an example, the classification of the Iris data, is provided.
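A minimal sketch of the GA component, under heavy simplifying assumptions: a toy one-dimensional, two-class problem in which a GA tunes the centers of two triangular fuzzy sets used as the decision rule. The paper's actual three-step procedure over information granules is richer than this.

```python
import random
random.seed(0)

# toy 1-D data: two well-separated classes (assumption for illustration)
data = [(random.gauss(0, 0.5), 0) for _ in range(50)] + \
       [(random.gauss(3, 0.5), 1) for _ in range(50)]

def tri(x, c, w=1.5):
    """Triangular fuzzy membership centered at c with half-width w."""
    return max(0.0, 1.0 - abs(x - c) / w)

def accuracy(centers):
    """Classify as class 1 when its fuzzy set fires more strongly."""
    c0, c1 = centers
    correct = sum(1 for x, y in data if (tri(x, c1) > tri(x, c0)) == (y == 1))
    return correct / len(data)

def evolve(pop_size=20, gens=30):
    """Elitist GA over pairs of fuzzy-set centers."""
    pop = [(random.uniform(-2, 5), random.uniform(-2, 5)) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=accuracy, reverse=True)
        parents = pop[: pop_size // 2]          # elitism: keep the best half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            child = (a[0], b[1])                 # one-point crossover
            child = (child[0] + random.gauss(0, 0.3),   # Gaussian mutation
                     child[1] + random.gauss(0, 0.3))
            children.append(child)
        pop = parents + children
    return max(pop, key=accuracy)

best = evolve()
print(best, accuracy(best))
```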

Spatial Coding using Data Information and Antenna Selection Technique in MIMO System (MIMO 시스템에서 데이터 정보와 안테나 선택 기법을 이용한 공간 부호화)

  • Song, Jae-Woong; Kim, Back-Hyun; Jeong, Rag-Gyo; Kwak, Kyung-Sup
    • The Journal of The Korea Institute of Intelligent Transport Systems, v.11 no.6, pp.81-88, 2012
  • Space diversity and spatial multiplexing gain can be achieved with a MIMO system. This paper proposes a spatial coding method for MIMO systems that uses data information and an antenna selection technique. The technique provides coding gain as well as space diversity gain. For a MIMO system with BPSK modulation, BER performance is analyzed and space diversity gains are compared through simulation in terms of the degree of data maldistribution.
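The antenna selection ingredient can be illustrated in isolation. The sketch below simulates BPSK over two independent Rayleigh-fading branches and compares single-antenna detection with per-symbol selection of the stronger branch. It shows only plain selection diversity, not the paper's proposed spatial coding, and the SNR and sample size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bits, snr_db = 200_000, 5
snr = 10 ** (snr_db / 10)

bits = rng.integers(0, 2, n_bits)
symbols = 2 * bits - 1  # BPSK mapping: 0 -> -1, 1 -> +1

# two independent Rayleigh branches (unit average power)
h = (rng.normal(size=(2, n_bits)) + 1j * rng.normal(size=(2, n_bits))) / np.sqrt(2)
noise = (rng.normal(size=(2, n_bits)) + 1j * rng.normal(size=(2, n_bits))) / np.sqrt(2 * snr)

r = h * symbols + noise
det = np.real(np.conj(h) * r)        # coherent detection statistic per branch

ber_single = np.mean((det[0] > 0).astype(int) != bits)

sel = np.argmax(np.abs(h), axis=0)   # pick the stronger antenna per symbol
det_sel = det[sel, np.arange(n_bits)]
ber_sel = np.mean((det_sel > 0).astype(int) != bits)

print(ber_single, ber_sel)
```

Selecting the stronger of two fading branches per symbol gives a clear BER improvement over a single fixed antenna, which is the diversity gain the abstract refers to.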

How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model

  • Choi, Ji-Eun; Shin, Dong Wan
    • Communications for Statistical Applications and Methods, v.29 no.1, pp.41-51, 2022
  • We forecast the US oil consumption level by taking advantage of Google Trends. Google Trends data are the search volumes of specific terms that people search for on Google. We focus on whether proper selection of Google Trends terms leads to an improvement in forecast performance for oil consumption. As forecast models, we consider the least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for the large vector autoregressive (VAR-L) model of Nicholson et al. (2017), which automatically select the Google Trends terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high-dimensional Google Trends data set to a low-dimensional one by the LASSO and VAR-L models produces better forecast performance for oil consumption than frequently used forecast models such as the autoregressive model, the autoregressive distributed lag model, and the vector error correction model.
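A minimal sketch of the LASSO selection step on synthetic data: cyclic coordinate descent with soft-thresholding zeroes out the predictors (standing in for hypothetical trend terms) that do not help. The design matrix, true coefficients, and penalty are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 120, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.5, -2.0]          # only the first 2 predictors matter
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent with soft-thresholding."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0) / col_sq[j]
    return beta

beta = lasso_cd(X, y, lam=30.0)
print(np.round(beta, 3))
```

With a sufficiently large penalty, the irrelevant coefficients are set exactly to zero, which is what makes the LASSO a variable selection method and not just a shrinkage method.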

Secure Cluster Selection in Autonomous Vehicular Networks

  • Mohammed, Alkhathami
    • International Journal of Computer Science & Network Security, v.23 no.1, pp.11-16, 2023
  • Vehicular networks are part of next-generation wireless and smart Intelligent Transportation Systems (ITS). In the future, autonomous vehicles will be an integral part of ITS and will provide safe and reliable traveling features to users. The reliability and security of data transmission in vehicular networks have been a challenging task. To manage data transmission in vehicular networks, road networks are divided into clusters, and a cluster head is selected to handle the data. The selection of cluster heads is a challenge because vehicles are mobile and their connectivity changes dynamically. In this paper, a novel secure cluster head selection algorithm is proposed for secure and reliable data sharing. The idea is to use the secrecy rate of each vehicle in the cluster and adaptively select the most secure vehicle as the cluster head. Simulation results show that the proposed scheme significantly improves the reliability and security of transmission.
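The secrecy-rate selection rule can be sketched directly. The per-vehicle SNR pairs below are made-up illustrative numbers: the rule picks as cluster head the vehicle whose legitimate-link rate exceeds its eavesdropper-link rate by the widest margin.

```python
import math

# hypothetical per-vehicle link SNRs: (to legitimate receiver, to eavesdropper)
vehicles = {
    "v1": (12.0, 6.0),
    "v2": (9.0, 1.5),
    "v3": (15.0, 14.0),
}

def secrecy_rate(snr_main, snr_eve):
    """Wyner secrecy rate in bit/s/Hz, floored at zero."""
    return max(0.0, math.log2(1 + snr_main) - math.log2(1 + snr_eve))

rates = {v: secrecy_rate(*s) for v, s in vehicles.items()}
cluster_head = max(rates, key=rates.get)   # v2: strong main link, weak eavesdropper link
print(rates, cluster_head)
```

Note that v3 has the strongest main link but an almost equally strong eavesdropper link, so its secrecy rate is near zero; raw SNR alone would pick the wrong head.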

On variable bandwidth Kernel Regression Estimation (변수평활량을 이용한 커널회귀함수 추정)

  • Seog, Kyung-Ha; Chung, Sung-Suk; Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society, v.9 no.2, pp.179-188, 1998
  • Local polynomial regression is the most popular kernel-type regression estimator. In local polynomial regression function estimation, bandwidth selection is a crucial problem, just as in kernel density estimation. When the regression curve has a complicated structure, variable bandwidth selection is appropriate. In this paper, we propose a fully data-driven variable bandwidth selection method. We choose the bandwidth by minimizing the estimated MSE, which is obtained from a pilot bandwidth study via the cross-validation method. A Monte Carlo simulation was conducted to show the superiority of the proposed bandwidth selection method.

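A hedged sketch of the cross-validation bandwidth-selection idea: a Nadaraya-Watson estimator (simpler than the paper's local polynomial setting) with the bandwidth chosen by leave-one-out CV over a candidate grid; a variable-bandwidth version would repeat this locally. The data and candidate grid are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0, 2 * np.pi, 80))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # noisy toy regression data

def nw(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x0 - xs) / h) ** 2)
    return (w * ys).sum() / w.sum()

def loo_cv_score(h):
    """Leave-one-out cross-validation MSE for bandwidth h."""
    errs = []
    for i in range(x.size):
        mask = np.arange(x.size) != i
        errs.append((y[i] - nw(x[i], x[mask], y[mask], h)) ** 2)
    return np.mean(errs)

candidates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
h_best = min(candidates, key=loo_cv_score)
print(h_best, loo_cv_score(h_best))
```

Too small a bandwidth chases the noise and too large a bandwidth flattens the sine curve; the CV score penalizes both, which is why it is a reasonable data-driven selector.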

Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design (다구찌 디자인을 이용한 앙상블 및 군집분석 분류 성능 비교)

  • Shin, Hyung-Won; Sohn, So-Young
    • Journal of Korean Institute of Industrial Engineers, v.27 no.1, pp.47-53, 2001
  • In this paper, we compare the classification performances of ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) to logistic regression in consideration of various characteristics of the input data. Four factors used to simulate the logistic model are (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) the input-output function. In view of the unknown relationship between the input and output functions, we use a Taguchi design to improve the practicality of our study results by treating it as a noise factor. The experimental results indicate the following: when the level of variance is medium, Bagging & Parameter Combining performs worse than Logistic Regression, Variable Selection Bagging, and Clustering. However, the classification performances of Logistic Regression, Variable Selection Bagging, Bagging, and Clustering are not significantly different when the variance of the input data is either small or large. When there is strong correlation among input variables, Variable Selection Bagging outperforms both Logistic Regression and Parameter Combining. In general, the Parameter Combining algorithm appears, to our disappointment, to be the worst.

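As a sketch of the Data Bagging entry in the comparison, the toy code below bootstraps a training set, fits a decision stump to each resample, and classifies by majority vote. The data generator, stump learner, and ensemble size are assumptions, not the paper's simulation design.

```python
import random
random.seed(5)

# toy 2-class data: label 1 when x0 + x1 > 1, with 10% label noise
def make_point():
    x = [random.random(), random.random()]
    y = 1 if x[0] + x[1] > 1 else 0
    if random.random() < 0.1:
        y = 1 - y
    return x, y

train = [make_point() for _ in range(200)]

def fit_stump(sample):
    """Best single-feature threshold classifier on the sample."""
    best = None
    for f in (0, 1):
        for t in [i / 20 for i in range(1, 20)]:
            acc = sum(1 for x, y in sample if (x[f] > t) == (y == 1)) / len(sample)
            if best is None or acc > best[0]:
                best = (acc, f, t)
    _, f, t = best
    return lambda x, f=f, t=t: 1 if x[f] > t else 0

# bagging: fit one stump per bootstrap resample, then take a majority vote
stumps = [fit_stump(random.choices(train, k=len(train))) for _ in range(25)]

def bagged(x):
    return 1 if sum(s(x) for s in stumps) > len(stumps) / 2 else 0

acc = sum(1 for x, y in train if bagged(x) == y) / len(train)
print(acc)
```

The vote over resampled stumps smooths out the variance of any single stump, which is the mechanism the bagging variants in the study all share.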

A comparative study of filter methods based on information entropy

  • Kim, Jung-Tae; Kum, Ho-Yeun; Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology, v.40 no.5, pp.437-446, 2016
  • Feature selection has become an essential technique for reducing the dimensionality of data sets. Many features are frequently irrelevant or redundant for classification tasks. The purpose of feature selection is to select relevant features and remove irrelevant and redundant ones. Applications of feature selection range from text processing, face recognition, bioinformatics, speaker verification, and medical diagnosis to financial domains. In this study, we focus on filter methods based on information entropy: IG (Information Gain), FCBF (Fast Correlation Based Filter), and mRMR (minimum Redundancy Maximum Relevance). FCBF has the advantage of reducing the computational burden by eliminating redundant features that satisfy the approximate Markov blanket condition. However, FCBF considers only the relevance between a feature and the class when selecting the best features, thus failing to take into consideration the interaction between features. In this paper, we propose an improved FCBF to overcome this shortcoming. We also perform a comparative study to evaluate the performance of the proposed method.
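The information gain criterion underlying IG (and, via symmetrical uncertainty, FCBF) is easy to make concrete. The sketch below computes IG = H(class) − H(class | feature) on a tiny made-up data set: a perfectly predictive feature scores 1 bit, while a feature carrying no class information scores 0.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(class; feature) = H(class) - H(class | feature)."""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        sub = [l for f, l in zip(feature, labels) if f == v]
        cond += len(sub) / n * entropy(sub)
    return entropy(labels) - cond

labels     = ["pos", "pos", "neg", "neg"]
relevant   = ["a", "a", "b", "b"]   # predicts the class perfectly
irrelevant = ["x", "y", "x", "y"]   # carries no class information

print(information_gain(relevant, labels), information_gain(irrelevant, labels))
```

A filter method ranks features by a score like this before any classifier is trained, which is what makes it cheap compared to wrapper methods.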

A Design of an Optimized Classifier based on Feature Elimination for Gene Selection (유전자 선택을 위해 속성 삭제에 기반을 둔 최적화된 분류기 설계)

  • Lee, Byung-Kwan; Park, Seok-Gyu; Tifani, Yusrina
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.8 no.5, pp.384-393, 2015
  • This paper proposes an optimized classifier based on feature elimination (OCFE) for gene selection, combining two feature elimination methods, ReliefF and SVM-RFE. The ReliefF algorithm is a filter feature selection method that ranks features by their importance. The SVM-RFE algorithm is a wrapper feature selection method that ranks features based on their weights in the classifier. Combining these two methods yields a lower average error rate: 0.3016138 for OCFE versus 0.3096779 for SVM-RFE. The proposed method also achieves better accuracy, 70% for OCFE versus 69% for SVM-RFE.
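The recursive feature elimination loop at the heart of SVM-RFE can be sketched with a plain least-squares linear model standing in for the SVM (an assumption made to keep the example self-contained): repeatedly drop the feature with the smallest absolute weight, so the features remaining longest are ranked most important.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 5
X = rng.normal(size=(n, p))
# only hypothetical "genes" 0 and 3 influence the response
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=n)

def rfe_ranking(X, y):
    """Recursive feature elimination with a least-squares linear model:
    repeatedly drop the feature with the smallest absolute coefficient."""
    remaining = list(range(X.shape[1]))
    ranking = []                       # filled from least to most important
    while len(remaining) > 1:
        coef, *_ = np.linalg.lstsq(X[:, remaining], y, rcond=None)
        worst = remaining[int(np.argmin(np.abs(coef)))]
        remaining.remove(worst)
        ranking.append(worst)
    ranking.append(remaining[0])
    return ranking                     # last entries = most important features

ranking = rfe_ranking(X, y)
print(ranking)
```

Refitting after each elimination is what distinguishes RFE from a one-shot ranking: a feature's weight can change once correlated competitors are removed.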