• Title/Summary/Keyword: Random selection


Improving an Ensemble Model Using Instance Selection Method (사례 선택 기법을 활용한 앙상블 모형의 성능 개선)

  • Min, Sung-Hwan
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.1 / pp.105-115 / 2016
  • Ensemble classification involves combining individually trained classifiers to yield more accurate predictions than individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it involves randomly drawing some of the features for each classifier in the ensemble. The instance selection technique involves selecting critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the instance selection and random subspace methods. Therefore, this study proposed a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models to investigate performance in terms of classification accuracy, levels of diversity, and average classification rates of base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed other models, including the single model, the instance selection model, and the original random subspace ensemble model.
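
The random subspace idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's model: the GA-based instance selection step is omitted, and the nearest-centroid base learner and all function names (`train_centroids`, `random_subspace_fit`, and so on) are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter

def train_centroids(X, y, features):
    """Nearest-centroid base learner restricted to the given feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroids(centroids, features, row):
    """Return the label whose centroid is closest on the selected features."""
    def dist(c):
        return sum((row[f] - c[k]) ** 2 for k, f in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))

def random_subspace_fit(X, y, n_models=15, subspace_size=2, seed=0):
    """Train each base learner on its own randomly drawn feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, train_centroids(X, y, features)))
    return ensemble

def random_subspace_predict(ensemble, row):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroids(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]
```

In a hybrid scheme like the one the abstract describes, the rows of `X` passed to `random_subspace_fit` would first be filtered by an instance selection step; that step is left out here.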

SOME SMALL DEVIATION THEOREMS FOR ARBITRARY RANDOM FIELDS WITH RESPECT TO BINOMIAL DISTRIBUTIONS INDEXED BY AN INFINITE TREE ON GENERALIZED RANDOM SELECTION SYSTEMS

  • LI, FANG;WANG, KANGKANG
    • Journal of applied mathematics & informatics / v.33 no.5_6 / pp.517-530 / 2015
  • In this paper, we establish a class of strong limit theorems, represented by inequalities, for arbitrary random fields with respect to product binomial distributions indexed by an infinite tree on a generalized random selection system, by constructing a consistent distribution and a nonnegative martingale with purely analytical methods. As corollaries, some limit properties for the Markov chain field with respect to the binomial distributions indexed by the infinite tree on the generalized random selection system are studied.

Biological Feature Selection and Disease Gene Identification using New Stepwise Random Forests

  • Hwang, Wook-Yeon
    • Industrial Engineering and Management Systems / v.16 no.1 / pp.64-79 / 2017
  • Identifying disease genes from the human genome is a critical task in biomedical research. Important biological features that distinguish disease genes from non-disease genes have mainly been selected using traditional feature selection approaches. However, the traditional feature selection approaches unnecessarily consider many unimportant biological features. As a result, although some of the existing classification techniques have been applied to disease gene identification, the prediction performance was not satisfactory. A small set of the most important biological features can enhance the accuracy of disease gene identification, as well as provide potentially useful knowledge for biologists or clinicians, who can further investigate the selected biological features as well as the potential disease genes. In this paper, we propose a new stepwise random forests (SRF) approach for biological feature selection and disease gene identification. The SRF approach consists of two stages. In the first stage, only important biological features are iteratively selected in a forward selection manner based on one-dimensional random forest regression, where the updated residual vector is considered as the current response vector. We can then determine a small set of important biological features. In the second stage, random forests classification with regard to the selected biological features is applied to identify disease genes. Our extensive experiments show that the proposed SRF approach outperforms the existing feature selection and classification techniques in terms of biological feature selection and disease gene identification.

Bayesian Parameter Estimation and Variable Selection in Random Effects Generalised Linear Models for Count Data

  • Oh, Man-Suk;Park, Tae-Sung
    • Journal of the Korean Statistical Society / v.31 no.1 / pp.93-107 / 2002
  • Random effects generalised linear models are useful for analysing clustered count data in which responses are usually correlated. We propose a Bayesian approach to parameter estimation and variable selection in random effects generalised linear models for count data. A simple Gibbs sampling algorithm for parameter estimation is presented and a simple and efficient variable selection is done by using the Gibbs outputs. An illustrative example is provided.

Derivative Evaluation and Conditional Random Selection for Accelerating Genetic Algorithms

  • Jung, Sung-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.1 / pp.21-28 / 2005
  • This paper proposes a new method for accelerating the search speed of genetic algorithms (GAs) by taking derivative evaluation and conditional random selection into account in their evolution process. Derivative evaluation makes genetic algorithms focus on the individuals whose fitness is increasing rapidly. This accelerates the search by enhancing exploitation, as steepest descent methods do, but it also increases the possibility of premature convergence, in which most individuals approach local optima after a few generations. On the other hand, derivative evaluation under premature convergence helps genetic algorithms escape the local optima by enhancing exploration. If a GA falls into premature convergence, random selection is used to help it escape the local optimum, although its effect is not large. We tested our method on one combinatorial problem and five complex function optimization problems. Experimental results showed that our method was superior to the simple genetic algorithm, especially when the search space is large.

Slotted ALOHA Based Greedy Relay Selection in Large-scale Wireless Networks

  • Ouyang, Fengchen;Ge, Jianhua;Gong, Fengkui
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.10 / pp.3945-3964 / 2015
  • Since the decentralized structure and the blindness of a large-scale wireless network make it difficult to collect the real-time channel state or other information from randomly distributed relays, a fundamental question is whether it is feasible to perform relay selection without this knowledge. In this paper, a Slotted ALOHA based Greedy Relay Selection (SAGRS) scheme is presented. The proposed scheme allows the relays satisfying the user's minimum transmission request to compete for selection by randomly accessing the channel through the slotted ALOHA protocol, without the need for an information collection procedure. Moreover, a greedy selection mechanism is introduced with which a user can wait for an even better relay when a suitable one is successfully stored. The optimal access probability of a relay is determined through the utilization of the available relay region, a geographical region consisting of all the relays that satisfy the minimum transmission demand of the user. The average number of selection slots and the failure probability of the scheme are analyzed in this paper. Simulations confirm the validity and effectiveness of the SAGRS scheme. With a balance between the selection slots and the instantaneous rate of the selected relay, the proposed scheme outperforms other random access selection schemes.
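
As background for the contention step, the textbook slotted ALOHA model gives a feel for why the access probability matters. The sketch below assumes a fixed, known number of contending relays that all use the same access probability; it does not model the available relay region or the greedy waiting mechanism analyzed in the paper, and the function names are hypothetical.

```python
def slot_success_probability(n_relays, p):
    """P(exactly one of n relays transmits in a slot) = n * p * (1 - p)**(n - 1)."""
    return n_relays * p * (1 - p) ** (n_relays - 1)

def expected_slots_until_success(n_relays, p):
    """Mean number of slots until the first collision-free access (geometric law)."""
    return 1.0 / slot_success_probability(n_relays, p)

def best_access_probability(n_relays):
    """Under this symmetric model the success probability peaks at p = 1/n."""
    return 1.0 / n_relays
```

For example, with four contending relays the per-slot success probability at `p = 0.25` is `4 * 0.25 * 0.75**3`, which is higher than at `p = 0.2` or `p = 0.3`.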

Effects of Call-back Rules and Random Selection of Respondents: Statistical Re-analysis of R&R’s Ulsan Survey Data. (전화조사에서 재통화 규칙준수와 응답자 임의선택의 영향 - R&R 울산 사례의 통계적 재분석 -)

  • 허명회;임여주;노규형
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.247-259 / 2003
  • In Korea, quota sampling is mainly adopted in telephone surveys, instead of random sampling, which requires a call-back procedure and random selection of a respondent within each household. The contact mode based on sex-by-age quotas is economically more advantageous and less time-consuming. However, it lacks a theoretical ground for valid statistical inference, so it is hardly accepted in academic circles despite its widespread practice. Subsequently, survey theoreticians argued that random sampling-based telephone surveys should be tried. In response, Research & Research (R&R), a private research company in Seoul, executed a telephone survey in random sampling mode for the prediction of the 2002 Ulsan City Mayor Election. The aim of this case study is to find out various effects of the call-back rule with random selection of respondents by statistically re-analyzing R&R’s Ulsan Survey Data.

Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier

  • Lee, Jinlee;Park, Dooho;Lee, Changhoon
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.10 / pp.5132-5148 / 2017
  • Cyber attacks are evolving commensurate with recent developments in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze the attack information. The large amount of data examined makes the large number of computations and low detection rates problematic. Feature selection is expected to improve the classification performance and provide faster and more cost-effective results. Despite the various feature selection studies conducted for intrusion detection systems, it is difficult to automate feature selection because it is based on the knowledge of security experts. This paper proposes a feature selection technique to overcome the performance problems of intrusion detection systems. Focusing on feature selection, the first phase of the proposed system aims at constructing a feature subset using a sequential forward floating search (SFFS) to reduce the dimension of the variables. The second phase constructs a classification model with the selected feature subset using a random forest classifier (RFC) and evaluates the classification accuracy. Experiments were conducted with the NSL-KDD dataset using SFFS-RF, and the results indicated that feature selection techniques are a necessary preprocessing step to improve the overall system performance in systems that handle large datasets. They also verified that SFFS-RF could be used for data classification. In conclusion, SFFS-RF could be the key to improving the classification model performance in machine learning.

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems / v.19 no.5 / pp.652-662 / 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods are based on a single-tree model, which is often not robust to ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by the limitations of single-tree-based gene selection, in this study, ensemble gene selection methods based on multiple-tree models were studied to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Subsequently, three ensemble gene selection methods were investigated, namely multiple-tree model intersection, multiple-tree model union, and multiple-tree model cross-union. Experimental results on five benchmark public microarray gene expression datasets showed that the multiple-tree model union is significantly superior, in classification accuracy, to gene selection based on a single tree model and to other competitive gene selection methods.
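
The intersection and union combinations named in this abstract can be sketched with plain set operations. This is an illustration only: the per-model importance scores are assumed to be given as dictionaries, the helper names are hypothetical, and the cross-union variant is left out because the abstract does not define it.

```python
def top_n(scores, n):
    """Names of the n highest-scoring genes (ties broken by name for determinism)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]

def ensemble_select(score_tables, n):
    """Combine each model's top-n gene list by intersection and by union."""
    picks = [set(top_n(scores, n)) for scores in score_tables]
    return {
        "intersection": set.intersection(*picks),
        "union": set.union(*picks),
    }
```

With one score table per tree model, the intersection keeps only genes that every model ranks in its top n, while the union keeps any gene ranked in the top n by at least one model.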

Broadband Spectrum Sensing of Distributed Modulated Wideband Converter Based on Markov Random Field

  • Li, Zhi;Zhu, Jiawei;Xu, Ziyong;Hua, Wei
    • ETRI Journal / v.40 no.2 / pp.237-245 / 2018
  • The Distributed Modulated Wideband Converter (DMWC) is a networking system developed from the Modulated Wideband Converter, which converts all sampling channels into sensing nodes with number variables to implement signal undersampling. When the number of sparse subbands changes, the number of nodes can be adjusted flexibly to improve the reconstruction rate. Owing to the different attenuations of distributed nodes in different locations, it is worthwhile to find out how to select the optimal sensing node as the sampling channel. This paper proposes the spectrum sensing of DMWC based on a Markov random field (MRF) to select the ideal node, which is compared to the image edge segmentation. The attenuation of the candidate nodes is estimated based on the attenuation of the neighboring nodes that have participated in the DMWC system. Theoretical analysis and numerical simulations show that neighboring attenuation plays an important role in determining the node selection, and selecting the node using MRF can avoid serious transmission attenuation. Furthermore, DMWC can greatly improve recovery performance by using a Markov random field compared with random selection.