• Title/Abstract/Keywords: Random selection

638 search results (processing time: 0.019 seconds)

Improving an Ensemble Model Using Instance Selection Method

  • 민성환
    • Journal of Society of Korea Industrial and Systems Engineering (산업경영시스템학회지) / Vol. 39, No. 1 / pp. 105-115 / 2016
  • Ensemble classification involves combining individually trained classifiers to yield more accurate predictions than individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it involves randomly drawing a subset of the features for each classifier in the ensemble. The instance selection technique involves selecting critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the instance selection and random subspace methods. Therefore, this study proposed a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near optimal) instances, which are used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models to investigate performance in terms of classification accuracy, levels of diversity, and average classification rates of base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed other models including the single model, the instance selection model, and the original random subspace ensemble model.

SOME SMALL DEVIATION THEOREMS FOR ARBITRARY RANDOM FIELDS WITH RESPECT TO BINOMIAL DISTRIBUTIONS INDEXED BY AN INFINITE TREE ON GENERALIZED RANDOM SELECTION SYSTEMS

  • LI, FANG;WANG, KANGKANG
    • Journal of applied mathematics & informatics / Vol. 33, No. 5-6 / pp. 517-530 / 2015
  • In this paper, we establish a class of strong limit theorems, represented by inequalities, for the arbitrary random field with respect to the product binomial distributions indexed by the infinite tree on the generalized random selection system by constructing the consistent distribution and a nonnegative martingale with pure analytical methods. As corollaries, some limit properties for the Markov chain field with respect to the binomial distributions indexed by the infinite tree on the generalized random selection system are studied.

Biological Feature Selection and Disease Gene Identification using New Stepwise Random Forests

  • Hwang, Wook-Yeon
    • Industrial Engineering and Management Systems / Vol. 16, No. 1 / pp. 64-79 / 2017
  • Identifying disease genes from human genome is a critical task in biomedical research. Important biological features to distinguish the disease genes from the non-disease genes have been mainly selected based on traditional feature selection approaches. However, the traditional feature selection approaches unnecessarily consider many unimportant biological features. As a result, although some of the existing classification techniques have been applied to disease gene identification, the prediction performance was not satisfactory. A small set of the most important biological features can enhance the accuracy of disease gene identification, as well as provide potentially useful knowledge for biologists or clinicians, who can further investigate the selected biological features as well as the potential disease genes. In this paper, we propose a new stepwise random forests (SRF) approach for biological feature selection and disease gene identification. The SRF approach consists of two stages. In the first stage, only important biological features are iteratively selected in a forward selection manner based on one-dimensional random forest regression, where the updated residual vector is considered as the current response vector. We can then determine a small set of important biological features. In the second stage, random forests classification with regard to the selected biological features is applied to identify disease genes. Our extensive experiments show that the proposed SRF approach outperforms the existing feature selection and classification techniques in terms of biological feature selection and disease gene identification.

Bayesian Parameter Estimation and Variable Selection in Random Effects Generalised Linear Models for Count Data

  • Oh, Man-Suk;Park, Tae-Sung
    • Journal of the Korean Statistical Society / Vol. 31, No. 1 / pp. 93-107 / 2002
  • Random effects generalised linear models are useful for analysing clustered count data in which responses are usually correlated. We propose a Bayesian approach to parameter estimation and variable selection in random effects generalised linear models for count data. A simple Gibbs sampling algorithm for parameter estimation is presented and a simple and efficient variable selection is done by using the Gibbs outputs. An illustrative example is provided.

Derivative Evaluation and Conditional Random Selection for Accelerating Genetic Algorithms

  • Jung, Sung-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 5, No. 1 / pp. 21-28 / 2005
  • This paper proposes a new method for accelerating the search speed of genetic algorithms by taking derivative evaluation and conditional random selection into account in their evolution process. Derivative evaluation makes genetic algorithms focus on the individuals whose fitness is rapidly increasing. This accelerates the search speed of genetic algorithms by enhancing exploitation, like steepest descent methods, but also increases the possibility of premature convergence, in which most individuals approach local optima after a few generations. On the other hand, derivative evaluation under premature convergence helps genetic algorithms escape the local optima by enhancing exploration. If GAs fall into premature convergence, random selection is used to help them escape the local optimum, but its effect is not large. We tested our method on one combinatorial problem and five complex function optimization problems. Experimental results showed that our method was superior to the simple genetic algorithm, especially when the search space is large.

Slotted ALOHA Based Greedy Relay Selection in Large-scale Wireless Networks

  • Ouyang, Fengchen;Ge, Jianhua;Gong, Fengkui
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 10 / pp. 3945-3964 / 2015
  • Since the decentralized structure and the blindness of a large-scale wireless network make it difficult to collect the real-time channel state or other information from randomly distributed relays, a fundamental question is whether it is feasible to perform the relay selection without this knowledge. In this paper, a Slotted ALOHA based Greedy Relay Selection (SAGRS) scheme is presented. The proposed scheme allows the relays satisfying the user's minimum transmission request to compete for selection by randomly accessing the channel through the slotted ALOHA protocol without the need for the information collection procedure. Moreover, a greedy selection mechanism is introduced with which a user can wait for an even better relay when a suitable one is successfully stored. The optimal access probability of a relay is determined through the utilization of the available relay region, a geographical region consisting of all the relays that satisfy the minimum transmission demand of the user. The average number of the selection slots and the failure probability of the scheme are analyzed in this paper. Simulations confirm the validity and effectiveness of the SAGRS scheme. With a balance between the selection slots and the instantaneous rate of the selected relay, the proposed scheme outperforms other random access selection schemes.
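
The contention step that the abstract relies on can be illustrated with the textbook slotted ALOHA model. The sketch below is not the SAGRS scheme itself: it assumes n identical relays that each access a slot independently with probability p, and it omits the greedy waiting mechanism and the relay-region analysis. The function names are hypothetical.

```python
import random


def success_probability(n_relays: int, p: float) -> float:
    """Probability that exactly one of n_relays transmits in a given slot."""
    return n_relays * p * (1 - p) ** (n_relays - 1)


def slots_until_success(n_relays: int, p: float, rng: random.Random) -> int:
    """Simulate contention and count slots until a single relay gets through."""
    slots = 0
    while True:
        slots += 1
        # Each relay independently decides whether to access this slot.
        transmitters = sum(rng.random() < p for _ in range(n_relays))
        if transmitters == 1:  # 0 = idle slot, >1 = collision
            return slots


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10
    for p in (0.05, 1 / n, 0.2):
        runs = [slots_until_success(n, p, rng) for _ in range(2000)]
        print(p, success_probability(n, p), sum(runs) / len(runs))
```

In this simplified model the per-slot success probability n·p·(1−p)^(n−1) is maximized at p = 1/n, which is why the access probability has to be tuned to the number of contending relays.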

Effects of Call-back Rules and Random Selection of Respondents: Statistical Re-analysis of R&R’s Ulsan Survey Data

  • 허명회;임여주;노규형
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 16, No. 2 / pp. 247-259 / 2003
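  • In the Korean survey industry, telephone surveys mainly rely on quota sampling, in which sample sizes are assigned in advance by sex, age, and region. Quota sampling has the advantage of reducing survey cost and duration, but it lacks theoretical validity and is therefore difficult to accept academically. For this reason, academia has repeatedly asked the survey industry to conduct telephone surveys based on random sampling. In response, Research & Research Inc. carried out a telephone survey based on random sampling for its 2002 Ulsan mayoral election forecast. This case study re-analyzes those data in depth to examine how the call-back and random respondent selection procedures used in random sampling affect data quality and the final estimates.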

Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier

  • Lee, Jinlee;Park, Dooho;Lee, Changhoon
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 10 / pp. 5132-5148 / 2017
  • Cyber attacks are evolving commensurate with recent developments in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze the attack information. The large amount of data examined makes the large number of computations and low detection rates problematic. Feature selection is expected to improve the classification performance and provide faster and more cost-effective results. Despite the various feature selection studies conducted for intrusion detection systems, it is difficult to automate feature selection because it is based on the knowledge of security experts. This paper proposes a feature selection technique to overcome the performance problems of intrusion detection systems. Focusing on feature selection, the first phase of the proposed system aims at constructing a feature subset using a sequential forward floating search (SFFS) to downsize the dimension of the variables. The second phase constructs a classification model with the selected feature subset using a random forest classifier (RFC) and evaluates the classification accuracy. Experiments were conducted with the NSL-KDD dataset using SFFS-RF, and the results indicated that feature selection techniques are a necessary preprocessing step to improve the overall system performance in systems that handle large datasets. They also verified that SFFS-RF could be used for data classification. In conclusion, SFFS-RF could be the key to improving the classification model performance in machine learning.
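
The first phase described above builds a feature subset greedily. As a rough illustration, the sketch below implements plain sequential forward selection, not the floating variant (SFFS) used in the paper, which additionally tries to drop previously chosen features after each addition. The `score` callback is a hypothetical stand-in for whatever evaluation is used, for example cross-validated accuracy of a random forest on the candidate subset.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedily add the feature that most improves score(subset)."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Score every subset obtained by adding one more candidate feature.
        scored = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected
```

Keeping the scoring function as a parameter separates the search strategy from the classifier, so the same loop can be reused with any model that returns a number for a feature subset.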

Ensemble Gene Selection Method Based on Multiple Tree Models

  • Mingzhu Lou
    • Journal of Information Processing Systems / Vol. 19, No. 5 / pp. 652-662 / 2023
  • Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods are based on a single-tree model, which is often not robust to ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by the limitations of single-tree-based gene selection, in this study, ensemble gene selection methods based on multiple-tree models were studied to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects top-n genes from the microarray dataset based on its intrinsic mechanism. Subsequently, three ensemble gene selection methods were investigated, namely multiple-tree model intersection, multiple-tree module union, and multiple-tree module cross-union. Experimental results on five benchmark public microarray gene expression datasets proved that the multiple-tree module union is significantly superior to gene selection based on a single tree model and other competitive gene selection methods in classification accuracy.
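
The intersection and union strategies named above can be sketched with ordinary set operations. This is only an illustration under stated assumptions: each tree model is represented by a hypothetical dictionary mapping gene names to importance scores, and the cross-union variant is left out because the abstract does not define it.

```python
from typing import Dict, List, Set


def top_n(importances: Dict[str, float], n: int) -> Set[str]:
    """Return the n genes with the highest importance for one model."""
    ranked = sorted(importances, key=lambda gene: importances[gene], reverse=True)
    return set(ranked[:n])


def ensemble_intersection(models: List[Dict[str, float]], n: int) -> Set[str]:
    """Keep only genes that every model places in its top n."""
    if not models:
        return set()
    return set.intersection(*(top_n(m, n) for m in models))


def ensemble_union(models: List[Dict[str, float]], n: int) -> Set[str]:
    """Keep genes that at least one model places in its top n."""
    return set().union(*(top_n(m, n) for m in models))
```

The intersection gives a small, conservative gene set, while the union keeps every gene that any single model considers important, which matches the abstract's observation that combining models can retain information a single tree would discard.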

Broadband Spectrum Sensing of Distributed Modulated Wideband Converter Based on Markov Random Field

  • Li, Zhi;Zhu, Jiawei;Xu, Ziyong;Hua, Wei
    • ETRI Journal / Vol. 40, No. 2 / pp. 237-245 / 2018
  • The Distributed Modulated Wideband Converter (DMWC) is a networking system developed from the Modulated Wideband Converter, which converts all sampling channels into sensing nodes with number variables to implement signal undersampling. When the number of sparse subbands changes, the number of nodes can be adjusted flexibly to improve the reconstruction rate. Owing to the different attenuations of distributed nodes in different locations, it is worthwhile to find out how to select the optimal sensing node as the sampling channel. This paper proposes the spectrum sensing of DMWC based on a Markov random field (MRF) to select the ideal node, which is compared to the image edge segmentation. The attenuation of the candidate nodes is estimated based on the attenuation of the neighboring nodes that have participated in the DMWC system. Theoretical analysis and numerical simulations show that neighboring attenuation plays an important role in determining the node selection, and selecting the node using MRF can avoid serious transmission attenuation. Furthermore, DMWC can greatly improve recovery performance by using a Markov random field compared with random selection.