• Title/Summary/Keyword: Random selection

API Feature Based Ensemble Model for Malware Family Classification (Training an API Feature-Based Ensemble Model for Malware Family Classification)

  • Lee, Hyunjong; Euh, Seongyul; Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology, v.29 no.3, pp.531-539, 2019
  • This paper proposes training features for malware family analysis and analyzes the multi-class classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use Random Forest and XGBoost, two algorithms based on decision trees. By analyzing the API and DLL information most frequently used by malware and converting high-dimensional features into low-dimensional ones, we propose API, API-DLL, and DLL-CM features for malware detection and family classification. The proposed feature selection method reduces data dimensionality and speeds up learning. In the performance comparison, Random Forest achieves a malware detection rate of 93.0%, XGBoost achieves 92.0% accuracy on the malware family dataset, and both Random Forest and XGBoost show a false positive rate of about 3.5% on the malware family dataset that includes benign samples.
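
The idea of turning high-dimensional API features into low-dimensional DLL-level features can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's API-DLL or DLL-CM definition: the API-to-DLL table is hypothetical and hard-coded, whereas a real extractor would read each executable's import table. The same list of API calls is counted once per API name (one dimension per API) and once per DLL (one dimension per library).

```python
from collections import Counter

# Hypothetical API-to-DLL lookup; a real extractor would read the
# import table of each executable instead of a hard-coded dict.
API_TO_DLL = {
    "CreateFileW": "kernel32.dll",
    "WriteFile": "kernel32.dll",
    "RegSetValueExW": "advapi32.dll",
    "InternetOpenUrlA": "wininet.dll",
}


def api_count_vector(api_calls, vocabulary):
    """High-dimensional view: one count per API name in a fixed vocabulary."""
    counts = Counter(api_calls)
    return [counts[name] for name in vocabulary]


def dll_count_vector(api_calls, dll_vocabulary):
    """Low-dimensional view: the same calls aggregated by their DLL.

    APIs missing from the lookup table are counted under "unknown",
    which only shows up if "unknown" is part of dll_vocabulary.
    """
    counts = Counter(API_TO_DLL.get(name, "unknown") for name in api_calls)
    return [counts[dll] for dll in dll_vocabulary]


if __name__ == "__main__":
    calls = ["CreateFileW", "WriteFile", "WriteFile", "RegSetValueExW"]
    print(api_count_vector(calls, sorted(API_TO_DLL)))  # prints [1, 0, 1, 2]
    print(dll_count_vector(calls, ["advapi32.dll", "kernel32.dll", "wininet.dll"]))  # prints [1, 3, 0]
```

Either vector could then be fed to a tree ensemble such as Random Forest or XGBoost; the DLL view trades detail for far fewer dimensions.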

Two variations of cross-distance selection algorithm in hybrid sufficient dimension reduction

  • Jae Keun Yoo
    • Communications for Statistical Applications and Methods, v.30 no.2, pp.179-189, 2023
  • The hybrid sufficient dimension reduction (SDR) methods of Ye and Weiss (2003), which take a weighted mean of the kernel matrices of two different SDR methods, require heavy computation and long run times because of bootstrapping. To avoid this, Park et al. (2022) recently developed the so-called cross-distance selection (CDS) algorithm. In this paper, two variations of the original CDS algorithm are proposed, differing in how fully and equally covk-SAVE is treated in the selection procedure. In the first variation, called the larger CDS algorithm, covk-SAVE is used equally and fairly alongside the other two candidates, SIR-SAVE and covk-DR; however, a random selection is then necessary for the final choice. In the second variation, called the smaller CDS algorithm, only SIR-SAVE and covk-DR are used and covk-SAVE is ruled out completely. Numerical studies confirm that the original CDS algorithm performs better than, or competes quite well with, the two proposed variations. A real data example is presented to compare and interpret the decisions made by the three CDS algorithms in practice.
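
As a worked illustration of a "weighted mean of kernel matrices", the block below writes out a hybrid of SIR and SAVE for a standardized predictor $Z \in \mathbb{R}^p$ and response $Y$. The SIR and SAVE kernels are the standard population definitions; treating the candidate SIR-SAVE as this convex combination with weight $\alpha$ is an assumption for illustration, and the block does not reproduce the CDS selection rule itself. The leading eigenvectors of $M(\alpha)$ estimate the dimension reduction directions, and the bootstrap cost mentioned above arises from choosing $\alpha$.

```latex
\begin{aligned}
M_{\mathrm{SIR}} &= \operatorname{Cov}\{E(Z \mid Y)\},\\
M_{\mathrm{SAVE}} &= E\{[I_p - \operatorname{Cov}(Z \mid Y)]^2\},\\
M(\alpha) &= \alpha\, M_{\mathrm{SIR}} + (1-\alpha)\, M_{\mathrm{SAVE}}, \qquad 0 \le \alpha \le 1.
\end{aligned}
```

Setting $\alpha = 1$ recovers SIR and $\alpha = 0$ recovers SAVE, so the hybrid interpolates between the two kernels.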

Experiment study of structural random loading identification by the inverse pseudo excitation method

  • Guo, Xing-Lin; Li, Dong-Sheng
    • Structural Engineering and Mechanics, v.18 no.6, pp.791-806, 2004
  • The inverse pseudo excitation method is used to identify random loadings. For structures subjected to stationary random excitations, the power spectral density matrices of such loadings are identified experimentally from the measured acceleration responses and the structural frequency response functions. Numerical simulation is used to select the optimal sensor locations. The proposed method has been successfully applied to loading identification experiments on three structural models (two uniform steel cantilever beams and a four-story plastic glass frame) subjected to uncorrelated or partially correlated random excitations. The identified loadings agree quite well with the actual excitations. The results show that the proposed method is accurate and efficient and that it alleviates the ill-conditioning of the structural frequency response functions.
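
The inverse step can be summarized in a few lines of linear algebra. This is a sketch of the general pseudo excitation idea under assumed notation, not the paper's exact derivation: $H(\omega)$ is the frequency response matrix from loads to measured accelerations, $H^{+}$ its (least-squares) pseudoinverse when there are more sensors than loads, $(\cdot)^{\mathsf H}$ the conjugate transpose, and $\lambda_j, \boldsymbol\phi_j$ the eigenpairs of the measured response spectral matrix.

```latex
\begin{aligned}
S_{yy}(\omega) &= H(\omega)\, S_{ff}(\omega)\, H^{\mathsf H}(\omega),\\
S_{yy}(\omega) &= \sum_{j} \lambda_j\, \boldsymbol\phi_j \boldsymbol\phi_j^{\mathsf H}
  \;\Rightarrow\;
  \tilde{\mathbf y}_j = \sqrt{\lambda_j}\, \boldsymbol\phi_j\, e^{i\omega t},\\
\tilde{\mathbf f}_j &= H^{+}(\omega)\, \tilde{\mathbf y}_j,\qquad
S_{ff}(\omega) = \sum_{j} \tilde{\mathbf f}_j\, \tilde{\mathbf f}_j^{\mathsf H}
  = H^{+} S_{yy} \bigl(H^{+}\bigr)^{\mathsf H}.
\end{aligned}
```

Each pseudo response $\tilde{\mathbf y}_j$ is a deterministic harmonic, so the factor $e^{i\omega t}$ cancels in $\tilde{\mathbf f}_j \tilde{\mathbf f}_j^{\mathsf H}$; the conditioning of $H$ at each frequency is what the sensor placement step tries to improve.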

A HGLM framework for Meta-Analysis of Clinical Trials with Binary Outcomes

  • Ha, Il-Do
    • Journal of the Korean Data and Information Science Society, v.19 no.4, pp.1429-1440, 2008
  • In a meta-analysis combining the results of different clinical trials, it is important to account for possible heterogeneity in outcomes between trials. Such variation can be regarded as random effects, so random-effect models such as hierarchical generalized linear models (HGLMs) are very useful. In this paper, we propose an HGLM framework for analyzing binomial response data whose odds ratios may vary between clinical trials. We also present prediction intervals for the random effects, which are useful in practice for investigating the heterogeneity of trial effects. The proposed method is illustrated with a real data set of 22 trials on respiratory tract infections. We further demonstrate that an appropriate HGLM can be selected via model-selection criteria.
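
One common way to let the odds ratio vary between trials is a binomial-logit model with a normal random treatment effect. The specification below is an assumed illustration of that idea, not necessarily the exact HGLM used in the paper: trial $i$ has a control arm ($x_{ij} = 0$) and a treatment arm ($x_{ij} = 1$), $\beta_{0i}$ is a trial-specific baseline, and $v_i$ is the random effect whose prediction intervals describe heterogeneity.

```latex
\begin{aligned}
y_{ij} \mid v_i &\sim \operatorname{Binomial}(n_{ij},\, p_{ij}), \qquad j \in \{0, 1\},\\
\operatorname{logit}(p_{ij}) &= \beta_{0i} + (\beta_1 + v_i)\, x_{ij}, \qquad v_i \sim N(0, \lambda).
\end{aligned}
```

Under this form the log odds ratio in trial $i$ is $\beta_1 + v_i$, and $\lambda = 0$ reduces to a common (fixed) odds ratio $e^{\beta_1}$ across trials.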

Classification of cardiotocograms using random forest classifier and selection of important features from cardiotocogram signal

  • Arif, Muhammad
    • Biomaterials and Biomechanics in Bioengineering, v.2 no.3, pp.173-183, 2015
  • In obstetrics, cardiotocography is a procedure for recording the fetal heartbeat and uterine contractions, usually during the last trimester of pregnancy. It helps monitor patterns associated with fetal activity and detect pathologies. In this paper, a random forest classifier is used to classify normal, suspicious, and pathological patterns based on features extracted from the cardiotocograms. The results show that the random forest classifier detects these classes successfully, with an overall classification accuracy of 93.6%. Moreover, important features are identified to reduce the feature space: using only seven important features, the random forest classifier achieves a similar classification accuracy (93.3%).
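
Reducing the feature space to the most important features amounts to ranking features by an importance score and keeping the top k columns. The sketch below assumes the scores are already available (for example from a trained random forest, such as scikit-learn's `feature_importances_` attribute); the feature names and score values in the usage lines are hypothetical.

```python
def top_k_features(importances, k):
    """Return the names of the k features with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def project_rows(rows, feature_names, selected):
    """Keep only the selected feature columns, in the order given by `selected`."""
    columns = [feature_names.index(name) for name in selected]
    return [[row[i] for i in columns] for row in rows]


if __name__ == "__main__":
    # Hypothetical importance scores for three cardiotocogram features.
    scores = {"baseline_fhr": 0.12, "accelerations": 0.41, "uterine_contractions": 0.25}
    keep = top_k_features(scores, 2)
    print(keep)  # prints ['accelerations', 'uterine_contractions']
    print(project_rows([[120, 3, 5]], list(scores), keep))  # prints [[3, 5]]
```

After projection, the classifier would be retrained on the reduced columns and its accuracy compared with the full-feature model.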

SELECTION FOR PROLIFICACY IN ROMNEY SHEEP I. DIRECT RESPONSE TO SELECTION

  • Bhuiyan, A.K.F.H.; Curran, M.K.
    • Asian-Australasian Journal of Animal Sciences, v.8 no.1, pp.23-27, 1995
  • A selection experiment with Romney Marsh sheep was used to evaluate direct responses to selection. Two flocks were maintained: (a) a selection line, formed in 1979 by the Romney Group Breeders, selected for high prolificacy, defined as the number of live lambs born per ewe joined per year; and (b) a control line, established in 1982, in which flock replacements were chosen at random. Predicted responses were 0.033 live lambs per year-of-birth female group and 0.027 live lambs per year. The predicted response per year was within the theoretically expected range of 0.01 to 0.03 of the mean. Realized responses in prolificacy were 0.026 per year-of-birth female group and 0.021 per year. These realized responses represented between 0.01 and 0.02 of the control line mean per year.
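
For readers unfamiliar with how annual predicted responses are usually obtained, the sketch below uses the textbook breeder's equation, $R = h^2 S$ per generation, divided by the generation interval $L$ to give a response per year, and then expresses a response as a fraction of a reference mean. Whether the paper derived its predictions exactly this way is an assumption, and the heritability, selection differential, generation interval, and mean used in the usage lines are hypothetical values, not data from the study.

```python
def predicted_response_per_year(h2, selection_differential, generation_interval):
    """Breeder's equation R = h^2 * S, divided by the generation interval L (years)."""
    return h2 * selection_differential / generation_interval


def relative_response(response, reference_mean):
    """Express a response as a fraction of a reference mean (e.g. the control line)."""
    return response / reference_mean


if __name__ == "__main__":
    # Hypothetical inputs: h^2 = 0.1, S = 0.9 live lambs, L = 3 years.
    annual = predicted_response_per_year(0.1, 0.9, 3.0)
    print(round(annual, 3))  # prints 0.03
    # Hypothetical control-line mean of 1.4 live lambs per ewe joined.
    print(round(relative_response(annual, 1.4), 4))  # prints 0.0214
```

Comparing such a predicted figure with the realized response (selection line minus control line) is what allows the study to check predictions against observed genetic change.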

A Novel Heuristic Mechanism for Highly Utilizable Survivability on WDM Mesh Networks

  • Jeong Hong-Kyu; Kim Byung-Jae; Kang Min-Ho; Lee Yong-Gi
    • 한국정보통신설비학회:학술대회논문집 (Korean Institute of Information and Communication Facilities Engineering: Conference Proceedings), 2003.08a, pp.159-162, 2003
  • This paper presents a novel heuristic mechanism, Dynamic-network Adapted Cost selection (DAC-selection), which uses a unique cost function to achieve a higher backup path sharing rate, fewer blocked channel requests, and fewer wavelengths used for reserving working and backup paths than the widely used random selection (R-selection) mechanism and the Combined Min-cost selection (CMC-selection) mechanism proposed by Lo, while maintaining 100% restoration capability.
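
The contrast between random selection and cost-based selection of a backup path can be sketched as follows. The cost model here is hypothetical and much simpler than DAC-selection's cost function: a link that already carries reserved backup capacity (and so could be shared) costs 0.1, and any other link costs 1.0. Paths are lists of node names, and links are stored as unordered node pairs.

```python
import random


def random_selection(candidate_paths, rng):
    """R-selection style baseline: pick any candidate backup path uniformly."""
    return rng.choice(candidate_paths)


def min_cost_selection(candidate_paths, shared_backup_links,
                       shared_cost=0.1, new_cost=1.0):
    """Pick the backup path with the lowest total link cost.

    Hypothetical cost model: a link that already carries reserved backup
    capacity is cheap because it can be shared; any other link costs a
    full new wavelength reservation.
    """
    def path_cost(path):
        links = (frozenset(hop) for hop in zip(path, path[1:]))
        return sum(shared_cost if link in shared_backup_links else new_cost
                   for link in links)

    return min(candidate_paths, key=path_cost)


if __name__ == "__main__":
    candidates = [["A", "B", "D"], ["A", "C", "D"]]
    shared = {frozenset(("A", "C")), frozenset(("C", "D"))}
    print(random_selection(candidates, random.Random(0)))
    print(min_cost_selection(candidates, shared))  # prints ['A', 'C', 'D']
```

Favoring links that already hold backup reservations is one way a cost function can raise the sharing rate and reduce the number of wavelengths consumed, which is the kind of improvement the abstract reports over random selection.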

Machine Learning Based Neighbor Path Selection Model in a Communication Network

  • Lee, Yong-Jin
    • International Journal of Advanced Smart Convergence, v.10 no.1, pp.56-61, 2021
  • Neighbor path selection pre-selects alternate routes in case geographically correlated failures occur simultaneously in a communication network. Conventional heuristic algorithms cannot improve their solutions further because they cannot make sufficient use of historical failure information. We present a novel solution model for neighbor path selection that uses machine learning. The proposed machine learning neighbor path selection (ML-NPS) model is composed of five modules: random graph generation, data set creation, machine learning modeling, neighbor path prediction, and path information acquisition. It is implemented in Python with Keras on TensorFlow and runs on a small single-board computer, the Raspberry Pi 4B. Performance evaluations via numerical simulation show that the neighbor path communication success probability of our model is better than that of the conventional heuristic by 26% on average.
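
The first module, random graph generation, can be sketched with an Erdős–Rényi G(n, p) generator plus a breadth-first search that finds a hop-minimal path between two nodes. The choice of G(n, p) and of BFS are assumptions for illustration, since the abstract does not say which graph model or routing rule the ML-NPS model uses; the sketch uses only the Python standard library.

```python
import random
from collections import deque


def random_graph(n, p, seed=None):
    """Erdos-Renyi G(n, p): each node pair gets an edge with probability p."""
    rng = random.Random(seed)
    adjacency = {node: set() for node in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns a hop-minimal path as a node list, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    graph = random_graph(8, 0.4, seed=42)
    print(shortest_path(graph, 0, 7))
```

Fixing the seed makes each generated topology reproducible, which is convenient when many random graphs are turned into labeled training examples.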

Crop Yield Estimation Utilizing Feature Selection Based on Graph Classification (Crop Yield Prediction Using Graph-Classification-Based Feature Selection)

  • Ohnmar Khin; Sung-Keun Lee
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.18 no.6, pp.1269-1276, 2023
  • Crop yield estimation is essential for meeting global food demand, and it depends on many factors such as soil, rainfall, climate, atmosphere, and their interactions. Climate change affects crop yields. We use a dataset containing temperature, rainfall, humidity, and other variables. This research focuses on feature selection combined with various classifiers to assist farmers and agriculturalists. Crop yield estimation using the feature selection approach reaches 96% accuracy, showing that feature selection affects a machine learning model's performance. The graph classifier achieves 81.5% accuracy. Without feature selection, the random forest regressor reaches 78% accuracy and the decision tree regressor 67%. The contribution of this research is to report experimental results with and without feature selection for the ten proposed algorithms. These findings can help learners and students choose appropriate models for crop classification studies.
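
To show what a feature selection step does in general, the sketch below uses a simple filter: it keeps every feature whose absolute Pearson correlation with the target reaches a threshold. This correlation filter is a swapped-in technique chosen for brevity, not the paper's graph-classification-based selection, and the feature names, values, and threshold in the usage lines are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, target, threshold):
    """Keep feature names whose |correlation| with the target is >= threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


if __name__ == "__main__":
    # Hypothetical weather features and yields for four seasons.
    features = {
        "temperature": [21, 23, 25, 27],
        "humidity": [60, 60, 60, 60],
        "rainfall": [300, 250, 200, 150],
    }
    yields = [2.0, 2.4, 2.9, 3.1]
    print(select_features(features, yields, threshold=0.5))  # prints ['temperature', 'rainfall']
```

A filter like this is cheap and model-agnostic; comparing a model trained on the selected columns with one trained on all columns mirrors the with/without comparison reported above.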

An Analytical Study on Automatic Classification of Domestic Journal articles Using Random Forest (A Study on the Automatic Classification of Domestic Journal Articles Using Random Forest)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management, v.36 no.2, pp.57-77, 2019
  • Random Forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in library and information science. In particular, I performed various experiments on the main factors that affect classification performance when class labels are automatically assigned to domestic journal articles, such as the number of trees, feature selection, and learning set size. Through these experiments, I explored ways to optimize RF performance on imbalanced datasets in real environments. Consequently, for the automatic classification of domestic journal articles, RF can be expected to achieve the best classification performance with 100 to 1,000 trees (C), a small feature set (10%) selected by the chi-square statistic (CHI), and the largest learning sets (9 to 10 years).
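
Chi-square (CHI) feature selection for text classification scores each term against a class using a 2x2 table of document counts and keeps the top-scoring fraction of the vocabulary. The sketch below uses the standard chi-square formula for that table; representing documents as sets of terms and scoring against a single target class are simplifying assumptions. The toy documents and labels in the usage lines are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term/class pair from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


def select_top_terms(docs, labels, target_class, fraction=0.1):
    """Keep the top `fraction` of the vocabulary ranked by chi-square for one class."""
    vocabulary = sorted({term for doc in docs for term in doc})
    scores = {}
    for term in vocabulary:
        a = b = c = d = 0
        for doc, label in zip(docs, labels):
            has_term = term in doc
            in_class = label == target_class
            if has_term and in_class:
                a += 1
            elif has_term:
                b += 1
            elif in_class:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    k = max(1, int(len(vocabulary) * fraction))
    return sorted(vocabulary, key=lambda term: scores[term], reverse=True)[:k]


if __name__ == "__main__":
    docs = [{"library", "catalog"}, {"library", "search"},
            {"network", "protocol"}, {"network", "search"}]
    labels = ["LIS", "LIS", "CS", "CS"]
    print(select_top_terms(docs, labels, "LIS", fraction=0.4))  # prints ['library', 'network']
```

With more than two classes, a common choice is to take each term's maximum (or average) score over all classes before ranking; the 10% setting from the abstract corresponds to `fraction=0.1`.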