• Title/Summary/Keyword: Random selection

Semiparametric Kernel Poisson Regression for Longitudinal Count Data

  • Hwang, Chang-Ha; Shim, Joo-Yong
    • Communications for Statistical Applications and Methods / v.15 no.6 / pp.1003-1011 / 2008
  • Mixed-effect Poisson regression models are widely used to analyze correlated count data, such as those arising in longitudinal studies. In this paper, we consider kernel extensions with semiparametric fixed effects and parametric random effects. Estimation uses the penalized likelihood method based on the kernel trick, and our focus is on efficient computation and effective hyperparameter selection. Hyperparameters are selected by cross-validation. Examples illustrating the usage and features of the proposed method are provided.
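The cross-validation step can be illustrated with a deliberately simplified stand-in. The sketch below uses a ridge-penalized Poisson regression with a single covariate, not the paper's kernel model with random effects. It fits each candidate penalty by Newton-Raphson and keeps the penalty with the smallest held-out Poisson deviance across K folds. The function names, penalty form, fold count and simulated data are all assumptions for illustration only.

```python
import math
import random

# Hypothetical helpers: a one-covariate ridge-penalized Poisson regression,
# used only as a simple stand-in for the paper's kernel mixed-effects model.

def fit_ridge_poisson(xs, ys, lam, iters=100):
    """Newton-Raphson for log(mu) = b0 + b1*x with penalty (lam/2)*b1**2.
    The intercept is not penalized."""
    mean_y = sum(ys) / len(ys)
    b0, b1 = (math.log(mean_y) if mean_y > 0 else 0.0), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += mu - y              # gradient of the negative log-likelihood
            g1 += (mu - y) * x
            h00 += mu                 # Hessian entries
            h01 += mu * x
            h11 += mu * x * x
        g1 += lam * b1                # ridge penalty on the slope only
        h11 += lam
        det = h00 * h11 - h01 * h01
        s0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - s0, b1 - s1
        if abs(s0) + abs(s1) < 1e-12:
            break
    return b0, b1

def poisson_deviance(xs, ys, b0, b1):
    """Poisson deviance of the fitted means on (xs, ys)."""
    dev = 0.0
    for x, y in zip(xs, ys):
        mu = math.exp(b0 + b1 * x)
        dev += 2.0 * ((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu))
    return dev

def cv_select_lambda(xs, ys, lambdas, k=5, seed=0):
    """Return the penalty with the smallest total held-out deviance."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = {}
    for lam in lambdas:
        total = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b0, b1 = fit_ridge_poisson([xs[i] for i in train], [ys[i] for i in train], lam)
            total += poisson_deviance([xs[i] for i in fold], [ys[i] for i in fold], b0, b1)
        scores[lam] = total
    return min(scores, key=scores.get), scores

def draw_poisson(rng, mean):
    """Knuth's multiplication sampler; adequate for small means."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.random() for _ in range(200)]
    ys = [draw_poisson(rng, math.exp(0.5 + 1.2 * x)) for x in xs]  # simulated data
    best, scores = cv_select_lambda(xs, ys, [0.0, 1.0, 10.0, 100.0])
    print("selected penalty:", best)
```

The sketch scores folds with held-out deviance because deviance is the natural loss for Poisson counts. The paper may use a different cross-validation criterion.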

A Study on the Multivariate Stratified Random Sampling with Multiplicity (Allocation Problem When Strata Are Independent)

  • Kim, Ho-Il
    • Journal of the Korean Data and Information Science Society / v.10 no.1 / pp.79-89 / 1999
  • A counting rule that allows an element to be linked to more than one enumeration unit is called a multiplicity counting rule, and sample designs that use such rules are called network samples. If a network is defined as a set of observation units with a given linkage pattern, then a network may be linked to more than one selection unit, and a single selection unit may be linked to more than one network. This paper considers allocation for multivariate stratified random sampling with multiplicity.
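The multiplicity counting rule is easiest to see through the standard multiplicity estimator of a total. Each network's value is divided by its multiplicity, which is the number of selection units it is linked to. Each sampled unit then contributes the sum of these shares.

The sketch below assumes simple random sampling without replacement within a single stratum. It does not implement the paper's multivariate allocation rule, and the helper names and toy linkage are hypothetical.

```python
from itertools import combinations

# Hypothetical toy structure: links[j] is the set of networks linked to
# selection unit j; values[i] is the characteristic of network i.

def multiplicity_weights(links, values):
    """Spread each network's value equally over the m_i selection units
    it is linked to (m_i is the network's multiplicity)."""
    mult = {}
    for nets in links:
        for i in nets:
            mult[i] = mult.get(i, 0) + 1
    return [sum(values[i] / mult[i] for i in nets) for nets in links]

def multiplicity_total(sample, links, values):
    """Estimate the population total from a simple random sample
    (without replacement) of selection-unit indices."""
    w = multiplicity_weights(links, values)
    return len(links) / len(sample) * sum(w[j] for j in sample)

if __name__ == "__main__":
    links = [{0}, {0, 1}, {1, 2}, {2}, set()]   # network 0 links to units 0 and 1, etc.
    values = {0: 4.0, 1: 6.0, 2: 3.0}           # true total is 13
    estimates = [multiplicity_total(s, links, values) for s in combinations(range(5), 2)]
    print(sum(estimates) / len(estimates))      # average over all samples equals the total
```

Averaging the estimate over every possible sample of size 2 recovers the true total, which illustrates the estimator's unbiasedness.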

A Novel Feature Selection Approach to Classify Breast Cancer Drug using Optimized Grey Wolf Algorithm

  • Shobana, G.; Priya, N.
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.258-270 / 2022
  • Over the past two decades, cancer has become a common disease throughout the globe, with a significant increase among women; breast and ovarian cancers are especially prevalent. Most patients consult physicians only in the final stage of the disease, so early diagnosis remains a great challenge for researchers. Although new drugs are synthesized frequently, their multiple benefits are less often investigated. Millions of drugs have been synthesized, and their data are accessible through open repositories, so drug repurposing can be carried out using machine learning techniques. In this paper, we propose a novel feature selection technique that generates multiple populations for the grey wolf algorithm and classifies breast cancer drugs efficiently. Three supervised machine learning algorithms, namely Random Forest, Multilayer Perceptron, and Support Vector Machine, were applied, and the Multilayer Perceptron achieved the highest accuracy, 97.7%, for breast cancer drug classification. A leukemia drug dataset was also investigated, on which the Multilayer Perceptron achieved 96% prediction accuracy.
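For readers unfamiliar with the grey wolf algorithm, the following is a minimal sketch of the standard single-population Grey Wolf Optimizer on a continuous toy objective. Each wolf moves toward the average of the positions suggested by the three best solutions (alpha, beta and delta). The step coefficient shrinks from 2 toward 0 over the run.

The paper's multi-population variant and its mapping of positions to feature subsets are not reproduced. The function names and the `sphere` objective are assumptions.

```python
import random

# A sketch of the standard single-population Grey Wolf Optimizer
# (minimization, continuous search space). The multi-population variant and
# the feature-mask encoding from the abstract are NOT reproduced here.

def grey_wolf_optimize(objective, dim, lo, hi, n_wolves=20, iters=200, seed=0):
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = sorted(wolves, key=objective)[:3]        # alpha, beta, delta
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                      # decreases linearly from 2 toward 0
        new_wolves = []
        for wolf in wolves:
            pos = []
            for d in range(dim):
                guided = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])   # distance to this leader
                    guided.append(leader[d] - A * D)
                x = sum(guided) / 3.0                  # average of the three pulls
                pos.append(min(max(x, lo), hi))        # keep inside the bounds
            new_wolves.append(pos)
        wolves = new_wolves
        # keep the best three positions seen so far as the leaders
        leaders = sorted(wolves + leaders, key=objective)[:3]
    return leaders[0], objective(leaders[0])

def sphere(x):
    """Hypothetical toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

if __name__ == "__main__":
    best, score = grey_wolf_optimize(sphere, dim=5, lo=-10.0, hi=10.0)
    print(score)
```

For feature selection, binary GWO variants typically pass each coordinate through a transfer function such as a sigmoid and threshold it into a 0/1 feature mask. Classifier accuracy then serves as the fitness.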

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

  • Hana Babiker, Nassar
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.89-95 / 2023
  • Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially as the number of patient files grows. In this paper, breast cancer data are collected from Khartoum state hospital and classified into recurrence and no-recurrence classes. The data are imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques (resampling, attribute selection, and handling of missing values) are applied to this imbalanced data, and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used; in the second, meta-learning algorithms (Bagging, Boosting, and Random Subspace) are used; finally, the ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy of 95.2797% among all the algorithms, followed by Bagging with J48 (90.559%) and Random Subspace with J48 (84.2657%).
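Random oversampling is one simple form of the resampling step mentioned above. The abstract does not specify which resampling method was used, so this choice is an assumption. The sketch duplicates randomly chosen rows of the smaller class until the classes are balanced. The function name and toy data are hypothetical.

```python
import random
from collections import Counter

# Random oversampling, assumed here for illustration; the abstract does not
# name the resampling filter actually used.

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

if __name__ == "__main__":
    X = [[i, i % 3] for i in range(10)]               # hypothetical features
    y = ["no-recurrence"] * 8 + ["recurrence"] * 2    # 8:2 imbalance
    Xb, yb = random_oversample(X, y)
    print(Counter(yb))
```

Resampling should be applied only to the training portion of each split. Oversampling before splitting lets duplicated rows leak into the test set and inflates accuracy.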

Comparison of Random and Blocked Practice during Performance of the Stop Signal Task

  • Kwon, Jung-Won; Nam, Seok-Hyun; Kim, Chung-Sun
    • The Journal of Korean Physical Therapy / v.23 no.3 / pp.65-70 / 2011
  • Purpose: We investigated changes in the stop-signal reaction time (SSRT) and the no-signal reaction time (NSRT) following motor sequence learning in the stop-signal task (SST). This study also determined which practice schedule, blocked or random SST, produced a greater reduction in spatial processing time. Methods: Thirty right-handed healthy subjects without a history of neurological dysfunction were recruited, and all gave written informed consent. In all subjects, both the SSRT and the NSRT were measured for the SST. Based on the stop-signal pattern, practice was divided into two conditions: a blocked-SST practice group and a random-SST practice group. Results: In the blocked-SST group, both the SSRT and the NSRT decreased significantly (p<0.05), whereas neither changed significantly in the random-SST group. For both the SSRT and the NSRT, the blocked-SST group was faster than the random-SST group (p<0.05). In the post-test SST after practice, the SSRT decreased significantly in the random-SST group (p<0.05), but the NSRT showed no significant change in either group. Conclusion: These findings demonstrate that random-SST practice reduced the internal processing time needed to stop rapidly in response to visual signals, indicating that motor skill learning is acquired through improved response selection and inhibition.
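The abstract does not state how SSRT was estimated. Two estimators widely used in the stop-signal literature are sketched below as assumptions:

- the mean method, which is mean no-signal RT minus mean stop-signal delay (SSD);
- the integration method, which is the no-signal RT at the quantile equal to the probability of responding on stop-signal trials, minus mean SSD.

The function names and example timings are hypothetical.

```python
import math

# Two common SSRT estimators; which one the study used is not stated,
# so both are shown as assumptions.

def ssrt_mean(no_signal_rts, ssds):
    """Mean method: mean no-signal RT minus mean stop-signal delay (SSD).
    Assumes SSD was tracked so that about 50% of stop trials are inhibited."""
    return sum(no_signal_rts) / len(no_signal_rts) - sum(ssds) / len(ssds)

def ssrt_integration(no_signal_rts, ssds, p_respond):
    """Integration method: the no-signal RT at the p_respond quantile
    (nearest rank) minus mean SSD, where p_respond is the probability of
    responding on stop-signal trials."""
    rts = sorted(no_signal_rts)
    k = max(1, math.ceil(p_respond * len(rts)))   # nearest-rank position
    return rts[k - 1] - sum(ssds) / len(ssds)

if __name__ == "__main__":
    nsrt = [400, 420, 440, 460, 480]   # hypothetical no-signal RTs (ms)
    ssd = [200, 220, 240]              # hypothetical stop-signal delays (ms)
    print(ssrt_mean(nsrt, ssd), ssrt_integration(nsrt, ssd, 0.5))
```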

Exact Error Rate of Dual-Channel Receiver with Remote Antenna Unit Selection in Multicell Networks

  • Wang, Qing; Liu, Ju; Zheng, Lina; Xiong, Hailiang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.8 / pp.3585-3601 / 2016
  • The error rate performance of a circularly distributed antenna system is studied over Nakagami-m fading channels, where a dual-channel receiver is employed to detect quadrature phase shift keying (QPSK) signals. To mitigate co-channel interference (CCI) from adjacent cells and to save transmit power, this work presents remote antenna unit selection transmission based on the best channel quality and on the maximized path loss, respectively. The commonly used Gaussian and Q-function approximation assumes, by the central limit theorem, that the CCI and the noise are Gaussian distributed. This approximation fails to capture the precise system performance. This work therefore treats the CCI as a random variable with random variance. Since the in-phase and quadrature components of the CCI are correlated over Nakagami-m fading channels, their dependency is also considered in the error rate analysis. For the special case of Rayleigh fading, in which this dependency can be ignored, closed-form error rate expressions are derived. Numerical results validate the accuracy of the theoretical analysis, and the different transmission schemes are compared.
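The sketch below gives a point of reference for the closed-form Rayleigh results. It compares a Monte Carlo bit error rate for Gray-coded QPSK over flat Rayleigh fading with the textbook expression $P_b = \tfrac{1}{2}\left(1 - \sqrt{\bar\gamma_b/(1+\bar\gamma_b)}\right)$, where $\bar\gamma_b$ is the average SNR per bit.

It assumes a single antenna, perfect channel knowledge and no co-channel interference. It is therefore only the interference-free baseline, not the paper's CCI analysis with random variance. The function names are hypothetical.

```python
import math
import random

# Interference-free baseline only: flat Rayleigh fading, perfect channel
# knowledge, Gray-coded QPSK. Function names are hypothetical.

def qpsk_rayleigh_ber_theory(ebn0_db):
    """Closed-form average BER 0.5 * (1 - sqrt(g / (1 + g))), g = Eb/N0."""
    g = 10 ** (ebn0_db / 10)
    return 0.5 * (1 - math.sqrt(g / (1 + g)))

def qpsk_rayleigh_ber_sim(ebn0_db, n_symbols=100_000, seed=0):
    """Monte Carlo BER with Eb = 1, so each QPSK symbol has energy 2."""
    rng = random.Random(seed)
    n0 = 1 / 10 ** (ebn0_db / 10)
    sd_h, sd_n = math.sqrt(0.5), math.sqrt(n0 / 2)
    errors = 0
    for _ in range(n_symbols):
        bi, bq = rng.choice((-1, 1)), rng.choice((-1, 1))
        s = complex(bi, bq)                                  # one bit on I, one on Q
        h = complex(rng.gauss(0, sd_h), rng.gauss(0, sd_h))  # CN(0, 1) channel gain
        n = complex(rng.gauss(0, sd_n), rng.gauss(0, sd_n))  # CN(0, N0) noise
        z = h.conjugate() * (h * s + n)                      # coherent detection
        errors += (z.real > 0) != (bi > 0)
        errors += (z.imag > 0) != (bq > 0)
    return errors / (2 * n_symbols)

if __name__ == "__main__":
    for snr_db in (0, 10):
        print(snr_db, qpsk_rayleigh_ber_theory(snr_db), qpsk_rayleigh_ber_sim(snr_db))
```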

Data Mining-Aided Automatic Landslide Detection Using Airborne Laser Scanning Data in Densely Forested Tropical Areas

  • Mezaal, Mustafa Ridha; Pradhan, Biswajeet
    • Korean Journal of Remote Sensing / v.34 no.1 / pp.45-74 / 2018
  • Landslides are a natural hazard that threatens lives and property in many areas around the world, and they are difficult to recognize, particularly in rainforest regions. An accurate, detailed, and up-to-date inventory map is therefore required for landslide susceptibility, hazard, and risk analyses. Inconsistent results obtained with different feature selection techniques in the literature highlight the importance of evaluating these techniques; thus, six feature selection techniques were evaluated in this study. Very-high-resolution LiDAR point clouds and orthophotos were acquired simultaneously by airborne laser scanning over a rainforest area of Cameron Highlands, Malaysia. A fuzzy-based segmentation parameter optimizer (FbSP optimizer) was used to optimize the segmentation parameters. Training samples were drawn using a stratified random sampling method, with 70% of the samples used for training. Two machine-learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), were used to evaluate the performance of each feature selection algorithm. The overall accuracies of the SVM and RF models showed that three of the six algorithms ranked higher in landslide detection. The RF classifier achieved higher classification accuracies than the SVM classifier using either all features or only the optimal features. The proposed techniques performed well in detecting landslides in a rainforest area of Malaysia and can be easily extended to similar regions.
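A stratified 70/30 split can be sketched as follows. Each class is shuffled and divided independently, so both parts keep the original class proportions. Stratifying by class label is an assumption, because the abstract does not say which strata were used. The function name and labels are hypothetical.

```python
import random
from collections import defaultdict

# Stratification by class label is assumed; the study's strata are not stated.

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), taking train_frac of each class
    independently so class proportions are preserved in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        k = round(train_frac * len(idx))   # per-class training count
        train.extend(idx[:k])
        test.extend(idx[k:])
    return sorted(train), sorted(test)

if __name__ == "__main__":
    labels = ["landslide"] * 30 + ["non-landslide"] * 70   # hypothetical objects
    tr, te = stratified_split(labels)
    print(len(tr), len(te))
```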

ELCIC: An R package for model selection using the empirical-likelihood based information criterion

  • Chixiang Chen; Biyi Shen; Ming Wang
    • Communications for Statistical Applications and Methods / v.30 no.4 / pp.355-368 / 2023
  • This article introduces the R package ELCIC (https://cran.r-project.org/web/packages/ELCIC/index.html), which provides an empirical likelihood-based information criterion (ELCIC) for model selection, including but not limited to variable selection. Empirical likelihood is a semiparametric approach to statistical inference that does not require distributional assumptions about how the data are generated. ELCIC is therefore more robust and versatile for model selection than existing information criteria. This paper illustrates several applications of ELCIC, including generalized linear models, generalized estimating equations (GEE) for longitudinal data, and weighted GEE (WGEE) for missing longitudinal data under missing-at-random and dropout mechanisms.
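The building block behind ELCIC is the empirical likelihood ratio. The sketch below computes Owen's statistic $-2\log R(\mu) = 2\sum_i \log\{1 + \lambda (x_i - \mu)\}$ for a population mean. It solves for the Lagrange multiplier $\lambda$ by bisection. It is neither the ELCIC criterion itself nor the package's R interface, and the function name is hypothetical.

```python
import math

def el_log_ratio_mean(xs, mu, tol=1e-12):
    """-2 log R(mu): Owen's empirical likelihood ratio statistic for a mean.
    Hypothetical helper; returns inf if mu is not strictly inside the data range."""
    z = [x - mu for x in xs]
    if min(z) >= 0 or max(z) <= 0:
        return math.inf
    # lambda must keep every 1 + lambda * z_i > 0, i.e. lie in (lo, hi)
    lo, hi = -1.0 / max(z), -1.0 / min(z)

    def score(lam):
        # estimating equation for lambda; strictly decreasing on (lo, hi)
        return sum(zi / (1.0 + lam * zi) for zi in z)

    a, b = lo + 1e-10 * (hi - lo), hi - 1e-10 * (hi - lo)
    for _ in range(200):
        m = 0.5 * (a + b)
        if score(m) > 0:
            a = m          # root lies to the right
        else:
            b = m
        if b - a < tol:
            break
    lam = 0.5 * (a + b)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    for mu in (3.0, 3.5, 4.0):
        print(mu, el_log_ratio_mean(xs, mu))
```

Under standard conditions the statistic is asymptotically $\chi^2_1$ at the true mean. This property is what lets empirical likelihood stand in for a parametric likelihood.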

Prediction of Paroxysmal Atrial Fibrillation using Time-domain Analysis and Random Forest

  • Lee, Seung-Hwan; Kang, Dong-Won; Lee, Kyoung-Joung
    • Journal of Biomedical Engineering Research / v.39 no.2 / pp.69-79 / 2018
  • The present study proposes an algorithm that discriminates between normal subjects and paroxysmal atrial fibrillation (PAF) patients using electrocardiogram (ECG) recordings without PAF events. For this purpose, time-domain features and a random forest classifier are used. The time-domain features are obtained from the Poincaré plot, the Lorenz plot of the $\delta RR$ interval, and morphology analysis, and three features in total are then retained through feature selection. PAF patients and normal subjects are classified using random forest. The classification yielded a sensitivity of 81.82%, a specificity of 95.24%, a positive predictive value of 96.43%, a negative predictive value of 76.92%, and an accuracy of 87.04%. The proposed algorithm has a lower computational requirement than the existing algorithm, suggesting its applicability to more efficient prediction of PAF.
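Poincaré-plot features are commonly summarized by two quantities:

- SD1, the short-term dispersion perpendicular to the identity line;
- SD2, the long-term dispersion along that line.

The sketch below uses the widely cited identities $SD1^2 = \tfrac{1}{2}\mathrm{Var}(\Delta RR)$ and $SD2^2 = 2\,\mathrm{Var}(RR) - \tfrac{1}{2}\mathrm{Var}(\Delta RR)$. The abstract does not say whether these are among the three features retained in the study. The function name and RR values are hypothetical.

```python
import math

def poincare_sd1_sd2(rr):
    """SD1 and SD2 of the Poincare plot of successive RR intervals, via
    SD1^2 = Var(dRR)/2 and SD2^2 = 2*Var(RR) - Var(dRR)/2 (sample variances)."""
    def var(v):
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)
    d = [b - a for a, b in zip(rr, rr[1:])]               # successive differences dRR
    sd1 = math.sqrt(var(d) / 2)
    sd2 = math.sqrt(max(2 * var(rr) - var(d) / 2, 0.0))  # guard against tiny negatives
    return sd1, sd2

if __name__ == "__main__":
    rr = [812, 790, 845, 830, 801, 870, 815, 799]   # hypothetical RR intervals (ms)
    print(poincare_sd1_sd2(rr))
```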

Evaluation of the Block Effects in Response Surface Designs with Random Block Effects over Cuboidal Regions

  • Park, Sang-Hyun
    • Communications for Statistical Applications and Methods / v.7 no.3 / pp.741-757 / 2000
  • In many experimental situations in which a block design is used, the block effect is considered fixed. There are, however, experimental situations in which it should be treated as random. The choice of blocking arrangement for a response surface design can considerably affect the estimation of the mean response and the size of the prediction variance, even when the experimental runs are the same. Care should therefore be exercised in selecting blocks. In this paper, in the presence of a random block effect, we propose a graphical method for evaluating the effect of blocking in response surface designs over cuboidal regions. This graphical method can be used to investigate how blocking influences the prediction variance throughout the cuboidal experimental region of interest, and to compare the block effects of orthogonal and non-orthogonal block designs.
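With a random block effect, the usual model is $y = X\beta + Z\delta + \varepsilon$, with $\mathrm{Var}(y) = \sigma^2 (I + \eta ZZ^\top)$ and $\eta = \sigma_\delta^2/\sigma^2$. The scaled prediction variance at a point with model vector $f(x)$ is then $f(x)^\top (X^\top V^{-1} X)^{-1} f(x)$, where $V = I + \eta ZZ^\top$.

The sketch below evaluates this quantity for a first-order model on a $2^2$ factorial run in two blocks. The design, the $\eta$ values and the helper names are assumptions, and the paper's graphical summaries are not reproduced.

```python
# Hypothetical helpers for small dense matrices stored as lists of lists.

def mat_inv(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, c)) for c in cols] for row in a]

def prediction_variance(X, blocks, eta, f):
    """Scaled prediction variance f' (X' V^-1 X)^-1 f with V = I + eta * Z Z'.
    blocks[i] is the block of run i; eta = block variance / error variance."""
    n = len(X)
    V = [[(1.0 if i == j else 0.0) + (eta if blocks[i] == blocks[j] else 0.0)
          for j in range(n)] for i in range(n)]
    Xt = [list(col) for col in zip(*X)]
    cov = mat_inv(matmul(matmul(Xt, mat_inv(V)), X))
    k = len(f)
    return sum(f[i] * cov[i][j] * f[j] for i in range(k) for j in range(k))

if __name__ == "__main__":
    corners = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
    X = [[1.0, x1, x2] for x1, x2 in corners * 2]   # first-order model, 8 runs
    blocks = [0] * 4 + [1] * 4                      # each block is a full 2^2
    grid = [i / 4 for i in range(-4, 5)]
    vals = [prediction_variance(X, blocks, 0.5, [1.0, u, v]) for u in grid for v in grid]
    print(min(vals), max(vals))
```

Scanning `prediction_variance` over a grid on the cuboidal region $[-1,1]^2$ gives the kind of surface a graphical comparison would display. For this orthogonally blocked design, the variance is smallest at the centre and largest at the corners.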
