• Title/Summary/Keyword: Sampling set selection


The Effects of Selection Attributes for HMR on Satisfaction and Repurchase Intention: Comparative Analysis of Convenience Store and Large Market (HMR 선택속성이 만족과 재구매의도에 미치는 영향: 편의점과 대형마트의 비교 분석)

  • Yang, Dong-Hwi
    • Culinary science and hospitality research / v.24 no.3 / pp.204-214 / 2018
  • This study set up research models and hypotheses to examine how HMR selection attributes influence satisfaction and repurchase intention in each distribution channel (convenience store/large market), and tested the hypotheses through empirical analysis. Convenience sampling was used to survey HMR purchasers at convenience stores and large markets in the Seoul and Gyeonggi area. The survey was conducted from January 8, 2018 to January 26, 2018; 300 questionnaires were distributed and 289 of them were used as valid data. SPSS 20.0 was used for the empirical analysis. The results are as follows. First, among the HMR selection attributes at convenience stores, only product quality has a significant effect on satisfaction; product safety and convenience have no significant effect. Second, among the HMR selection attributes at large markets, only convenience has a significant effect on satisfaction; product safety and product quality have no significant effect. Third, HMR satisfaction at both convenience stores and large markets has a significant effect on repurchase intention. The contribution of this study is that it examines the relationships among HMR selection attributes, satisfaction, and repurchase intention, which are important in existing HMR research, separately for each distribution channel (convenience store/large market). The results can help establish an effective sales strategy for each segment.

Selection of An Initial Training Set for Active Learning Using Cluster-Based Sampling (능동적 학습을 위한 군집기반 초기훈련집합 선정)

  • 강재호;류광렬;권혁철
    • Journal of KIISE:Software and Applications / v.31 no.7 / pp.859-868 / 2004
  • We propose a method of selecting initial training examples for active learning so that it can reach high accuracy faster with fewer further queries. Our method is based on the assumption that an active learner can reach higher performance when given an initial training set consisting of diverse and typical examples rather than similar and special ones. To obtain a good initial training set, we first cluster the examples using the k-means clustering algorithm to find groups of similar examples. Then a representative example, the one closest to the cluster's centroid, is selected from each cluster. After these representative examples are labeled by querying the user for their categories, they can be used as initial training examples. We also suggest a method of using the centroids themselves as initial training examples by labeling them with the categories of the corresponding representative examples. Experiments with various text data sets have shown that an active learner starting from the initial training set selected by our method reaches higher accuracy faster than one starting from a randomly generated initial training set.
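
The selection step described in this abstract (cluster, then take the example nearest each centroid) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `kmeans` and `select_initial_examples`, the plain Lloyd's iteration, the Euclidean distance, and the random initialization are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed variant); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they match the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def select_initial_examples(X, k, seed=0):
    """Return the index of the example closest to each cluster centroid."""
    centroids, labels = kmeans(X, k, seed=seed)
    chosen = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue  # skip empty clusters
        d = np.linalg.norm(X[members] - centroids[j], axis=1)
        chosen.append(int(members[d.argmin()]))
    return chosen

# Hypothetical data: three 2-D blobs standing in for document feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0.0, 5.0, 10.0)])
initial_idx = select_initial_examples(X, k=3)
print(initial_idx)  # indices of the examples to send to the user for labeling
```

The returned indices are the examples that would be labeled first; the centroid-labeling variant mentioned in the abstract would instead train on the centroid vectors and reuse these labels.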

Bayesian Approach for Software Reliability Models (소프트웨어 신뢰모형에 대한 베이지안 접근)

  • Choi, Ki-Heon
    • Journal of the Korean Data and Information Science Society / v.10 no.1 / pp.119-133 / 1999
  • A Markov chain Monte Carlo method is developed to compute the software reliability model. We consider the computational problem of determining the posterior distribution in Bayesian inference. Metropolis algorithms along with Gibbs sampling are proposed to perform Bayesian inference for the mixed model with record value statistics. For model determination, we explore the prequential conditional predictive ordinate criterion, which selects the model with the largest posterior likelihood among models using all possible subsets of the component intensity functions, in order to relax the monotonic intensity function assumptions. A numerical example with a simulated data set is given.


A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security / v.23 no.12 / pp.101-106 / 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is a promising strategy. The issue with existing feature selection approaches is that each method yields a different set of features, which affects model accuracy, and present methods do not perform well on large multidimensional datasets. We introduce a model containing a feature selection approach that selects optimal features from large multidimensional data sets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To support the proposed model, we balanced the classes by applying hybrid balanced class sampling methods to the original dataset, together with data pre-processing and data transformation, to provide credible data for training. We ran and assessed the model on datasets with binary and multi-valued classifications, using multiple datasets (Parkinson, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected with a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by voting on the attributes that these methods output in common. The accuracy on the original dataset before applying the framework is recorded and compared against the accuracy on the reduced set of attributes. The results are shown separately to allow comparison. Based on the result analysis, we conclude that the proposed model produced higher accuracy on multi-valued class datasets than on binary class datasets.[1]
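
The voting step over the selectors' outputs can be sketched as below. The abstract does not state the voting rule, so the `min_votes` threshold, the function name `vote_features`, and the feature names are assumptions for illustration only; in practice each list would come from a fitted selector.

```python
from collections import Counter

def vote_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` of the selectors.

    `selections` maps a selector name to the features it picked.
    Setting min_votes to len(selections) gives the strict intersection.
    """
    votes = Counter(f for selected in selections.values() for f in set(selected))
    return sorted(f for f, n in votes.items() if n >= min_votes)

# Hypothetical selector outputs (not taken from the paper).
selections = {
    "lasso": ["age", "bmi", "glucose"],
    "decision_tree": ["glucose", "bmi"],
    "random_forest": ["glucose", "age", "insulin"],
}
print(vote_features(selections, min_votes=2))  # ['age', 'bmi', 'glucose']
print(vote_features(selections, min_votes=3))  # ['glucose']
```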

Patient Satisfaction as an Indicator of Service Quality in Malaysian Public Hospitals

  • Manaf, Noor Hazilah Abd;Nooi, Phang Siew
    • International Journal of Quality Innovation / v.10 no.1 / pp.77-87 / 2009
  • The main aim of the paper is to provide an empirical analysis of patient satisfaction as an indicator of service quality in Malaysian public hospitals. Self-administered questionnaires were given to patients selected by convenience sampling. Two sets of questionnaires were used, one for inpatients and one for outpatients. Hospitals were selected according to states in Peninsular Malaysia. 23 hospitals, covering all state-level hospitals, the National Referral Centre, and selected district hospitals, were chosen as respondent hospitals. Two dimensions of service quality emerged, namely the clinical and the physical dimension of service. Both outpatients and inpatients were found to be more satisfied with the clinical dimension of service than with the physical dimension. For outpatient satisfaction, there was a positive correlation between waiting time and patient satisfaction. Patient satisfaction was also found to be higher in the smaller district hospitals than in the larger state hospitals. For the clinical dimension of service, patients were satisfied with the services of doctors and nurses, while for the physical dimension of service, patients were satisfied with the cleanliness of the facilities. Random sampling was not feasible because patients were reluctant to cooperate, which led to the use of convenience sampling. Studies have also shown that patients are reluctant to express their feelings about services provided by their caregivers. The study provides primary data for a nationwide study on patient satisfaction in Malaysian public hospitals, for both inpatients and outpatients.

Data Mining-Aided Automatic Landslide Detection Using Airborne Laser Scanning Data in Densely Forested Tropical Areas

  • Mezaal, Mustafa Ridha;Pradhan, Biswajeet
    • Korean Journal of Remote Sensing / v.34 no.1 / pp.45-74 / 2018
  • Landslides are a natural hazard that threatens lives and property in many areas around the world. Landslides are difficult to recognize, particularly in rainforest regions. Thus, an accurate, detailed, and updated inventory map is required for landslide susceptibility, hazard, and risk analyses. The inconsistency in the results obtained using different feature selection techniques in the literature has highlighted the importance of evaluating these techniques. Thus, in this study, six feature selection techniques were evaluated. Very-high-resolution LiDAR point clouds and orthophotos were acquired simultaneously in a rainforest area of Cameron Highlands, Malaysia by airborne laser scanning (LiDAR). A fuzzy-based segmentation parameter optimizer (FbSP optimizer) was used to optimize the segmentation parameters. Samples were drawn using a stratified random sampling method, with 70% used as training samples. Two machine-learning algorithms, namely Support Vector Machine (SVM) and Random Forest (RF), were used to evaluate the performance of each feature selection algorithm. The overall accuracies of the SVM and RF models revealed that three of the six algorithms ranked higher in landslide detection. Results indicated that the classification accuracies of the RF classifier were higher than those of the SVM classifier using either all features or only the optimal features. The proposed techniques performed well in detecting landslides in a rainforest area of Malaysia, and they can be easily extended to similar regions.
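
A stratified random split with a 70% training share, as mentioned in this abstract, can be sketched as follows. The function name `stratified_split`, the rounding rule, and the toy labels are assumptions for illustration; the paper does not describe its exact procedure.

```python
import numpy as np

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices so each class contributes ~train_fraction to training."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)  # random order within the class
        n_train = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:n_train].tolist())
        test_idx.extend(idx[n_train:].tolist())
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 10 landslide segments (1) and 20 non-landslide segments (0).
labels = [1] * 10 + [0] * 20
train_idx, test_idx = stratified_split(labels, train_fraction=0.7)
print(len(train_idx), len(test_idx))  # 21 9
```

Splitting per class keeps the class proportions of the full sample in both subsets, which matters when one class (here, landslides) is much rarer than the other.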

Improved Network Intrusion Detection Model through Hybrid Feature Selection and Data Balancing (Hybrid Feature Selection과 Data Balancing을 통한 효율적인 네트워크 침입 탐지 모델)

  • Min, Byeongjun;Ryu, Jihun;Shin, Dongkyoo;Shin, Dongil
    • KIPS Transactions on Software and Data Engineering / v.10 no.2 / pp.65-72 / 2021
  • Recently, attacks on network environments have been escalating rapidly and becoming more intelligent, and the limitations of signature-based network intrusion detection systems are becoming clear. To address this, machine learning-based intrusion detection systems are being studied in many ways, but two problems arise when machine learning is used for intrusion detection. The first is finding the important features for learning so that detection can run in real time, and the second is the imbalance of the data used for learning. These problems are critical because the performance of machine learning algorithms is data-dependent. In this paper, we propose HFS-DNN, a network intrusion detection model based on a deep neural network, to solve the problems presented above. The proposed HFS-DNN was trained on the NSL-KDD data set and compared with existing classification models. Experiments confirmed that the proposed Hybrid Feature Selection algorithm does not degrade performance, and in an experiment among learning models that addressed the imbalance problem, the model proposed in this paper showed the best performance.

Development of kNN QSAR Models for 3-Arylisoquinoline Antitumor Agents

  • Tropsha, Alexander;Golbraikh, Alexander;Cho, Won-Jea
    • Bulletin of the Korean Chemical Society / v.32 no.7 / pp.2397-2404 / 2011
  • A variable selection k nearest neighbor (kNN) QSAR modeling approach was applied to a data set of 80 3-arylisoquinolines exhibiting cytotoxicity against a human lung tumor cell line (A-549). All compounds were characterized with molecular topology descriptors calculated with the MolconnZ program. Seven compounds were randomly selected from the original dataset and used as an external validation set. The remaining subset of 73 compounds was divided into multiple training (56 to 61 compounds) and test (17 to 12 compounds) sets using a chemical diversity sampling method developed in this group. Highly predictive models characterized by leave-one-out cross-validated $R^2$ ($q^2$) values greater than 0.8 for the training sets and $R^2$ values greater than 0.7 for the test sets were obtained. The robustness of the models was confirmed by the Y-randomization test: all models built using training sets with randomly shuffled activities were characterized by low $q^2{\leq}0.26$ and $R^2{\leq}0.22$ for training and test sets, respectively. The twelve best models (with the highest values of both $q^2$ and $R^2$) predicted the activities of the external validation set of seven compounds with $R^2$ ranging from 0.71 to 0.93.
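
The leave-one-out $q^2$ and the Y-randomization check can be illustrated with a simplified sketch. It assumes an unweighted kNN average, Euclidean distance, and the common definition $q^2 = 1 - \mathrm{PRESS}/\mathrm{SS}_{total}$; the paper's variable selection procedure and descriptors are not reproduced, and the function name `loo_q2_knn` and the toy data are hypothetical.

```python
import numpy as np

def loo_q2_knn(X, y, k=3):
    """Leave-one-out q^2 for an unweighted kNN regressor (simplified)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # exclude the held-out compound itself
    preds = np.empty(n)
    for i in range(n):
        neighbors = np.argsort(dists[i])[:k]
        preds[i] = y[neighbors].mean()  # predict from the k nearest others
    press = np.sum((y - preds) ** 2)        # predictive residual sum of squares
    ss_total = np.sum((y - y.mean()) ** 2)  # variance around the mean activity
    return 1.0 - press / ss_total

# Toy data: one descriptor with a perfectly linear activity.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0]
q2 = loo_q2_knn(X, y, k=2)

# Y-randomization: shuffle activities and recompute; q^2 should collapse.
rng = np.random.default_rng(0)
q2_shuffled = loo_q2_knn(X, rng.permutation(y), k=2)
print(q2, q2_shuffled)
```

A model whose $q^2$ stays high after shuffling would suggest chance correlation, which is why the shuffled value is compared against the original.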

Identifying differentially expressed genes using the Polya urn scheme

  • Saraiva, Erlandson Ferreira;Suzuki, Adriano Kamimura;Milan, Luis Aparecido
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.627-640 / 2017
  • A common interest in gene expression data analysis is to identify genes that present significant changes in expression levels among biological experimental conditions. In this paper, we develop a Bayesian approach to make a gene-by-gene comparison in the case with a control and more than one treatment experimental condition. The proposed approach is within a Bayesian framework with a Dirichlet process prior. The comparison procedure is based on a model selection procedure developed using the discreteness of the Dirichlet process and its representation via the Polya urn scheme. The posterior probabilities of the models considered are calculated using a Gibbs sampling algorithm. A numerical simulation study is conducted to understand and compare the performance of the proposed method relative to the usual methods based on analysis of variance (ANOVA) followed by a Tukey test. The methods are compared in terms of true positive rate and false discovery rate. We find that the proposed method outperforms the methods based on ANOVA followed by a Tukey test. We also apply the methodologies to a publicly available data set on Plasmodium falciparum protein.
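
The Polya urn representation of a Dirichlet process can be sketched as a sequential sampler: the $(i+1)$-th value is a fresh draw from the base distribution with probability $\alpha/(\alpha+i)$, and otherwise a copy of one of the $i$ earlier values chosen uniformly. The code below illustrates only this prior scheme, not the paper's Gibbs sampler; the function name `polya_urn_sample`, the standard normal base distribution, and $\alpha = 1$ are assumptions.

```python
import numpy as np

def polya_urn_sample(n, alpha, base_draw, seed=0):
    """Draw n values from a Dirichlet process prior via the Polya urn scheme.

    alpha must be positive; base_draw(rng) returns one draw from the
    base distribution.
    """
    rng = np.random.default_rng(seed)
    values = []
    for i in range(n):
        # With i values so far, a fresh value has probability alpha / (alpha + i).
        if rng.random() < alpha / (alpha + i):
            values.append(base_draw(rng))
        else:
            # Otherwise repeat one of the earlier values, chosen uniformly.
            values.append(values[rng.integers(i)])
    return values

draws = polya_urn_sample(200, alpha=1.0, base_draw=lambda rng: rng.normal())
print(len(draws), len(set(draws)))  # 200 draws, far fewer distinct values
```

The repeated values are the discreteness the abstract refers to: ties among parameters of different conditions correspond to models in which those conditions share the same expression level.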

Selection of appropriate biomatrices for studies of chronic stress in animals: a review

  • Mohammad, Ataallahi;Jalil Ghassemi, Nejad;Kyu-Hyun, Park
    • Journal of Animal Science and Technology / v.64 no.4 / pp.621-639 / 2022
  • Cortisol and corticosterone, hormones traditionally considered biomarkers of stress, can be measured in fluid biomatrices (e.g., blood, saliva) from live animals to evaluate conditions at sampling time, or in solid biomatrices (e.g., hair, feather) from live or dead animals to obtain information regarding long-term changes. Using these biomarkers to evaluate physiological stress responses in domestic animals may be challenging due to the diverse characteristics of biomatrices for potential measurement. Ideally, a single measurement from the biomatrix should be sufficient for evaluating chronic stress. The availability of appropriate and cost-effective immunoassay methods for detecting the biomarkers should also be considered. This review discusses the strengths and limitations of different biomatrices with regard to ensuring the highest possible reliability for chronic stress evaluation. Overall, solid biomatrices require less frequent sampling than other biomatrices, resulting in greater time- and cost-effectiveness, greater ease of use, and fewer errors. The multiplex immunoassay can be used to analyze interactions and correlations between cortisol and other stress biomarkers in the same biomatrix. In light of the lack of information regarding appropriate biomatrices for measuring chronic stress, this review may help investigators set experimental conditions or design biological research.