• Title/Summary/Keyword: Selection Methods


Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
  • Over the past decade, the growth of the Web has explosively increased the volume of available data. Feature selection is an important step in extracting valuable information from such large amounts of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with Word2vec (word embedding to vector) and BOW (bag-of-words) representations. The FS methods adopted in this study are CFS (correlation-based FS) and IG (information gain). To select an optimal FS method, a number of classifiers were evaluated: LR (logistic regression), NN (neural network), NBN (naive Bayesian network), RF (random forest), RS (random subspace), and ST (stacking). Empirical results on electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yield the best opinion mining performance. Results on laptop and restaurant datasets revealed that the RF classifier using IG applied to Word2vec features performs best.
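
As a rough illustration of the IG step described in the abstract above, the following sketch ranks bag-of-words terms by information gain. It is not the authors' pipeline: the toy reviews, the labels, and the function names `entropy` and `information_gain` are hypothetical, and each term is treated as a simple present/absent feature.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d.split()]
    without_term = [y for d, y in zip(docs, labels) if term not in d.split()]
    n = len(labels)
    # Weighted entropy of the two partitions (empty partitions contribute nothing).
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


# Hypothetical toy reviews, for illustration only.
docs = ["great battery great screen", "poor battery", "great value", "poor screen poor value"]
labels = ["pos", "neg", "pos", "neg"]

vocab = sorted({word for d in docs for word in d.split()})
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
top_terms = ranked[:2]  # keep the two most informative terms
```

In this toy data only "great" and "poor" separate the classes, so they rank first; the retained terms would then be the BOW features passed to a classifier such as logistic regression.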

Variable Selection in Frailty Models using FrailtyHL R Package: Breast Cancer Survival Data (frailtyHL 통계패키지를 이용한 프레일티 모형의 변수선택: 유방암 생존자료)

  • Kim, Bohyeon;Ha, Il Do;Noh, Maengseok;Na, Myung Hwan;Song, Ho-Chun;Kim, Jahae
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.965-976 / 2015
  • Determining relevant variables for a regression model is important in regression analysis. Recently, variable selection methods using a penalized likelihood with various penalty functions (e.g., LASSO and SCAD) have been widely studied in simple statistical models such as linear models and generalized linear models. The advantage of these methods is that they select important variables and estimate regression coefficients simultaneously; that is, they delete insignificant variables by estimating their coefficients as zero. We study how to select proper variables based on penalized hierarchical likelihood (HL) in semi-parametric frailty models, allowing three penalty functions: LASSO, SCAD, and HL. For the variable selection, we develop a new function in the "frailtyHL" R package. Our methods are illustrated with breast cancer survival data from the Medical Center at Chonnam National University in Korea. We compare the results from the three variable selection methods and discuss their advantages and disadvantages.
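
For orientation, the two standard penalties named in the abstract above can be written as follows. These are the textbook LASSO and SCAD forms, not formulas taken from the paper; the HL penalty is omitted because its form is not given in the abstract. The generic criterion on the last line, including the scaling by n, is an assumption about one common convention.

```latex
% LASSO penalty
p_\lambda(|\beta_j|) = \lambda |\beta_j|

% SCAD penalty, defined through its derivative (a > 2; a = 3.7 is a common default)
p'_\lambda(|\beta_j|) = \lambda \left\{ I(|\beta_j| \le \lambda)
  + \frac{(a\lambda - |\beta_j|)_+}{(a-1)\lambda} \, I(|\beta_j| > \lambda) \right\}

% Generic penalized criterion to maximize (\ell stands in for the h-likelihood)
\ell_p(\beta) = \ell(\beta) - n \sum_{j=1}^{p} p_\lambda(|\beta_j|)
```

Both penalties are non-differentiable at zero, which is what allows small coefficients to be estimated as exactly zero and the corresponding variables to be dropped.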

Development of a feature selection technique on users' false beliefs (사용자의 False belief를 이용한 새로운 기능 선택방식에 대한 연구)

  • Lee, Jangsun;Choi, Gyunghyun;Kim, Jieun;Ryu, Hokyoung
    • Journal of the HCI Society of Korea / v.9 no.2 / pp.33-40 / 2014
  • Selecting the appropriate features that products or services should provide to users has been a critical decision-making problem for designers. However, the existing feature selection methods have prominent limitations in figuring out how users perceive the features. For example, selecting features based on users' preferences without analyzing their mental models might lead to the 'feature creep' phenomenon. In this study, we suggest the 'False belief technique', which can detect the mental models of products/services that users form after being provided with new features. This technique can help designers determine which features should be included in new product development.

Feature Selection and Performance Analysis using Quantum-inspired Genetic Algorithm (양자 유전알고리즘을 이용한 특징 선택 및 성능 분석)

  • Heo, G.S.;Jeong, H.T.;Park, A.;Baek, S.J.
    • Smart Media Journal / v.1 no.1 / pp.36-41 / 2012
  • Feature selection is an important technique for selecting a subset of relevant features when building robust pattern recognition systems. Various methods have been studied for feature selection, from sequential search algorithms to stochastic algorithms. In this work, we adopted for feature selection a Quantum-inspired Genetic Algorithm (QGA), which is based on concepts and principles of quantum computing such as Q-bits and superposition of states. The performance of QGA is compared with that of the Conventional Genetic Algorithm (CGA) with respect to classification rates and the number of selected features. The experimental results using UCI data sets show that QGA is superior to CGA.


Validation of selection accuracy for the total number of piglets born in Landrace pigs using genomic selection

  • Oh, Jae-Don;Na, Chong-Sam;Park, Kyung-Do
    • Asian-Australasian Journal of Animal Sciences / v.30 no.2 / pp.149-153 / 2017
  • Objective: This study aimed to determine the relationship between estimated breeding value and phenotype information after farrowing when juvenile selection was made among candidate pigs without phenotype information. Methods: After collecting phenotypic and genomic information for the total number of piglets born in Landrace pigs, the selection accuracy of genomic breeding value estimates using genomic information was compared with that of best linear unbiased prediction (BLUP) breeding value estimates using conventional pedigree information. Results: The genetic standard deviation (${\sigma}_a$) for the total number of piglets born was 0.91. Since the total number of piglets born for candidate pigs was unknown, the accuracy of the breeding value estimated from pedigree information was 0.080. When genomic information was used, the accuracy of the breeding value was 0.216. Assuming that the replacement rate of sows per year is 100% and the generation interval is 1 year, the genetic gain per year is 0.346 head when genomic information is used and 0.128 head when BLUP is used. Conclusion: The genetic gain estimated from the single-step best linear unbiased prediction (ssBLUP) method is about 2.7 times that estimated from the BLUP method, i.e., about 270% of the BLUP gain.
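
The reported gains can be checked with a short calculation based on the standard breeder's equation (gain = intensity × accuracy × genetic standard deviation / generation interval). The accuracies, ${\sigma}_a$, and generation interval come from the abstract above; the selection intensity of 1.76 is an assumption, since the abstract does not state it, and was chosen only because it reproduces the reported figures.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, generation_interval):
    """Breeder's equation: expected genetic gain per year."""
    return intensity * accuracy * sigma_a / generation_interval


SIGMA_A = 0.91             # genetic standard deviation, from the abstract
GENERATION_INTERVAL = 1.0  # years, from the abstract
ASSUMED_INTENSITY = 1.76   # ASSUMPTION: not given in the abstract

gain_genomic = genetic_gain_per_year(ASSUMED_INTENSITY, 0.216, SIGMA_A, GENERATION_INTERVAL)
gain_blup = genetic_gain_per_year(ASSUMED_INTENSITY, 0.080, SIGMA_A, GENERATION_INTERVAL)

# The ratio depends only on the two accuracies, not on the assumed intensity.
ratio = gain_genomic / gain_blup

print(round(gain_genomic, 3), round(gain_blup, 3), round(ratio, 2))
```

Because intensity, ${\sigma}_a$, and generation interval are the same for both methods, the 2.7-fold difference follows directly from the ratio of the accuracies (0.216 / 0.080).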

Examination Questions Selection Algorithm in Web-based Engineer Test Education System (웹 기반 기사시험 학습 시스템에서의 문제 출제 알고리즘)

  • Kim Eun-Jung
    • Journal of Korea Society of Industrial Information Systems / v.9 no.3 / pp.11-18 / 2004
  • Question selection methods for examinations in web-based education systems are an active research topic. Most remote examinations compose their questions from a fixed set, randomly from an item pool, or automatically according to degree of difficulty. This paper proposes a new examination question selection algorithm for a web-based education system for engineer tests. In general, an engineer test requires that questions be selected with an adequate degree of difficulty and distributed equally across all units. The proposed algorithm therefore selects questions so that both the degree of difficulty and the distribution across units are balanced. The algorithm provides a more effective examination method than previous algorithms.


Survey on Nucleotide Encoding Techniques and SVM Kernel Design for Human Splice Site Prediction

  • Bari, A.T.M. Golam;Reaz, Mst. Rokeya;Choi, Ho-Jin;Jeong, Byeong-Soo
    • Interdisciplinary Bio Central / v.4 no.4 / pp.14.1-14.6 / 2012
  • Splice site prediction in a DNA sequence is a basic search problem of finding exon/intron and intron/exon boundaries. Removing the introns and then joining the exons together forms the mRNA sequence, which is the input of the translation process. It is a necessary step in the central dogma of molecular biology. The main task of splice site prediction is to find the exact GT- and AG-ended sequences and then to identify the true and false ones among those candidate sequences. In this paper, we survey research on splice site prediction based on the support vector machine (SVM). The basic differences between these works lie in the nucleotide encoding technique and the SVM kernel selection. Some methods encode the DNA sequence in a sparse way, whereas others encode it in a probabilistic manner. The encoded sequences serve as the input of the SVM, whose task is to classify them using its learned model. The classification accuracy depends largely on proper kernel selection for sequence data as well as on the selection of kernel parameters. We examine each encoding technique and classify the techniques according to their similarity. We then discuss kernels and their parameter selection. Our survey provides a basic understanding of encoding approaches and proper SVM kernel selection for splice site prediction.
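
A minimal sketch of the sparse encoding idea mentioned in the abstract above: each nucleotide becomes a 4-bit indicator vector, and candidate sites are located by scanning for the GT or AG dinucleotide. The bit order A, C, G, T and the helper names are assumptions made for illustration; the surveyed methods may differ in detail.

```python
# ASSUMPTION: column order A, C, G, T; any fixed order works as long as it is consistent.
SPARSE_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def sparse_encode(sequence):
    """Concatenate the 4-bit indicator vectors of all nucleotides in the sequence."""
    vector = []
    for base in sequence.upper():
        vector.extend(SPARSE_CODE[base])
    return vector


def candidate_sites(sequence, dinucleotide):
    """Return the start positions of every occurrence of the given dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]


# Hypothetical toy sequence, for illustration only.
toy = "AAGTCCAGGT"
gt_positions = candidate_sites(toy, "GT")  # candidate GT sites
ag_positions = candidate_sites(toy, "AG")  # candidate AG sites
encoded = sparse_encode(toy)               # 4 * len(toy) binary inputs for a classifier
```

A window of encoded nucleotides around each candidate position would then serve as the input vector of the SVM that separates true from false sites.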

Set Covering-based Feature Selection of Large-scale Omics Data (Set Covering 기반의 대용량 오믹스데이터 특징변수 추출기법)

  • Ma, Zhengyu;Yan, Kedong;Kim, Kwangsoo;Ryoo, Hong Seo
    • Journal of the Korean Operations Research and Management Science Society / v.39 no.4 / pp.75-84 / 2014
  • In this paper, we deal with the feature selection problem for large-scale, high-dimensional biological data such as omics data. For this problem, most previous approaches use a simple score function to reduce the number of original variables and then select features from the small number of remaining variables. Methods that do not rely on such filtering either ignore the interactions between variables or generate approximate solutions to a simplified problem. In contrast, by combining set covering and clustering techniques, we developed a new method that can handle the full set of variables and consider the combinatorial effects of variables when selecting good features. To demonstrate the efficacy and effectiveness of the method, we downloaded gene expression datasets from TCGA (The Cancer Genome Atlas) and compared our method with other algorithms, including the feature selection algorithms embedded in WEKA. The experimental results show that our method selects high-quality features that yield more accurate classifiers than those of the other feature selection algorithms.
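
To make the set covering view of feature selection concrete, here is a generic greedy heuristic. It is not the authors' method (which also uses clustering and is not detailed in the abstract above): each feature is taken to "cover" the pairs of samples from different classes on which it takes different values, and features are added until every such pair is covered. The binary toy data and the function name are hypothetical.

```python
from itertools import combinations


def greedy_set_cover_features(samples, labels):
    """Greedily pick features until every cross-class sample pair differs on a chosen feature."""
    n_features = len(samples[0])
    # Pairs of samples that belong to different classes and therefore must be separated.
    pairs = [(i, j) for i, j in combinations(range(len(samples)), 2)
             if labels[i] != labels[j]]
    # For each feature, the set of pairs it separates.
    covers = {f: {(i, j) for i, j in pairs if samples[i][f] != samples[j][f]}
              for f in range(n_features)}
    uncovered, chosen = set(pairs), []
    while uncovered:
        best = max(covers, key=lambda f: len(covers[f] & uncovered))
        if not covers[best] & uncovered:
            break  # the remaining pairs cannot be separated by any feature
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Hypothetical binarized expression profiles, for illustration only.
samples = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
labels = ["tumor", "tumor", "normal", "normal"]
selected = greedy_set_cover_features(samples, labels)
```

Here feature 0 alone separates all tumor/normal pairs, so it is the only feature selected. The greedy rule gives an approximate cover; an exact formulation would solve the set covering problem as an integer program.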

A Study on Developing a CER Using Production Cost Data in Korean Maneuver Weapon System (한국형 기동무기체계 양산비 비용추정관계식 개발에 관한 연구)

  • Lee, Doo-Hyun;Kim, Gak-Gyu
    • Journal of the Korean Operations Research and Management Science Society / v.39 no.3 / pp.51-61 / 2014
  • In this paper, we develop a cost estimation relationship (CER) for Korean maneuverable weapons systems using historical production cost data. To develop the CER, we collected historical production cost data for four tanks and five armored vehicles. We also analyzed the Required Operational Capability (ROC) of the weapons systems and chose cost drivers that allow their operational capabilities to be compared. We used forward selection, backward selection, stepwise regression, and $R^2$ selection to identify the cost drivers with the greatest influence on the dependent variable. We then used principal component regression, robust regression, and weighted regression to deal with multicollinearity and outliers in the data and so develop a more appropriate CER. As a result, we were able to develop a production cost CER for Korean maneuverable weapons systems with the lowest cost error. This research is meaningful in that it develops a CER based on original Korean cost data without foreign data, and these methods will contribute to developing a Korean cost analysis program in the future.

Development of Awarding System for Construction Contractors in Gaza Strip Using Artificial Neural Network (ANN)

  • El-Sawalhi, Nabil;Hajar, Yousef Abu
    • Journal of Construction Engineering and Project Management / v.6 no.3 / pp.1-7 / 2016
  • The purpose of this paper is to develop a model for selecting the best contractor in the Gaza Strip using an Artificial Neural Network (ANN). The contractor selection methods and criteria were identified using a field survey. Fifty-four engineers were asked to fill in a questionnaire covering factors related to the contractor selection criteria practiced in the Gaza Strip. The results show that the great majority of respondents (91%) confirmed that the current awarding method, "the lowest bid price", is considered one of the major problems of the construction sector, while "award the bid to the highest weight after combination of the technical and financial scores" was selected by 50% of the respondents. The criteria weights were determined based on the Relative Importance Index (RII). Ninety-one tenders (13 projects) were used to train and test the ANN model after the contractors were re-evaluated according to the factor weights in order to select the best contractor, i.e., the one achieving the highest score. Neurosolution software was used to train the models. The results of the trained models indicated that the neural network succeeded reasonably well in selecting the best contractor, with 95.96% accuracy. The sensitivity analysis showed that the profitability and capital of the company are the most influential parameters in selecting contractors. This model gives the owner the chance to be more accurate in selecting the most appropriate contractor.
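
The Relative Importance Index mentioned in the abstract above is commonly computed as the sum of the respondents' ratings divided by the highest possible rating times the number of respondents. The sketch below uses that common definition; the 5-point scale, the criterion names, and the ratings are hypothetical, since the abstract does not give them.

```python
def relative_importance_index(ratings, highest_rating=5):
    """RII = sum of ratings / (highest possible rating * number of respondents)."""
    return sum(ratings) / (highest_rating * len(ratings))


# Hypothetical questionnaire ratings on an assumed 5-point scale.
survey = {
    "profitability": [5, 5, 4, 5],
    "capital": [4, 5, 4, 4],
    "past experience": [3, 4, 3, 2],
}

rii = {criterion: relative_importance_index(r) for criterion, r in survey.items()}

# Normalize the indices so the criteria weights sum to 1.
total = sum(rii.values())
weights = {criterion: value / total for criterion, value in rii.items()}
```

Weights obtained this way could then be used to re-score each contractor before the scores are fed to the neural network.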