• Title/Summary/Keyword: Accuracy of Selection

A study on neighbor selection methods in k-NN collaborative filtering recommender system (근접 이웃 선정 협력적 필터링 추천시스템에서 이웃 선정 방법에 관한 연구)

  • Lee, Seok-Jun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.20 no.5
    • /
    • pp.809-818
    • /
    • 2009
  • Collaborative filtering predicts an active user's preference for specific items traded in e-commerce by using other users' preference information. To improve prediction accuracy, a collaborative filtering approach needs enough user preference information. However, too much preference information can degrade prediction accuracy, and too little can degrade it as well. This research suggests a method for deciding a suitable number of neighbor users when applying a collaborative filtering algorithm, improving on existing k-nearest-neighbor selection methods. The results provide useful methods for improving prediction accuracy and refine an exploratory data analysis approach for deciding an appropriate number of nearest neighbors.
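The role of the neighborhood size k can be illustrated with a generic user-based predictor. This is a minimal sketch of a common mean-centered, Pearson-weighted k-NN formulation, not the selection method proposed in the paper; the `ratings` data and the function names are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items that both users rated."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, user, item, k):
    """Predict a rating from the k most similar users who rated the item."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    scored = [(pearson(ratings[user], r), name)
              for name, r in ratings.items()
              if name != user and item in r]
    neighbors = sorted(scored, reverse=True)[:k]  # k controls how many users vote
    num = den = 0.0
    for sim, name in neighbors:
        if sim <= 0:
            continue  # ignore dissimilar users
        other = ratings[name]
        other_mean = sum(other.values()) / len(other)
        num += sim * (other[item] - other_mean)
        den += sim
    return user_mean + num / den if den else user_mean

# Hypothetical toy ratings (user -> item -> score), for illustration only.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
    "dave":  {"a": 4, "b": 2, "c": 3, "d": 4},
}
print(predict(ratings, "alice", "d", k=2))
```

Varying `k` in the call changes how many neighbors contribute, which is the quantity the abstract says must be neither too large nor too small.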

Analyzing Factors Contributing to Research Performance using Backpropagation Neural Network and Support Vector Machine

  • Ermatita, Ermatita;Sanmorino, Ahmad;Samsuryadi, Samsuryadi;Rini, Dian Palupi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.1
    • /
    • pp.153-172
    • /
    • 2022
  • In this study, the authors analyze factors contributing to research performance using a Backpropagation Neural Network and a Support Vector Machine. The analysis of factors contributing to lecturer research performance starts by defining the features. The next stage is to collect datasets based on the defined features and then transform the raw dataset into data ready to be processed. After the data is transformed, features are selected; before that, the target feature, namely research performance, is determined. Feature selection consists of Chi-Square selection (U) and the Pearson correlation coefficient (CM). It yields eight factors contributing to lecturer research performance: Scientific Papers (U: 154.38, CM: 0.79), Number of Citation (U: 95.86, CM: 0.70), Conference (U: 68.67, CM: 0.57), Grade (U: 10.13, CM: 0.29), Grant (U: 35.40, CM: 0.36), IPR (U: 19.81, CM: 0.27), Qualification (U: 2.57, CM: 0.26), and Grant Awardee (U: 2.66, CM: 0.26). To analyze the factors, two data mining classifiers were used: Backpropagation Neural Networks (BPNN) and Support Vector Machine (SVM). The classifiers were evaluated by accuracy score: 95 percent for BPNN and 92 percent for SVM. The essence of this analysis is not to find the highest accuracy score, but rather to check whether the factors can pass the test phase with the expected results. The findings of this study reveal which factors have a significant impact on research performance and which do not.
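As a rough illustration of chi-square feature scoring, the sketch below computes the classical Pearson chi-square statistic of independence between a categorical feature and the target, then ranks features by it. The abstract does not say which chi-square variant or tooling the authors used, so this is an assumption; the feature names and data are hypothetical.

```python
from collections import Counter

def chi_square(feature, target):
    """Pearson chi-square statistic for a feature/target contingency table."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    feature_counts = Counter(feature)
    target_counts = Counter(target)
    stat = 0.0
    for f, f_count in feature_counts.items():
        for t, t_count in target_counts.items():
            expected = f_count * t_count / n  # count expected under independence
            stat += (joint[(f, t)] - expected) ** 2 / expected
    return stat

# Hypothetical binary data: 1 = high research performance, 0 = low.
target = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "has_papers": [1, 1, 1, 1, 0, 0, 0, 0],
    "has_grant":  [1, 1, 1, 0, 1, 0, 0, 0],
    "coin_flip":  [1, 0, 1, 0, 1, 0, 1, 0],
}

# Higher statistic = stronger association with the target.
ranked = sorted(features, key=lambda name: chi_square(features[name], target),
                reverse=True)
print(ranked)
```

A Pearson-correlation ranking for numeric features would follow the same pattern, with the correlation coefficient in place of the chi-square statistic.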

Accuracy Analysis of DEMs Generated from High Resolution Optical and SAR Images (고해상도 광학영상과 SAR영상으로부터 생성된 수치표고모델의 정확도 분석)

  • Kim, Chung;Lee, Dong-Cheon;Yom, Jae-Hong;Lee, Young-Wook
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference
    • /
    • 2004.04a
    • /
    • pp.337-343
    • /
    • 2004
  • Spatial information can be obtained from spaceborne high-resolution optical and synthetic aperture radar (SAR) images. However, some satellite images do not provide physical sensor information; instead, rational polynomial coefficients (RPC) are available. The objectives of this study are as follows: (1) Three-dimensional ground coordinates were computed by applying the rational function model (RFM) with the RPC to a stereo pair of Ikonos images, and their accuracy was evaluated. (2) Interferometric SAR (InSAR) was applied to JERS-1 images to generate a DEM, and its accuracy was analyzed. (3) The quality of the automatically generated DEM was also analyzed for different types of terrain in the study site. The overall accuracy was evaluated by comparison with GPS surveying data. The height offset in the RPC was corrected by estimating the bias, which improved the accuracy. The accuracy of the DEMs generated from InSAR with different selections of GCPs was also analyzed. For the Ikonos images, the overall RMSE was 0.23327", 0.11625" and 13.70 m in latitude, longitude and height, respectively. After correcting the height offset in the RPC, the RMSE of the height improved to 1.02 m. For the SAR image, the RMSE of the height was 10.50 m with the optimal selection of GCPs. For the different terrain types, the RMSE of the height for urban, forest and flat areas was 23.65 m, 8.54 m and 0.99 m, respectively, for the Ikonos image, while the corresponding RMSE was 13.82 m, 18.34 m and 10.88 m, respectively, for the SAR image.
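The effect of removing a constant height offset on RMSE can be sketched as follows. The heights are made-up numbers chosen only to show the mechanism, and the bias is estimated and evaluated on the same points for brevity; in practice one would estimate it on control points and evaluate on separate check points.

```python
from math import sqrt

def rmse(predicted, reference):
    """Root-mean-square error between predicted and reference heights."""
    n = len(predicted)
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)

def remove_bias(predicted, reference):
    """Subtract the mean error (a constant offset) from the predictions."""
    bias = sum(p - r for p, r in zip(predicted, reference)) / len(predicted)
    return [p - bias for p in predicted]

# Hypothetical heights in metres: DEM-derived values vs. GPS reference.
gps_heights = [100.0, 120.0, 140.0, 160.0]
dem_heights = [113.0, 134.0, 153.0, 174.0]

print(rmse(dem_heights, gps_heights))                            # before correction
print(rmse(remove_bias(dem_heights, gps_heights), gps_heights))  # after correction
```

When most of the error is a systematic offset, as the abstract reports for the RPC height, removing the mean error leaves only the small random component.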

Diagnosis of Alzheimer's Disease using Wrapper Feature Selection Method

  • Vyshnavi Ramineni;Goo-Rak Kwon
    • Smart Media Journal
    • /
    • v.12 no.3
    • /
    • pp.30-37
    • /
    • 2023
  • Alzheimer's disease (AD) is currently managed through early diagnosis, which can only slow the progression of symptoms, and research is still ongoing. Against this background, several machine learning classification models using T1-weighted images have been proposed to identify AD. In this paper, we consider an improved feature selection that reduces complexity by using wrapper techniques and a Restricted Boltzmann Machine (RBM). This work used the subcortical and cortical features of 278 subjects from the ADNI dataset to identify AD using sMRI. The experiment uses multi-class classification with four classes: AD, EMCI, LMCI, and HC. The proposed feature selection consists of forward feature selection, backward feature selection, and combined PCA & RBM. Forward and backward feature selection are iterative: forward selection starts with no features, and backward selection starts with all features included. PCA is used to reduce the dimensions, and RBM is used to select the best features without interpreting them. We compared the three models with PCA for analysis. The experiment shows that combined PCA & RBM and backward feature selection give the best accuracy with the RF classification model, 88.65% and 88.56%, respectively.
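The forward and backward wrapper strategies described above can be sketched generically. This is a greedy illustration under assumptions, not the authors' implementation: `score` stands in for whatever the wrapper evaluates (typically cross-validated classifier accuracy), and the toy scoring function and feature names are hypothetical.

```python
def forward_selection(features, score):
    """Start with no features; add the best one while the score improves."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected

def backward_selection(features, score):
    """Start with all features; drop one while the score does not get worse."""
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        drop = max(selected,
                   key=lambda f: score([g for g in selected if g != f]))
        reduced = [g for g in selected if g != drop]
        reduced_score = score(reduced)
        if reduced_score < best:
            break  # every possible removal hurts
        selected, best = reduced, reduced_score
    return selected

# Hypothetical stand-in for classifier accuracy: two informative features,
# plus a small penalty per feature so that useless ones lower the score.
gains = {"f1": 0.5, "f2": 0.3}

def toy_score(subset):
    return sum(gains.get(f, 0.0) for f in subset) - 0.05 * len(subset)

all_features = ["f1", "f2", "f3", "f4"]
print(forward_selection(all_features, toy_score))
print(backward_selection(all_features, toy_score))
```

Both searches call `score` many times, which is why wrapper methods are more expensive than filter methods but tailored to the chosen classifier.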

Feature Selection Based on Bi-objective Differential Evolution

  • Das, Sunanda;Chang, Chi-Chang;Das, Asit Kumar;Ghosh, Arka
    • Journal of Computing Science and Engineering
    • /
    • v.11 no.4
    • /
    • pp.130-141
    • /
    • 2017
  • Feature selection is one of the most challenging problems in pattern recognition and data mining. In this paper, a feature selection algorithm based on an improved version of binary differential evolution is proposed. The method simultaneously optimizes two feature selection criteria, namely, the set approximation accuracy of rough set theory and a score derived from relational algebra, in order to select the most relevant feature subset from an entire feature set. The superiority of the proposed method over other state-of-the-art methods is confirmed by experiments conducted on seven publicly available benchmark datasets with different characteristics, such as a low number of objects with a high number of features, and a high number of objects with a low number of features.

Energy-balance node-selection algorithm for heterogeneous wireless sensor networks

  • Khan, Imran;Singh, Dhananjay
    • ETRI Journal
    • /
    • v.40 no.5
    • /
    • pp.604-612
    • /
    • 2018
  • To solve the problem of unbalanced loads and the short network lifetime of heterogeneous wireless sensor networks, this paper proposes a node-selection algorithm based on energy balance and dynamic adjustment. The spacing and energy of the nodes are calculated according to the proximity to the network nodes and the characteristics of the link structure. The direction factor and the energy-adjustment factor are introduced to optimize the node-selection probability in order to realize the dynamic selection of network nodes. On this basis, the target path is selected by the relevance of the nodes, and nodes with insufficient energy values are excluded in real time by the establishment of the node-selection mechanism, which guarantees the normal operation of the network and a balanced energy consumption. Simulation results show that this algorithm can effectively extend the network lifetime, and it has better stability, higher accuracy, and an enhanced data-receiving rate in sufficient time.

Feature Combination and Selection Using Genetic Algorithm for Character Recognition (유전 알고리즘을 이용한 특징 결합과 선택)

  • Lee Jin-Seon
    • The Journal of the Korea Contents Association
    • /
    • v.5 no.5
    • /
    • pp.152-158
    • /
    • 2005
  • By combining different feature sets extracted from input character patterns, we can improve the performance of a character recognition system. To reduce the dimensionality of the combined feature vector, we perform feature selection. This paper proposes a general framework for feature combination and selection in character recognition problems. It also presents a specific design for handwritten numeral recognition. In the design, DDD and AGD feature sets are extracted from handwritten numeral patterns, and a genetic algorithm is used for feature selection. Experimental results showed a significant accuracy improvement of about 0.7% on the CENPARMI handwritten numeral database.

Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier

  • Lee, Jinlee;Park, Dooho;Lee, Changhoon
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.10
    • /
    • pp.5132-5148
    • /
    • 2017
  • Cyber attacks are evolving commensurate with recent developments in information security technology. Intrusion detection systems collect various types of data from computers and networks to detect security threats and analyze the attack information. The large amount of data examined makes the large number of computations and low detection rates problematic. Feature selection is expected to improve the classification performance and provide faster and more cost-effective results. Despite the various feature selection studies conducted for intrusion detection systems, it is difficult to automate feature selection because it is based on the knowledge of security experts. This paper proposes a feature selection technique to overcome the performance problems of intrusion detection systems. Focusing on feature selection, the first phase of the proposed system constructs a feature subset using a sequential forward floating search (SFFS) to reduce the dimension of the variables. The second phase constructs a classification model with the selected feature subset using a random forest classifier (RFC) and evaluates the classification accuracy. Experiments were conducted with the NSL-KDD dataset using SFFS-RF, and the results indicated that feature selection is a necessary preprocessing step for improving overall performance in systems that handle large datasets. They also verified that SFFS-RF could be used for data classification. In conclusion, SFFS-RF could be the key to improving the classification model performance in machine learning.

Biological Feature Selection and Disease Gene Identification using New Stepwise Random Forests

  • Hwang, Wook-Yeon
    • Industrial Engineering and Management Systems
    • /
    • v.16 no.1
    • /
    • pp.64-79
    • /
    • 2017
  • Identifying disease genes from the human genome is a critical task in biomedical research. Important biological features to distinguish the disease genes from the non-disease genes have been mainly selected based on traditional feature selection approaches. However, the traditional feature selection approaches unnecessarily consider many unimportant biological features. As a result, although some of the existing classification techniques have been applied to disease gene identification, the prediction performance was not satisfactory. A small set of the most important biological features can enhance the accuracy of disease gene identification, as well as provide potentially useful knowledge for biologists or clinicians, who can further investigate the selected biological features as well as the potential disease genes. In this paper, we propose a new stepwise random forests (SRF) approach for biological feature selection and disease gene identification. The SRF approach consists of two stages. In the first stage, only important biological features are iteratively selected in a forward selection manner based on one-dimensional random forest regression, where the updated residual vector is considered as the current response vector. We can then determine a small set of important biological features. In the second stage, random forests classification with regard to the selected biological features is applied to identify disease genes. Our extensive experiments show that the proposed SRF approach outperforms the existing feature selection and classification techniques in terms of biological feature selection and disease gene identification.

Image classification methods applicable multiple satellite imagery

  • Jeong, Jae-Jun;Kim, Kyung-Ok;Lee, Jong-Hun
    • Proceedings of the KSRS Conference
    • /
    • 2002.10a
    • /
    • pp.81-81
    • /
    • 2002
  • Classification is one of the processes for extracting attributes from satellite imagery and is a standard function in commercial satellite image processing software. The accuracy of classification plays a key role in deciding how its results can be used. Considerable effort has gone into achieving higher accuracy in areas such as training area selection and classification algorithms. Our research is one such effort, taking a different approach. In this research, we conduct classification using multiple satellite image data and an evidential approach. Statistically, we consider the posterior probabilities and certainty in maximum likelihood classification; methodologically, we use Dempster's orthogonal sums. Accuracy for the whole data set has not been assessed yet, but accuracy assessments in training fields and check fields show an improvement of over 10% in overall accuracy and over 0.1 in kappa index.
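Dempster's orthogonal sum, which the abstract uses to fuse evidence from several images, can be sketched for two sources as follows. The class labels and mass values are hypothetical, and how the authors derive masses from maximum likelihood posteriors is not stated in the abstract, so that step is omitted here.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions (frozenset of classes -> mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b  # evidence for disjoint class sets
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to one.
    return {h: mass / (1.0 - conflict) for h, mass in combined.items()}

WATER = frozenset({"water"})
FOREST = frozenset({"forest"})
EITHER = WATER | FOREST  # mass on the whole set expresses uncertainty

# Hypothetical per-pixel evidence from two different satellite images.
image_1 = {WATER: 0.6, FOREST: 0.1, EITHER: 0.3}
image_2 = {WATER: 0.5, FOREST: 0.2, EITHER: 0.3}

fused = dempster_combine(image_1, image_2)
print(fused[WATER], fused[FOREST], fused[EITHER])
```

In this toy case the fused mass on water is higher than in either source alone, because the two images agree and their uncertainty mass is redistributed.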
