• Title/Summary/Keyword: feature-scoring

40 search results (processing time: 0.031 s)

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems / Vol. 14, No. 5 / pp.1167-1175 / 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very few samples and high dimensionality. Therefore, a dimensionality reduction process is required to classify microarray data. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those that are highly correlated with their class. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we use the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features that have the same characteristics into one cluster, so that redundancy in microarray data is removed. The features in each cluster are ranked using the Relief algorithm, such that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process, which uses the Random Forest algorithm. Based on the simulation, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
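
The selection step described in this abstract (score features with Relief, then keep the best-scoring feature of each cluster) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `relief_scores` and `best_feature_per_cluster` are hypothetical, the Relief variant is the basic two-class form with Manhattan distance, and the cluster assignment of each feature is assumed to come from a separate k-means run that is not shown here.

```python
from typing import Dict, List, Sequence


def relief_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Basic two-class Relief weights.

    Assumes features are scaled to [0, 1] and every class has at least
    two samples, so that each sample has a nearest hit and a nearest miss.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Manhattan distance between two samples.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        near_hit = min(hits, key=lambda j: dist(X[i], X[j]))
        near_miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            # Reward features that differ on the nearest miss,
            # penalize features that differ on the nearest hit.
            weights[f] += (
                abs(X[i][f] - X[near_miss][f]) - abs(X[i][f] - X[near_hit][f])
            ) / n
    return weights


def best_feature_per_cluster(
    cluster_of: Sequence[int], scores: Sequence[float]
) -> List[int]:
    """Return the index of the highest-scoring feature in each cluster."""
    best: Dict[int, int] = {}
    for feature, cluster in enumerate(cluster_of):
        if cluster not in best or scores[feature] > scores[best[cluster]]:
            best[cluster] = feature
    return sorted(best.values())


# Toy data: features 0 and 1 separate the classes, feature 2 does not.
X_demo = [
    [0.0, 0.1, 0.5],
    [0.1, 0.0, 0.4],
    [0.9, 1.0, 0.5],
    [1.0, 0.9, 0.6],
]
y_demo = [0, 0, 1, 1]
demo_scores = relief_scores(X_demo, y_demo)
```

The selected feature indices would then be used to build the reduced dataset passed to the classifier.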

An Efficient Goal Area Detection Method in Soccer Game Video (축구경기 동영상에서의 효율적인 골영역 검출 방법)

  • 우성형;전승철;박성한
    • Proceedings of the IEEK Conference / The Institute of Electronics Engineers of Korea, Proceedings of the 2000 Fall Conference (3) / pp.81-84 / 2000
  • In this paper, we propose an efficient method to extract the goal area, which may be closely related to scoring highlights. Our method uses the boundary between the ground and non-ground areas. We propose efficient methods for rapidly detecting first the boundary and then the goal area. Our simulation results show that our method is very reliable and takes less processing time than previous methods. These performance improvements may be due to the use of a simple, general feature.

The Type of English Writing Error of Korean Undergraduate Students (한국 대학생이 보이는 영어작문 실수 유형)

  • Lim Heesuck;Park Chongwon;Nam Kichun
    • Proceedings of the KSPS conference / The Korean Society of Phonetic Sciences, Proceedings of the May 2003 Conference / pp.176-179 / 2003
  • This study was conducted to extract the feature set of English writing errors, with the aim of suggesting an adequate English writing program and building an automated scoring system. The frequently committed errors and the errors across levels of writing proficiency are reported. The correlation between error type and native speakers' rating scores is also reported.

Multi-stage Speech Recognition Using Confidence Vector (신뢰도 벡터 기반의 다단계 음성인식)

  • Jeon, Hyung-Bae;Hwang, Kyu-Woong;Chung, Hoon;Kim, Seung-Hi;Park, Jun;Lee, Yun-Keun
    • MALSORI / No. 63 / pp.113-124 / 2007
  • In this paper, we propose the use of a confidence vector as an intermediate input feature in a multi-stage speech recognition architecture to improve recognition accuracy. The multi-stage speech recognition structure was introduced as a way to reduce the computational complexity of the decoding procedure and thereby achieve faster speech recognition. Conventional multi-stage speech recognition usually consists of three stages: acoustic search, lexical search, and acoustic re-scoring. In this paper, we focus on improving the accuracy of lexical decoding by introducing a confidence vector as the input feature instead of the phoneme that is typically used. We conducted experiments on a 220K Korean Point-of-Interest (POI) domain, and the results show that the proposed method contributes to improved accuracy.

Content-Based Image Retrieval Based on Relevance Feedback and Reinforcement Learning for Medical Images

  • Lakdashti, Abolfazl;Ajorloo, Hossein
    • ETRI Journal / Vol. 33, No. 2 / pp.240-250 / 2011
  • To enable a relevance feedback paradigm to evolve through users' feedback, a reinforcement learning method is proposed. The feature space of the medical images is partitioned into positive and negative hypercubes by the system. Each hypercube constitutes an individual in a genetic algorithm infrastructure. The rules undergo recombination and mutation operators to produce new rules that explore the feature space better. The effectiveness of the rules is checked by a scoring method, through which ineffective rules are gradually omitted and effective ones survive. Our experiments on a set of 10,004 images from the IRMA database show that the proposed approach describes the semantic content of images for image retrieval better than other existing approaches in the literature.

A Hybrid Recommendation System based on Fuzzy C-Means Clustering and Supervised Learning

  • Duan, Li;Wang, Weiping;Han, Baijing
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 7 / pp.2399-2413 / 2021
  • A recommendation system is an information filtering tool that uses the ratings and reviews of users to generate a personalized recommendation service. However, the cold-start problem of users and items is still a major research topic in service recommendation. To address this challenge, this paper proposes a highly efficient hybrid recommendation system based on Fuzzy C-Means (FCM) clustering and supervised learning models. The proposed recommendation method has two aspects: on the one hand, the FCM clustering technique is applied to the item-based collaborative filtering framework to solve the cold-start problem; on the other hand, content information is integrated into the collaborative filtering. The algorithm constructs the user and item membership degree feature vectors and adapts the data representation of the scoring matrix to the supervised learning algorithm. In addition, by linearly combining the subjective membership degree feature vector and the objective membership degree feature vector, the prediction accuracy is significantly improved on public datasets with different sparsity. The efficiency of the proposed system is illustrated by several experiments on the MovieLens dataset.

Lack of Correlations among Histopathological Parameters, Ki-67 Proliferation Index and Prognosis in Pheochromocytoma Patients

  • Ocal, Irfan;Avci, Arzu;Cakalagaoglu, Fulya;Can, Huseyin
    • Asian Pacific Journal of Cancer Prevention / Vol. 15, No. 4 / pp.1751-1755 / 2014
  • Background: In this study, the prognostic correlations of histopathologic parameters and the Ki-67 proliferation index, as well as the diagnostic value of immunohistochemical markers, were evaluated in pheochromocytomas. Materials and Methods: A total of 22 patients diagnosed with a pheochromocytoma between 2000 and 2010 at Izmir Katip Celebi University Ataturk Training and Research Hospital were included. The diagnostic value of the PASS scoring system and the prognostic correlations of histopathologic parameters and the Ki-67 proliferation index were investigated. SPSS for Windows 17.0 software was used for statistical analysis. Results: There was no statistically significant correlation between recurrence and clinicopathologic parameters or the PASS score (PASS>4). In addition, there were no statistically significant correlations between PASS score and clinicopathologic parameters such as diameter (5 cm), weight (>100g), gender (female/male ratio), and age (25-45/45-55/>55). Furthermore, there was no significant correlation between diameter and either the other clinicopathological parameters or recurrence. However, there was a statistically significant correlation between Ki-67 proliferation index and capsule invasion (p=0.047). Conclusions: Some but not most of the findings in our study were concordant with the literature. To clarify these relationships, studies on larger groups are needed, using standard scoring systems that are not affected by subjective factors and that feature appropriate histopathological criteria.

Automated Scoring of Scientific Argumentation Using Expert Morpheme Classification Approaches (전문가의 형태소 분류를 활용한 과학 논증 자동 채점)

  • Lee, Manhyoung;Ryu, Suna
    • Journal of The Korean Association For Science Education / Vol. 40, No. 3 / pp.321-336 / 2020
  • We explore automated scoring models of scientific argumentation. We consider how a new analytical approach using a machine learning technique may enhance the understanding of spoken argumentation in the classroom. We sampled 2,605 utterances that occurred during a high school science class on molecular structure and classified the utterances into five argumentative elements. Next, we performed text preprocessing on the classified utterances. As machine learning techniques, we applied support vector machines, decision trees, random forests, and artificial neural networks. To enhance the identification of rebuttal elements, we used a heuristic feature-engineering method that applies experts' classification of the morphemes of scientific argumentation.

Tracing the breeding farm of domesticated pig using feature selection (Sus scrofa)

  • Kwon, Taehyung;Yoon, Joon;Heo, Jaeyoung;Lee, Wonseok;Kim, Heebal
    • Asian-Australasian Journal of Animal Sciences / Vol. 30, No. 11 / pp.1540-1549 / 2017
  • Objective: Increasing food safety demands in the animal product market have created a need for a system to trace the food distribution process from the manufacturer to the retailer, and genetic traceability is an effective method for tracing the origin of animal products. In this study, we successfully achieved farm tracing of 6,018 multi-breed pigs using single nucleotide polymorphism (SNP) markers strictly selected through least absolute shrinkage and selection operator (LASSO) feature selection. Methods: We performed farm tracing of domesticated pigs (Sus scrofa) from SNP markers and selected the most relevant features for accurate prediction. Considering the multi-breed composition of our data, we performed feature selection using LASSO penalization on 4,002 SNPs that are shared between breeds, which also include 179 SNPs with small between-breed differences. The 100 highest-scored features were extracted from iterative simulations and then evaluated using machine-learning-based classifiers. Results: We selected 1,341 SNPs from over 45,000 SNPs through iterative LASSO feature selection to minimize between-breed differences. We subsequently selected the 100 highest-scored SNPs from iterative scoring and observed high statistical measures in the cross-validated classification of breeding farms using only these SNPs. Conclusion: The study represents a successful application of LASSO feature selection to multi-breed pig SNP data for tracing farm information, which provides a valuable method and opens possibilities for further research on genetic traceability.
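
As a rough illustration of how LASSO penalization performs feature selection, the sketch below fits a LASSO model by coordinate descent and keeps the features whose coefficients remain nonzero. It is a single LASSO fit under simplifying assumptions (no intercept, data assumed centered, a hypothetical penalty value `alpha`), not the iterative scoring procedure used in the paper; the function names are illustrative only.

```python
from typing import List, Sequence


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    alpha: float,
    n_sweeps: int = 100,
) -> List[float]:
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_sweeps):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the residual excluding feature j.
            rho = sum(v * (r + w[j] * v) for v, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / z
            residual = [r - (new_w - w[j]) * v for v, r in zip(col, residual)]
            w[j] = new_w
    return w


def selected_features(w: Sequence[float], tol: float = 1e-12) -> List[int]:
    """Indices of features whose coefficient survived the L1 penalty."""
    return [j for j, coef in enumerate(w) if abs(coef) > tol]


# Toy data with orthogonal columns: y depends strongly on feature 0,
# very weakly on feature 1, and not at all on feature 2.
X_demo = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y_demo = [2.05, -1.95, 1.95, -2.05]
w_demo = lasso_coordinate_descent(X_demo, y_demo, alpha=0.1)
```

With a larger penalty more coefficients are driven to exactly zero, which is what makes the L1 penalty usable as a selection mechanism.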

Seabed Sediment Feature Extraction Algorithm using Attenuation Coefficient Variation According to Frequency (주파수에 따른 감쇠계수 변화량을 이용한 해저 퇴적물 특징 추출 알고리즘)

  • Lee, Kibae;Kim, Juho;Lee, Chong Hyun;Bae, Jinho;Lee, Jaeil;Cho, Jung Hong
    • Journal of the Institute of Electronics and Information Engineers / Vol. 54, No. 1 / pp.111-120 / 2017
  • In this paper, we propose a novel feature extraction algorithm for the classification of seabed sediment. In previous research, the acoustic reflection coefficient, which is constant with respect to frequency, has been used to classify seabed sediments. However, the attenuation of seabed sediment is a function of frequency and is, in general, highly influenced by sediment type. Hence, we developed a feature vector based on the variation of attenuation with respect to frequency. The attenuation variation is obtained from the signal reflected from the second sediment layer, which is generated by a broadband chirp. For classifying seabed sediment, the proposed feature vector has an advantage in the number of dimensions over the classical scalar feature (reflection coefficient). To compare the proposed feature with the classical scalar feature, the dimension of the proposed feature vector is reduced using linear discriminant analysis (LDA). Synthesized acoustic amplitudes reflected by seabed sediments are generated using the Biot model, and the performance of the proposed feature is evaluated using Fisher scoring and the classification accuracy computed by maximum likelihood decision (MLD). As a result, the proposed feature shows higher discrimination performance and more robustness against measurement errors than the classical feature.
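
The abstract does not define its "Fisher scoring" measure. One common way to score how well a single feature separates classes is the Fisher score, the ratio of between-class scatter to within-class scatter. The sketch below assumes that definition; the function name `fisher_score` and the toy values are illustrative and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Between-class scatter divided by within-class scatter for one feature."""
    overall_mean = sum(values) / len(values)

    # Group the feature values by class label.
    groups: Dict[Hashable, List[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    between = 0.0
    within = 0.0
    for members in groups.values():
        class_mean = sum(members) / len(members)
        between += len(members) * (class_mean - overall_mean) ** 2
        within += sum((v - class_mean) ** 2 for v in members)

    # A feature with zero within-class scatter separates the classes perfectly.
    return between / within if within > 0 else float("inf")


labels_demo = ["A", "A", "A", "B", "B", "B"]
separating = fisher_score([1, 2, 3, 7, 8, 9], labels_demo)
overlapping = fisher_score([1, 9, 5, 1, 9, 5], labels_demo)
```

A higher score indicates that the class means are far apart relative to the spread within each class, so the first toy feature scores far above the second.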