• Title/Abstract/Keywords: top-k classification

Naive Bayes classifiers boosted by sufficient dimension reduction: applications to top-k classification

  • Yang, Su Hyeong;Shin, Seung Jun;Sung, Wooseok;Lee, Choon Won
    • Communications for Statistical Applications and Methods / Vol. 29, No. 5 / pp. 603-614 / 2022
  • The naive Bayes classifier is one of the most straightforward classification tools and directly estimates the class probability. However, because it relies on the assumption that the predictors are independent, which is rarely satisfied in real-world problems, its application is limited in practice. In this article, we propose employing sufficient dimension reduction (SDR) to substantially improve the performance of the naive Bayes classifier, which often deteriorates when the number of predictors is not restrictively small. This improvement is not surprising, as SDR reduces the predictor dimension without sacrificing classification information, and predictors in the reduced space are constructed to be uncorrelated. Therefore, SDR leads the naive Bayes classifier to no longer be naive. We applied the proposed naive Bayes classifier after SDR to build a recommendation system for eyewear frames based on customers' face shapes, demonstrating its utility in the top-k classification problem.
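As a rough illustration of top-k classification with a naive Bayes classifier, the sketch below fits a Gaussian naive Bayes model and returns the k most probable classes. It is not the authors' method: the SDR step is not implemented, the inputs are assumed to be already-reduced predictors, and the function names, class labels, and toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def top_k_classes(model, x, k):
    """Return the k class labels with the highest posterior score."""
    scores = {}
    for label, (log_prior, means, variances) in model.items():
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                      for v, m, var in zip(x, means, variances))
        scores[label] = log_prior + log_lik
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data: hypothetical 2-D reduced predictors for three classes.
X = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [3.2, 2.9], [6.0, 0.0], [6.2, 0.2]]
y = ["round", "round", "square", "square", "oval", "oval"]
model = fit_gaussian_nb(X, y)
print(top_k_classes(model, [0.5, 0.3], k=2))  # ['round', 'square']
```

Returning a ranked list rather than a single label is what makes the classifier usable for recommendation: the top k classes can be shown as candidate suggestions.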

Visualized Malware Classification Based on Convolutional Neural Network

  • 석선희;김호원
    • 정보보호학회논문지 / Vol. 26, No. 1 / pp. 197-208 / 2016
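  • This paper proposes a technique for classifying malware into families without executing it: malware files are visualized as 8-bit gray-scale images and then classified with a convolutional neural network, an approach widely used in image recognition. In an experiment classifying 9 malware families, the Top-1 and Top-2 prediction accuracies were 96.2% and 98.7%, respectively. In an experiment classifying 27 families, the Top-1 prediction accuracy was 82.9% and the Top-2 accuracy was 89%.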
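The visualization step can be sketched as follows: each byte of a file becomes one 8-bit gray-scale pixel (0 to 255), laid out row by row. The fixed row width and the zero padding of the last row are assumptions made for illustration; the abstract does not state the image dimensions used, and `bytes_to_gray_image` is a hypothetical helper name.

```python
def bytes_to_gray_image(data: bytes, width: int):
    """Lay out raw bytes as rows of 8-bit gray-scale pixel values (0-255)."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # assumed: zero-pad the final row
        rows.append(row)
    return rows

# Seven sample bytes (not a real malware sample).
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03, 0xFF, 0x10])
image = bytes_to_gray_image(sample, width=4)
print(image)  # [[77, 90, 144, 0], [3, 255, 16, 0]]
```

The file is only read, never executed, which matches the static nature of the approach described above.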

Analysis of Pyrolysis MS Spectra in Top-down Approach and Differentiation of Gram-type Cells

  • 김주현
    • 한국군사과학기술학회지 / Vol. 14, No. 4 / pp. 719-725 / 2011
  • To apply TMAH-based Py-MS to a field biological detection system for real-time classification of cell types, reproducible patterns in the TMAH-based Py-MS spectra are known to be a critical factor for classification, but they were seriously disturbed by the quantity of cells injected into the pyro-tube. This factor is an external variable that cannot be compensated for by improving the performance of the TMAH-based Py-MS instrument. One idea for solving this knotty problem came from "Top-down proteomics for identification of intact microorganisms". That is, biomarker peaks are selected from the complicated Py-MS spectra of intact microorganisms by tracing their origins, based on Py-MS spectra of the characteristic components of different cell types, in a top-down approach. This idea has been tested in the classification of different Gram-type microorganisms. By analyzing the spectra of the characteristic components (peptidoglycan and lipoteichoic acid for Gram-positive cells, and lipopolysaccharide and lipid A for Gram-negative cells) and comparing them with the spectra of the corresponding Gram-type cells in the top-down approach, biomarker peaks were selected and used in principal component analysis (PCA) to classify the different Gram types, resulting in a significant improvement in their classification. Furthermore, weighting the biomarker peaks in the intact cells' spectra, based on the data for the characteristic components of the Gram types, further improved classification performance.

EFTG: Efficient and Flexible Top-K Geo-textual Publish/Subscribe

  • Zhu, Hong;Li, Hongbo;Cui, Zongmin;Cao, Zhongsheng;Xie, Meiyi
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 12 / pp. 5877-5897 / 2018
  • With the popularity of mobile networks and smartphones, geo-textual publish/subscribe messaging has attracted wide attention. Unlike the traditional publish/subscribe format, geo-textual data is published and subscribed to as a dynamic data flow in the mobile network. This difference places greater demands on efficiency and flexibility. However, most existing Top-k geo-textual publish/subscribe schemes have the following deficiencies: (1) All publications have to be scored for each subscription, which is inefficient. (2) A user has to take time to set a threshold for each subscription, which is inflexible. Therefore, we propose an efficient and flexible Top-k geo-textual publish/subscribe scheme. First, our scheme groups publications and subscriptions based on text classification. Thus, only a small set of related publications needs to be scored for each subscription, which significantly enhances efficiency. Second, our scheme proposes an adaptive publish/subscribe matching algorithm. The algorithm does not require the user to set a threshold. It adaptively returns Top-k results to the user for each subscription, which significantly enhances flexibility. Finally, theoretical analysis and experimental evaluation verify the efficiency and effectiveness of our scheme.
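The grouping idea can be illustrated with a minimal sketch: publications are indexed by text category, so a subscription only scores the publications in its own category, and the k best are returned without a user-set threshold. This is not the EFTG algorithm itself. The record fields, the `score` function (keyword overlap minus a distance penalty), and the sample data are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict

def group_by_category(publications):
    """Index publications by their (pre-assigned) text category."""
    index = defaultdict(list)
    for pub in publications:
        index[pub["category"]].append(pub)
    return index

def score(pub, sub):
    """Hypothetical score: keyword overlap minus a distance penalty."""
    overlap = len(set(pub["keywords"]) & set(sub["keywords"]))
    dx = pub["loc"][0] - sub["loc"][0]
    dy = pub["loc"][1] - sub["loc"][1]
    return overlap - 0.1 * (dx * dx + dy * dy) ** 0.5

def top_k_matches(index, sub, k):
    """Score only publications in the subscription's category; return the k best."""
    candidates = index.get(sub["category"], [])
    return heapq.nlargest(k, candidates, key=lambda p: score(p, sub))

pubs = [
    {"id": "p1", "category": "food", "keywords": ["pizza", "deal"], "loc": (0, 0)},
    {"id": "p2", "category": "food", "keywords": ["pizza"], "loc": (3, 4)},
    {"id": "p3", "category": "sports", "keywords": ["pizza", "deal"], "loc": (0, 0)},
    {"id": "p4", "category": "food", "keywords": ["sushi"], "loc": (0, 1)},
]
sub = {"category": "food", "keywords": ["pizza", "deal"], "loc": (0, 0)}
index = group_by_category(pubs)
print([p["id"] for p in top_k_matches(index, sub, k=2)])  # ['p1', 'p2']
```

Here `p3` is never scored because it belongs to a different category, which is the source of the efficiency gain the abstract describes.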

A Decision Tree Algorithm using Genetic Programming

  • Park, Chongsun;Ko, Young Kyong
    • Communications for Statistical Applications and Methods / Vol. 10, No. 3 / pp. 845-857 / 2003
  • We explore the use of genetic programming to evolve decision trees directly for classification problems with both discrete and continuous predictors. We demonstrate that the hypotheses derived by standard algorithms can deviate substantially from the optimum. This deviation is partly due to their top-down style procedures. The performance of the system is measured on a set of real and simulated data sets and compared with the performance of well-known algorithms such as CHAID, CART, C5.0, and QUEST. The proposed algorithm seems to be effective in handling problems caused by the top-down style procedures of existing algorithms.

A Feature Selection-based Ensemble Method for Arrhythmia Classification

  • Namsrai, Erdenetuya;Munkhdalai, Tsendsuren;Li, Meijing;Shin, Jung-Hoon;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / Vol. 9, No. 1 / pp. 31-40 / 2013
  • In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then classification models are built using each feature subset. Finally, we combine the classification models by adopting a voting approach to form a classification ensemble. The voting approach in our method uses both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the order in which the feature subsets are extracted. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on these top three feature subsets and formed the classifier ensemble by using the voting approach. Our method can improve classification accuracy on high-dimensional datasets. The performance of each classifier and the performance of their ensemble were higher than the performance of the classifier based on the whole feature space of the dataset. The classification performance was improved, and a more stable classification model could be constructed with the proposed approach.
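A score-weighted vote of this kind might look like the sketch below. The abstract does not give the scoring formula, so the combination `(1 - error_rate) * selection_rate` is purely an assumption, as are the function names and the example numbers.

```python
from collections import defaultdict

def classifier_score(error_rate, selection_rate):
    """Assumed combination of the two rates; the paper's formula is not given."""
    return (1.0 - error_rate) * selection_rate

def weighted_vote(predictions, scores):
    """Sum each classifier's score onto the label it predicts; return the winner."""
    totals = defaultdict(float)
    for label, s in zip(predictions, scores):
        totals[label] += s
    return max(totals, key=totals.get)

# Three classifiers built on three feature subsets (hypothetical rates).
# Earlier-extracted subsets get a higher selection rate here.
scores = [
    classifier_score(0.20, 1.0),
    classifier_score(0.25, 0.8),
    classifier_score(0.30, 0.6),
]
print(weighted_vote(["normal", "arrhythmia", "arrhythmia"], scores))  # arrhythmia
```

In this example the two weaker classifiers together outweigh the strongest one, so the ensemble label differs from the best single classifier's prediction.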

Instagram image classification with Deep Learning

  • 정노권;조수선
    • 인터넷정보학회논문지 / Vol. 18, No. 5 / pp. 61-67 / 2017
  • This paper reports experiments on how effective deep-learning convolutional neural networks are at classifying real social network images, and presents the results and the lessons learned. We used the AlexNet and ResNet models, which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 and 2015, respectively. The test set for evaluation consists of images collected from Instagram: 240 images in 12 categories. We also fine-tuned the Inception V3 model and compared the results. The Top-1 error rates of the four models (AlexNet, ResNet, Inception V3, and fine-tuned Inception V3) were 49.58%, 40.42%, 30.42%, and 5.00%, respectively, and the Top-5 error rates were 35.42%, 25.00%, 20.83%, and 0.00%, respectively.
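For reference, a Top-k error rate counts a prediction as wrong only when the true label is absent from the model's k highest-ranked classes. The sketch below computes it on made-up rankings; the function name and the toy labels are hypothetical and are not taken from the paper's data.

```python
def top_k_error_rate(ranked_predictions, true_labels, k):
    """Percentage of samples whose true label is not among the top k predictions."""
    misses = sum(1 for ranked, truth in zip(ranked_predictions, true_labels)
                 if truth not in ranked[:k])
    return 100.0 * misses / len(true_labels)

# Each inner list is one image's classes, ordered from most to least probable.
ranked = [
    ["cat", "dog", "car"],
    ["dog", "cat", "car"],
    ["car", "food", "cat"],
    ["food", "car", "dog"],
]
truth = ["cat", "cat", "cat", "dog"]
print(top_k_error_rate(ranked, truth, k=1))  # 75.0
print(top_k_error_rate(ranked, truth, k=3))  # 0.0
```

With a 240-image test set, a Top-1 error rate of 5.00% corresponds to 12 misclassified images.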

Development of surface defect inspection algorithms for cold mill strip

  • 김경민;박귀태;박중조;이종학;정진양;이주강
    • 제어로봇시스템학회논문지 / Vol. 3, No. 2 / pp. 179-186 / 1997
  • In this paper we propose surface defect inspection algorithms for cold mill strip. The defects on the surface of cold mill strip have a scattered or singular distribution. The proposed method consists of preprocessing, feature extraction, and defect classification. Preprocessing produces a binarized defect image using the top-hat transform, adaptive thresholding, thinning, and noise rejection. In particular, the top-hat transform, which uses local min/max operations, diminishes the effect of bad lighting. In feature extraction, geometric, moment, and co-occurrence matrix features are calculated. A multilayer neural network is used for defect classification. The proposed algorithm showed a 15% error rate.
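The transform built from local min/max operations described above resembles the morphological top-hat transform. A minimal 1-D sketch, assuming the white top-hat variant (signal minus its opening) and a flat window, shows how it removes a slowly varying lighting gradient while keeping a narrow bright defect. The paper works on 2-D images and does not specify the variant or window size; the function names and sample values here are hypothetical.

```python
def local_min(signal, radius):
    """Erosion: minimum over a sliding window of half-width `radius`."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]

def local_max(signal, radius):
    """Dilation: maximum over a sliding window of half-width `radius`."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]

def white_top_hat(signal, radius):
    """Signal minus its opening (local min followed by local max)."""
    opened = local_max(local_min(signal, radius), radius)
    return [s - o for s, o in zip(signal, opened)]

# A brightness ramp (uneven lighting) with one narrow bright spike at index 3.
scanline = [10, 11, 12, 30, 14, 15, 15]
print(white_top_hat(scanline, radius=1))  # [0, 0, 0, 16, 0, 0, 0]
```

After the transform, a single global threshold separates the spike from the background even though the raw background level varies along the scanline.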

Analysis of Nursing Interventions Performed by Gynecological Nursing Unit Nurses Using the Nursing Interventions Classification

  • 홍성정;이성희;김화선
    • 여성건강간호학회지 / Vol. 17, No. 3 / pp. 275-284 / 2011
  • Purpose: The purpose of this study was to identify nursing interventions performed by nurses on gynecological nursing units. Methods: The instrument in this study is based on the fifth edition of the Nursing Interventions Classification (NIC) (2008). Data were collected from electronic medical records from August 2010 to October 2010 at one hospital and analyzed using frequencies in Microsoft Excel 2010. Results: Of a total of 82 NIC interventions, the domains with the highest percentages were physiological: basic (36.3%) and physiological: complex (34.5%). The classes of nursing interventions with the highest percentages were health system medication (12.1%), perioperative care (10.0%), and drug management (8.6%). The most frequently used intervention was Discharge Planning. The least used of the top thirty interventions was Environmental Management. The top thirty most frequently used interventions belonged to the domains of physiological: basic (37.9%), physiological: complex (31.1%), and behavioral (5.4%). Conclusion: These findings will help in the establishment of a standardized language for gynecological nursing units and enhance the quality of nursing care.

Variations of AlexNet and GoogLeNet to Improve Korean Character Recognition Performance

  • Lee, Sang-Geol;Sung, Yunsick;Kim, Yeon-Gyu;Cha, Eui-Young
    • Journal of Information Processing Systems / Vol. 14, No. 1 / pp. 205-217 / 2018
  • Deep learning using convolutional neural networks (CNNs) is being studied in various fields of image recognition, and these studies show excellent performance. In this paper, we compare the performance of two CNN architectures, KCR-AlexNet and KCR-GoogLeNet. The experimental data used in this paper are obtained from PHD08, a large-scale Korean character database. It has 2,187 samples of each Korean character with 2,350 Korean character classes, for a total of 5,139,450 data samples. In the training results, KCR-AlexNet showed an accuracy of over 98% for the top-1 test and KCR-GoogLeNet showed an accuracy of over 99% for the top-1 test after the final training iteration. We made an additional Korean character dataset with fonts that were not in PHD08 to compare the classification success rate with commercial optical character recognition (OCR) programs and ensure the objectivity of the experiment. While the commercial OCR programs showed classification success rates of 66.95% to 83.16%, KCR-AlexNet and KCR-GoogLeNet showed average classification success rates of 90.12% and 89.14%, respectively, which are higher than the commercial OCR programs' rates. In terms of time, KCR-AlexNet was faster than KCR-GoogLeNet when trained on PHD08, whereas KCR-GoogLeNet had a faster classification speed.