Search | Korea Science

Improving performance of Binary Text Classification Using the EM algorithm (EM 알고리즘을 이용한 이진 분류 문서 범주화의 성능 향상)

Han, Hyoung-Dong;Ko, Young-Joong;Seo, Jung-Yun
- Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.790-792 / 2004
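When binary classification is applied to multi-class classification in text categorization, the One-Against-All method is generally used. However, this method has one problem: the documents in the positive set have categories assigned directly by humans, whereas the documents in the negative set do not, so the negative set may contain noisy documents. To solve this problem, this paper proposes applying the Sliding Window technique and the EM algorithm to binary-classification-based text categorization. Noisy documents are first extracted from the training data with the Sliding Window technique, and their categories are then reassigned using the EM algorithm, which improves the performance of binary-classification-based text categorization.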

Detecting Host-based Intrusion with SVM classification (SVM classification을 이용한 호스트 기반 침입 탐지)

이주이;김동성;박종서;염동복
- Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 2002.11a / pp.524-527 / 2002
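This study proposes a host-based intrusion detection method using a Support Vector Machine (SVM). Because intrusion detection is a binary classification problem that separates intrusions from normal behavior, we implemented an intrusion detection system with an SVM, which performs well on binary classification. We first analyzed the audit data at the system call level, then extracted pattern features with a sliding window technique and built a training set. Applying an SVM to this set produced a decision model, and evaluation tests showed a high intrusion detection hit rate of over 90%.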
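The sliding-window step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the window size, the count-based feature encoding, and the names `sliding_windows` and `window_features` are assumptions, since the abstract does not give these details. The resulting vectors would then be passed to an SVM trainer of your choice.

```python
from collections import Counter


def sliding_windows(calls, size):
    """Return every contiguous window of `size` system calls as a tuple."""
    return [tuple(calls[i:i + size]) for i in range(len(calls) - size + 1)]


def window_features(calls, size, vocabulary):
    """Count how often each window listed in `vocabulary` occurs in the trace."""
    counts = Counter(sliding_windows(calls, size))
    return [counts[w] for w in vocabulary]


# Hypothetical system-call trace; real audit data would be far longer.
trace = ["open", "read", "read", "write", "close"]
windows = sliding_windows(trace, 3)

# Assumed encoding: one feature per distinct window seen in training data.
vocabulary = sorted(set(windows))
features = window_features(trace, 3, vocabulary)
```

A trace shorter than the window size yields no windows, so very short traces would need separate handling before training.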

A Study On Filtering of Newspaper Article by Using Bayesian Classifier (베이지안 분류기를 이용한 신문기사 필터링)

손기준;노태길;이상조
- Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.490-492 / 2002
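This paper treats filtering as a binary document classification problem and uses a Bayesian classifier to filter newspaper articles. Because the training documents are not fixed when a Bayesian classifier is used for newspaper article filtering, we ran experiments with a range of parameters. The experiments showed that the Bayesian binary classifier performed better with limited training documents, and that the user needs to select at least 10% of the documents in the given document set.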

Import Vector Voting Model for Multi-pattern Classification (다중 패턴 분류를 위한 Import Vector Voting 모델)

Choi, Jun-Hyeog;Kim, Dae-Su;Rim, Kee-Wook
- Journal of the Korean Institute of Intelligent Systems / v.13 no.6 / pp.655-660 / 2003
In general, the Support Vector Machine performs well in binary classification but is limited in multi-pattern classification. We therefore propose an Import Vector Voting model for classification with two or more labels. The model applies a kernel bagging strategy to Zhu's Import Vector Machine and uses a voting strategy that averages the optimal kernel function over many kernel functions. In experiments on both binary and multi-pattern classification problems, the proposed Import Vector Voting model showed good performance on the given machine learning data.
https://doi.org/10.5391/JKIIS.2003.13.6.655

Eigenvoice Adaptation of Classification Model for Binary Mask Estimation (Eigenvoice를 이용한 이진 마스크 분류 모델 적응 방법)

Kim, Gibak
- Journal of Broadcast Engineering / v.20 no.1 / pp.164-170 / 2015
This paper deals with adapting the classification model used in the binary mask approach to noise suppression in noisy environments. The binary mask estimation approach is known to improve the intelligibility of noisy speech. However, building the classification model for binary mask estimation requires the training data to include the same type of noise as the test data. Eigenvoice adaptation is applied to the noise-independent classification model, and the adapted model is used as a noise-dependent model. Results are reported as hit rates and false alarm rates. The experiments confirmed that classification accuracy improves as the number of adaptation sentences increases.
https://doi.org/10.5909/JBE.2015.20.1.164

A Text Categorization Method Improved by Removing Noisy Training Documents (오류 학습 문서 제거를 통한 문서 범주화 기법의 성능 향상)

Han, Hyoung-Dong;Ko, Young-Joong;Seo, Jung-Yun
- Journal of KIISE:Software and Applications / v.32 no.9 / pp.912-919 / 2005
When binary classification is applied to multi-class classification for text categorization, the One-Against-All method is generally used. However, this method has a problem: the documents of the negative set are not labeled by humans, so the training data can include many noisy documents. In this paper, we propose applying the Sliding Window technique and the EM algorithm to binary text classification to solve this problem. We improve binary text classification by extracting noisy documents from the training data with the Sliding Window technique and reassigning the categories of these documents with the EM algorithm.

CNN-based Android Malware Detection Using Reduced Feature Set

Kim, Dong-Min;Lee, Soo-jin
- Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.19-26 / 2021
The performance of deep learning-based malware detection and classification models depends largely on how the feature set used for training is constructed. In this paper, we propose an approach for selecting the optimal feature set to maximize detection performance in CNN-based Android malware detection. The features in the feature set were selected with the Chi-Square test, which is widely used for feature selection in machine learning and deep learning. To validate the proposed approach, the CNN model was trained on 36 features selected from the CICANDMAL2017 dataset, and its malware detection performance was then measured. The model achieved an accuracy of 99.99% in binary classification and 98.55% in multiclass classification.
https://doi.org/10.9708/jksci.2021.26.10.019
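As a rough illustration of Chi-Square feature selection, the sketch below scores binary features against binary labels and keeps the top k. It assumes 0/1 features and 0/1 labels purely to keep the example short; the abstract does not say how the dataset's features were encoded, and the feature names and the helpers `chi_square` and `select_top_k` are hypothetical.

```python
def chi_square(feature, labels):
    """Chi-square statistic for a 0/1 feature against 0/1 labels."""
    n = len(labels)
    # Observed 2x2 contingency table, indexed as observed[feature][label].
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        observed[f][y] += 1
    row = [sum(observed[0]), sum(observed[1])]
    col = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    stat = 0.0
    for f in (0, 1):
        for y in (0, 1):
            expected = row[f] * col[y] / n
            if expected > 0:
                stat += (observed[f][y] - expected) ** 2 / expected
    return stat


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = malware, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "feature_a": [1, 1, 1, 0, 0, 0],  # perfectly aligned with the label
    "feature_b": [1, 0, 1, 0, 1, 0],  # weakly related to the label
}
best = select_top_k(columns, labels, 1)
```

A higher score means the feature and the label are less likely to be independent, which is why the top-scoring features are kept.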

Binary Classifier Construction for U87 Cell Shapes using Fourier Shape Descriptor and SVM (퓨리에 형태표현자와 SVM 을 이용한 U87 세포의 형태학적 분류기 모델구축)

Kang, Mi-Sun;Kim, Jeong-Sik;Kim, Myoung-Hee
- Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.751-753 / 2010
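This paper proposes a method for building a binary classifier for accurate morphological classification of U87 cells in phase-contrast microscopy images. By using a Fourier descriptor-based representation of cell shape to build an SVM binary classifier, the method classifies the two target classes, conical and round cells, robustly with respect to changes in the position, rotation, and size of cells in the image. Experiments confirmed that the SVM classifier trained with a polynomial kernel gave the most accurate classification compared with the linear, RBF, and sigmoid kernels. If this work, which takes the classifier for the two cell shapes studied in the paper as its base framework, is extended to classify a wider variety of cell shapes, it is expected to help with the analysis of metastatic behavior, which is useful for anti-metastatic treatment of malignant brain tumors.
https://doi.org/10.3745/PKIPS.y2010m11a.751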
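One common way to make a Fourier shape descriptor insensitive to position, rotation, and size is sketched below. It is a generic textbook-style normalization, not necessarily the one used in the paper: the boundary is treated as a complex signal, the zero-frequency term is dropped (position), only magnitudes are kept (rotation), and everything is divided by the first harmonic (size). The function names and the toy outline are assumptions.

```python
import cmath
import math


def fourier_descriptor(boundary, n_coeffs):
    """Return n_coeffs normalized Fourier magnitudes for a closed boundary."""
    z = [complex(x, y) for x, y in boundary]
    n = len(z)
    if n < n_coeffs + 2:
        raise ValueError("boundary needs at least n_coeffs + 2 points")
    # Plain O(n^2) discrete Fourier transform of the complex boundary signal.
    coeffs = [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    scale = abs(coeffs[1])
    if scale == 0:
        raise ValueError("degenerate boundary: first harmonic is zero")
    # Skip k = 0 (position) and k = 1 (used as the size reference).
    return [abs(coeffs[k]) / scale for k in range(2, n_coeffs + 2)]


def transform(boundary, scale, angle, shift):
    """Scale, rotate, and translate a boundary (helper for the demo only)."""
    a = scale * cmath.exp(1j * angle)
    moved = [a * complex(x, y) + complex(*shift) for x, y in boundary]
    return [(p.real, p.imag) for p in moved]


# Hypothetical cell outline sampled counterclockwise at 8 points.
shape = [(0, 0), (4, 0), (5, 2), (4, 4), (2, 5), (0, 4), (-1, 2), (-1, 1)]
original = fourier_descriptor(shape, 4)
moved = fourier_descriptor(transform(shape, 2.0, math.pi / 6, (10, -3)), 4)
```

Here `original` and `moved` agree up to floating-point error, so a classifier trained on such vectors sees the same input for the same shape wherever it appears in the image.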

A Two-Dimensional Binary Prefix Tree for Packet Classification (패킷 분류를 위한 이차원 이진 프리픽스 트리)

Jung, Yeo-Jin;Kim, Hye-Ran;Lim, Hye-Sook
- Journal of KIISE:Information Networking / v.32 no.4 / pp.543-550 / 2005
Demand for better Internet services has been increasing with the rapid growth of the Internet, and next-generation routers are therefore required to perform intelligent packet classification. For a given classifier defining packet attributes or contents, packet classification is the process of identifying the highest-priority rule to which a packet conforms. A notable characteristic of real classifiers is that a packet matches only a small number of distinct source-destination prefix pairs. Many schemes have therefore been proposed to filter rules based on source and destination prefix pairs. However, most of these schemes rely on sequential one-dimensional searches using a trie, which requires a large amount of memory. In this paper, we propose a memory-efficient two-dimensional search scheme using source and destination prefix pairs. By constructing a binary prefix tree, the source prefix search and the destination prefix search are performed simultaneously in a single binary tree. Moreover, the proposed two-dimensional binary prefix tree does not include any empty internal nodes, so the memory waste of previous trie-based structures is completely eliminated.

Comparative Analysis of the Binary Classification Model for Improving PM10 Prediction Performance (PM10 예측 성능 향상을 위한 이진 분류 모델 비교 분석)

Jung, Yong-Jin;Lee, Jong-Sung;Oh, Chang-Heon
- Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.56-62 / 2021
High forecast accuracy is required as public concern about particulate matter grows, and many attempts are being made to improve particulate matter prediction with machine learning. However, prediction models do not train well because of the imbalanced distribution of concentrations and the varied characteristics of particulate matter. To address these problems, this paper proposes a binary classification model that divides particulate matter concentration into two classes at a threshold of 80㎍/㎥. Four classification algorithms were used for the binary classification of PM10: logistic regression, decision tree, SVM, and MLP. In a performance evaluation based on the confusion matrix, the MLP model showed the highest binary classification performance of the four models, with 89.98% accuracy.
https://doi.org/10.6109/jkiice.2021.25.1.56
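The labeling and evaluation steps mentioned in this abstract can be sketched as below. This is an illustrative assumption rather than the paper's code: the abstract gives only the 80㎍/㎥ boundary, so treating values strictly above it as the high class is a guess, and the sample readings and predictions are made up.

```python
THRESHOLD = 80.0  # ug/m^3, the class boundary stated in the abstract


def to_class(pm10):
    """Label a PM10 reading: 1 = high, 0 = low (strict '>' is an assumption)."""
    return 1 if pm10 > THRESHOLD else 0


def confusion_matrix(actual, predicted):
    """Return (tp, fp, tn, fn) for 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Made-up readings and classifier outputs, for illustration only.
readings = [35.0, 92.5, 80.0, 120.0, 60.0]
actual = [to_class(r) for r in readings]
predicted = [0, 1, 1, 1, 0]
tp, fp, tn, fn = confusion_matrix(actual, predicted)
score = accuracy(tp, fp, tn, fn)
```

With imbalanced classes, accuracy alone can look high even when the rare class is mostly missed, so the individual confusion matrix cells are worth inspecting as well.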

Search Result 607, Processing Time 0.033 seconds
