Integrated Search | Korea Science

A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data

Pouramini, Jafar;Minaei-Bidgoli, Behrouze;Esmaeili, Mahdi
- KSII Transactions on Internet and Information Systems (TIIS)
- Vol. 12, No. 8
- pp.3725-3748
- 2018
Text data distribution is often imbalanced. Imbalanced data is one of the challenges in text classification, as it degrades classifier performance. Many studies have been conducted so far in this regard. The proposed solutions fall into several general categories, including sampling-based and algorithm-based methods. In recent studies, feature selection has also been considered as one of the solutions for the imbalance problem. In this paper, a novel one-sided feature selection method known as probabilistic feature selection (PFS) is presented for imbalanced text classification. The PFS is a probabilistic method that is calculated using feature distribution. Compared to similar methods, the PFS has more parameters. To evaluate the performance of the proposed method, the feature selection methods Gini, MI, FAST and DFS were also implemented. To assess the proposed method, classifiers such as the C4.5 decision tree and Naive Bayes were used. The results of tests on Reuters-21578 and WebKB, measured by F-measure, suggest that the proposed feature selection significantly improves the performance of the classifiers.
https://doi.org/10.3837/tiis.2018.08.010
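
The abstract above reports its results by F-measure. As a minimal sketch of why that metric is commonly preferred over accuracy on imbalanced data, the following computes both for a classifier that never predicts the minority class. The labels and the `f_measure` helper are illustrative assumptions, not data or code from the paper.

```python
def f_measure(y_true, y_pred, positive):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up imbalanced labels: 2 minority documents, 8 majority documents.
y_true = ["rare"] * 2 + ["common"] * 8
# A degenerate classifier that always predicts the majority class.
y_pred = ["common"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_f1 = f_measure(y_true, y_pred, positive="rare")
print(accuracy, minority_f1)
```

Accuracy looks acceptable here although the minority class is never recognized, which is the failure mode that the F-measure exposes.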

A Comprehensive Approach for Tamil Handwritten Character Recognition with Feature Selection and Ensemble Learning

Manoj K;Iyapparaja M
- KSII Transactions on Internet and Information Systems (TIIS)
- Vol. 18, No. 6
- pp.1540-1561
- 2024
This research proposes a novel approach for Tamil Handwritten Character Recognition (THCR) that combines feature selection and ensemble learning techniques. The Tamil script is complex and highly variable, requiring a robust and accurate recognition system. Feature selection is used to reduce dimensionality while preserving discriminative features, improving classification performance and reducing computational complexity. Several feature selection methods are compared, and individual classifiers (support vector machines, neural networks, and decision trees) are evaluated through extensive experiments. Ensemble learning techniques such as bagging and boosting are employed to leverage the strengths of multiple classifiers and enhance recognition accuracy. The proposed approach is evaluated on the HP Labs Dataset, achieving 95.56% accuracy using an ensemble learning framework based on support vector machines. The dataset consists of 82,928 samples with 247 distinct classes, contributed by 500 participants from Tamil Nadu. It includes 40,000 characters with 500 user variations. The results surpass or rival existing methods, demonstrating the effectiveness of the approach. The research also offers insights for developing advanced recognition systems for other complex scripts. Future investigations could explore the integration of deep learning techniques and the extension of the proposed approach to other Indic scripts and languages.
https://doi.org/10.3837/tiis.2024.06.007

A Classifiable Sub-Flow Selection Method for Traffic Classification in Mobile IP Networks

Satoh, Akihiro;Osada, Toshiaki;Abe, Toru;Kitagata, Gen;Shiratori, Norio;Kinoshita, Tetsuo
- Journal of Information Processing Systems
- Vol. 6, No. 3
- pp.307-322
- 2010
Traffic classification is an essential task for network management. Many researchers have paid attention to classifiers based on initial sub-flow features for traffic classification. However, the existing classifiers cannot classify traffic effectively in mobile IP networks. The classifiers depend on initial sub-flows, but because of seamless mobility they cannot always capture these sub-flows at a point of attachment. Thus the ideal classifier should be capable of traffic classification based on not only initial sub-flows but also various other types of sub-flows. In this paper, we propose a classifiable sub-flow selection method to realize the ideal classifier. The experimental results are so far promising for this research direction, even though they are derived from a reduced set of general applications and under relatively simplifying assumptions. Altogether, the significant contribution is showing the feasibility of the ideal classifier by selecting not only initial sub-flows but also transition sub-flows.
https://doi.org/10.3745/JIPS.2010.6.3.307

Classification of Welding Defects in Austenitic Stainless Steel by Neural Pattern Recognition of Ultrasonic Signal

이강용;김준섭
- Transactions of the Korean Society of Mechanical Engineers A
- Vol. 20, No. 4
- pp.1309-1319
- 1996
Research on the classification of natural defects in the welding zone is performed using neuro-pattern recognition technology. A signal pattern recognition package, including user-defined functions, is developed to perform digital signal processing, feature extraction, feature selection and classifier selection. The neural network classifier and statistical classifiers such as the linear discriminant function classifier and the empirical Bayesian classifier are compared and discussed. The neuro-pattern recognition technique is applied to the classification of such natural defects as root crack, incomplete penetration, lack of fusion, slag inclusion, porosity, etc. If appropriately trained, the neural network classifier is concluded to be better than the statistical classifiers in the classification of natural welding defects.
https://doi.org/10.22634/KSME-A.1996.20.4.1309

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

Hana Babiker, Nassar
- International Journal of Computer Science & Network Security
- Vol. 23, No. 1
- pp.89-95
- 2023
Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially with the increasing number of patient files. In this paper, breast cancer data is collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data is imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques are applied before classifying this imbalanced data (resampling, attribute selection, and handling missing values), and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used, and in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random subspace) are used. Finally, the ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy among all the algorithms (95.2797%), followed by Bagging with J48 (90.559%) and Random subspace with J48 (84.2657%). The imbalanced breast cancer dataset was classified into recurrence and no recurrence with different classification algorithms, and the best result was obtained from the ensemble model.
https://doi.org/10.22937/IJCSNS.2023.23.1.12
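
The abstract above lists resampling among its pre-processing steps without saying which variant was used. The sketch below shows random oversampling, one common resampling method, purely as an illustration. The `random_oversample` helper, the toy rows and the fixed seed are assumptions, not details from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows of this class to draw duplicates from.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: 4 majority rows and 2 minority rows (values are made up).
rows = [[1], [2], [3], [4], [5], [6]]
labels = ["no-recurrence"] * 4 + ["recurrence"] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.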

Statistical Speech Feature Selection for Emotion Recognition

Kwon Oh-Wook;Chan Kwokleung;Lee Te-Won
- The Journal of the Acoustical Society of Korea
- Vol. 24, No. 4E
- pp.144-151
- 2005
We evaluate the performance of emotion recognition via speech signals when a plain speaker talks to an entertainment robot. For each frame of a speech utterance, we extract the frame-based features: pitch, energy, formant, band energies, mel frequency cepstral coefficients (MFCCs), and velocity/acceleration of pitch and MFCCs. For discriminative classifiers, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: support vector machine (SVM) and hidden Markov model (HMM). For angry/bored/happy/neutral/sad emotion classification, the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that the accuracy is significant compared to the performance by foreign human listeners.
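
The step of turning variable-length frame sequences into a fixed-length utterance vector can be sketched as follows. The abstract does not say which statistics were used, so the choice of mean, standard deviation, minimum and maximum, the `utterance_vector` helper and the frame values are all assumptions.

```python
from statistics import mean, pstdev


def utterance_vector(frames):
    """Summarize per-frame features as [mean, std, min, max] for each feature.

    frames: non-empty list of per-frame feature lists, all of the same length.
    The output length depends only on the number of features, not on the
    number of frames.
    """
    if not frames:
        raise ValueError("an utterance needs at least one frame")
    vector = []
    # zip(*frames) regroups the data by feature instead of by frame.
    for column in zip(*frames):
        vector.extend([mean(column), pstdev(column), min(column), max(column)])
    return vector


# Two made-up utterances with two features per frame (say pitch and energy).
short_utt = [[100.0, 0.5], [110.0, 0.7]]
long_utt = [[100.0, 0.5], [105.0, 0.6], [110.0, 0.7], [120.0, 0.4]]

print(len(utterance_vector(short_utt)), len(utterance_vector(long_utt)))
```

Because both utterances map to vectors of the same length, they can be passed to a classifier that expects fixed-size input, such as an SVM.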

Multiple Moving Person Tracking Based on the IMPRESARIO Simulator

Kim, Hyun-Deok;Jin, Tae-Seok
- Journal of information and communication convergence engineering
- Vol. 6, No. 3
- pp.331-336
- 2008
In this paper, we propose a real-time people tracking system with multiple CCD cameras for security inside buildings. To achieve this goal, we present a method for 3D walking human tracking based on the IMPRESARIO framework, incorporating cascaded classifiers into hypothesis evaluation. The efficiency of adaptive selection of cascaded classifiers has also been presented. The camera is mounted on the ceiling of the laboratory so that the image data of the passing people are fully overlapped. The implemented system recognizes people's movement along various directions. To track people even when their images are partially overlapped, the proposed system estimates and tracks a bounding box enclosing each person in the tracking region. The approximated convex hull of each individual in the tracking area is obtained to provide more accurate tracking information. We have shown the improvement of reliability for likelihood calculation by using cascaded classifiers. Experimental results show that the proposed method can smoothly and effectively detect and track walking humans through environments such as dense forests.

3D Walking Human Detection and Tracking based on the IMPRESARIO Framework

Jin, Tae-Seok;Hashimoto, Hideki
- International Journal of Fuzzy Logic and Intelligent Systems
- Vol. 8, No. 3
- pp.163-169
- 2008
In this paper, we propose a real-time people tracking system with multiple CCD cameras for security inside buildings. The camera is mounted on the ceiling of the laboratory so that the image data of the passing people are fully overlapped. The implemented system recognizes people's movement along various directions. To track people even when their images are partially overlapped, the proposed system estimates and tracks a bounding box enclosing each person in the tracking region. The approximated convex hull of each individual in the tracking area is obtained to provide more accurate tracking information. To achieve this goal, we propose a method for 3D walking human tracking based on the IMPRESARIO framework, incorporating cascaded classifiers into hypothesis evaluation. The efficiency of adaptive selection of cascaded classifiers has also been presented. We have shown the improvement of reliability for likelihood calculation by using cascaded classifiers. Experimental results show that the proposed method can smoothly and effectively detect and track walking humans through environments such as dense forests.
https://doi.org/10.5391/IJFIS.2008.8.3.163

BAYESIAN CLASSIFICATION AND FREQUENT PATTERN MINING FOR APPLYING INTRUSION DETECTION

Lee, Heon-Gyu;Noh, Ki-Yong;Ryu, Keun-Ho
- Korean Society of Remote Sensing: Conference Proceedings
- Korean Society of Remote Sensing, 2005, Proceedings of ISRS 2005
- pp.713-716
- 2005
In this paper, in order to identify and recognize attack patterns, we propose a Bayesian classification using frequent patterns. In theory, Bayesian classifiers guarantee the minimum error rate compared to all other classifiers. However, in practice this is not always the case, owing to inaccuracies in the unrealistic assumption (class conditional independence) made for their use. Our method addresses the problem of attribute dependence by discovering frequent patterns. It generates frequent patterns using an efficient FP-growth approach. Since the volume of patterns produced can be large, we propose a pruning technique for selecting only interesting patterns. Also, this method estimates the probability of a new case using different product approximations, where each product approximation assumes different independence of the attributes. Our experiments show that the proposed classifier achieves higher accuracy and is more efficient than other classifiers.

A Feature Selection-based Ensemble Method for Arrhythmia Classification

Namsrai, Erdenetuya;Munkhdalai, Tsendsuren;Li, Meijing;Shin, Jung-Hoon;Namsrai, Oyun-Erdene;Ryu, Keun Ho
- Journal of Information Processing Systems
- Vol. 9, No. 1
- pp.31-40
- 2013
In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect the arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then classification models are built by using each feature subset. Finally, we combine the classification models by adopting a voting approach to form a classification ensemble. The voting approach in our method involves both classification error rate and feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the extraction order of the feature subsets. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on the top three feature subsets and formed the classifier ensemble by using the voting approach. Our method can improve the classification accuracy on high-dimensional datasets. The performance of each classifier and the performance of their ensemble were higher than the performance of the classifier that was based on the whole feature space of the dataset. The classification performance was improved and a more stable classification model could be constructed with the proposed approach.
https://doi.org/10.3745/JIPS.2013.9.1.031
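
The voting step described in the abstract above can be sketched as a score-weighted vote. The abstract does not give the scoring formula, so the rule in `classifier_score` below (accuracy multiplied by a selection rate that decreases with extraction order) is a hypothetical stand-in, and all numbers are made up for illustration.

```python
from collections import defaultdict


def classifier_score(error_rate, selection_rate):
    """Hypothetical score: the paper's actual formula is not stated in the abstract."""
    return (1.0 - error_rate) * selection_rate


def weighted_vote(predictions, scores):
    """Return the label whose supporting classifiers have the largest total score."""
    totals = defaultdict(float)
    for label, score in zip(predictions, scores):
        totals[label] += score
    return max(totals, key=totals.get)


# Three classifiers, one per feature subset. The subset extracted first gets
# the highest selection rate (assumed values).
scores = [
    classifier_score(error_rate=0.20, selection_rate=1.0),
    classifier_score(error_rate=0.25, selection_rate=0.8),
    classifier_score(error_rate=0.30, selection_rate=0.6),
]

winner = weighted_vote(["normal", "arrhythmia", "arrhythmia"], scores)
print(winner)
```

In this example the two lower-scored classifiers together outweigh the single best one, so the ensemble output can differ from the prediction of the strongest individual classifier.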
