• Title/Summary/Keyword: Classification Problem

A Study on the Relationship between Class Similarity and the Performance of Hierarchical Classification Method in a Text Document Classification Problem (텍스트 문서 분류에서 범주간 유사도와 계층적 분류 방법의 성과 관계 연구)

  • Jang, Soojung;Min, Daiki
    • The Journal of Society for e-Business Studies / v.25 no.3 / pp.77-93 / 2020
  • The literature has reported that hierarchical classification methods generally outperform flat classification methods for multi-class document classification problems. Unlike prior studies that constructed a class hierarchy, this paper evaluates the performance of hierarchical and flat classification methods when the class hierarchy is predefined. We conducted numerical evaluations on two data sets: research papers on climate change adaptation technologies in the water sector and the 20NewsGroup open data set. The evaluation results show that the hierarchical classification method outperforms the flat classification methods only under certain conditions, which differs from the literature. The advantage of the hierarchical classification method over the flat classification method depends on the class similarities at each level of the class structure. More importantly, the hierarchical classification method works better when the upper-level similarity is less than the lower-level similarity.
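
The difference between the two strategies compared in this abstract can be illustrated with a minimal sketch. The keyword rules, the three class labels, and all function names below are hypothetical stand-ins for trained classifiers; they are not the models or the hierarchy used in the paper.

```python
from typing import Callable, Dict

Classifier = Callable[[str], str]

def keyword_classifier(rules: Dict[str, str], default: str) -> Classifier:
    """Toy classifier: return the label of the first keyword found in the text."""
    def classify(text: str) -> str:
        lowered = text.lower()
        for keyword, label in rules.items():
            if keyword in lowered:
                return label
        return default
    return classify

# Flat approach: a single classifier chooses directly among all leaf classes.
flat = keyword_classifier(
    {"hockey": "rec.sport.hockey",
     "baseball": "rec.sport.baseball",
     "graphics": "comp.graphics"},
    default="comp.graphics",
)

# Hierarchical approach: a top-level classifier picks the parent class,
# then a classifier dedicated to that parent picks the leaf class.
top = keyword_classifier({"hockey": "rec", "baseball": "rec"}, default="comp")
children: Dict[str, Classifier] = {
    "rec": keyword_classifier({"hockey": "rec.sport.hockey"},
                              default="rec.sport.baseball"),
    "comp": keyword_classifier({}, default="comp.graphics"),
}

def predict_hierarchical(text: str, top_clf: Classifier,
                         child_clfs: Dict[str, Classifier]) -> str:
    """Route the document through the predefined two-level hierarchy."""
    parent = top_clf(text)
    return child_clfs[parent](text)
```

In the hierarchical version an error at the top level cannot be corrected further down, which is one reason the similarity between upper-level classes matters for the comparison.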

Intention Classification for Retrieval of Health Questions

  • Liu, Rey-Long
    • International Journal of Knowledge Content Development & Technology / v.7 no.1 / pp.101-120 / 2017
  • Healthcare professionals have edited many health questions (HQs) and their answers for healthcare consumers on the Internet. The HQs provide health information that is both readable and reliable, and hence retrieving the HQs that are relevant to a given question is essential for health education and promotion through the Internet. However, retrieval of relevant HQs needs to be based on recognizing the intention of each HQ, which is difficult to achieve by predefining syntactic and semantic rules. We thus model the intention recognition problem as a text classification problem and develop two techniques to improve a learning-based text classifier for the problem. The two techniques improve the classifier by location-based and area-based feature weighting, respectively. Experimental results show that the two techniques can work together to significantly improve a Support Vector Machine classifier in both the recognition of HQ intentions and the retrieval of relevant HQs.

Classification Analysis for Unbalanced Data (불균형 자료에 대한 분류분석)

  • Kim, Dongah;Kang, Suyeon;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.3 / pp.495-509 / 2015
  • We study the classification problem in which the proportions of the two groups differ significantly, known as the unbalanced classification problem. It is usually more difficult to classify classes accurately in unbalanced data than in balanced data. If we apply classification methods to unbalanced data, most observations are likely to be classified into the bigger group because this minimizes the misclassification loss. However, misclassifying the smaller group as the larger group can cause a bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up and down sampling). We also check the total loss of the different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC, and AUC (area under the curve) for the performance comparison.
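
Two of the ideas named in this abstract, the G-mean metric and up sampling, can be sketched as follows. This is a minimal illustration, assuming binary labels with 1 as the minority class and 0 as the majority class; the function names are hypothetical, and the G-mean is taken in its common form as the geometric mean of sensitivity and specificity, which the abstract does not spell out.

```python
import math
import random
from typing import List, Sequence, Tuple

def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of observations whose predicted label is wrong."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (class 1) and specificity (class 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def upsample_minority(X: Sequence, y: Sequence[int],
                      seed: int = 0) -> Tuple[List, List[int]]:
    """Randomly duplicate class-1 rows until both classes have equal size.

    Assumes both classes are present and class 1 is the smaller one.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    indices = list(range(len(y))) + extra
    return [X[i] for i in indices], [y[i] for i in indices]

# A classifier that always predicts the majority class looks accurate
# by misclassification rate but has a G-mean of zero.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
rate = misclassification_rate(y_true, y_pred)
score = g_mean(y_true, y_pred)
```

The example at the end shows why the misclassification rate alone can be misleading on unbalanced data: every minority observation is missed, yet 90% of predictions are correct.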

A Fuzzy-Rough Classification Method to Minimize the Coupling Problem of Rules (규칙의 커플링문제를 최소화하기 위한 퍼지-러프 분류방법)

  • Son, Chang-S.;Chung, Hwan-M.;Seo, Suk-T.;Kwon, Soon-H.
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.4 / pp.460-465 / 2007
  • In this paper, we propose a novel pattern classification method based on statistical properties of the given data and fuzzy-rough sets to minimize the coupling problem of the rules. In the proposed method, the statistical properties are used as selection criteria for deciding the partition number of antecedent fuzzy sets and for minimizing the coupling problem of the generated rules. Moreover, rough sets are used as a tool to remove unnecessary attributes from the rules generated from the numerical data. To verify the validity of the proposed method, we compared its classification results (i.e., classification precision) with those of conventional pattern classification methods on Fisher's IRIS data. From the experimental results, we conclude that the proposed method performs relatively better than the classification methods based on conventional approaches.

Sparse kernel classification using IRWLS procedure

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.20 no.4 / pp.749-755 / 2009
  • Support vector classification (SVC) provides a more complete description of the linear and nonlinear relationships between input vectors and classifiers. In this paper, we propose a sparse kernel classifier to solve the optimization problem of classification with a modified hinge loss function and an absolute loss function, which provides efficient computation and sparsity. We also introduce a generalized cross validation function to select the hyper-parameters that affect the classification performance of the proposed method. Experimental results are then presented that illustrate the performance of the proposed procedure for classification.
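
The abstract does not define its modified hinge loss, so the sketch below shows only the standard hinge loss and the absolute loss for a label y in {-1, +1} and a real-valued decision value f. The function names are hypothetical, and this is not the paper's loss function.

```python
def hinge_loss(y: int, f: float) -> float:
    """Standard hinge loss: zero once the margin y * f reaches 1."""
    return max(0.0, 1.0 - y * f)

def absolute_loss(y: int, f: float) -> float:
    """Absolute loss: penalizes any deviation of f from the label y."""
    return abs(y - f)
```

For labels in {-1, +1} the two losses coincide whenever y * f is at most 1; they differ only for points classified with a margin larger than 1, where the hinge loss is zero and the absolute loss keeps growing.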

An Application of the Balanced Quadratic Classification Rule on the Discriminant Analysis in Growth Curve Model (성장곡선모형의 판별분석에서 균형이차분류법의 적용)

  • Shim, Kyu-Bark
    • Journal of Korean Society for Quality Management / v.23 no.2 / pp.53-67 / 1995
  • The problem considered here is to find the optimal discriminant analysis method for the growth curve model. How to find the correct prior probability for effective classification in discriminant analysis has been studied previously. We use the balanced condition to calculate the prior probability. Based on an informative simulation study, a new classification rule for the growth curve model is suggested. The suggested classification rule yields better classification results than previously suggested methods in terms of the error rate criterion.

A Note on Linear SVM in Gaussian Classes

  • Jeon, Yongho
    • Communications for Statistical Applications and Methods / v.20 no.3 / pp.225-233 / 2013
  • The linear support vector machine (SVM) is motivated by the maximal margin separating hyperplane and is a popular tool for binary classification tasks. Many studies exist on the consistency properties of SVM; however, it is unknown whether the linear SVM is consistent for estimating the optimal classification boundary even in the simple case of two Gaussian classes with a common covariance, where the optimal classification boundary is linear. In this paper, we show that the linear SVM can be inconsistent in the univariate Gaussian classification problem with a common variance, even when the best tuning parameter is used.
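
For reference, the optimal (Bayes) boundary that the abstract treats as the estimation target has a closed form in the univariate case with a common variance. The sketch below is a standard textbook derivation, not code from the paper; the function names and the `prior1` parameter (the prior probability of class 1) are assumptions introduced for illustration.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bayes_boundary(mu1: float, mu2: float, sigma: float,
                   prior1: float = 0.5) -> float:
    """Point where prior1 * N(x; mu1, sigma) equals (1 - prior1) * N(x; mu2, sigma).

    Obtained by equating the log posteriors of two Gaussian classes
    that share the same variance; requires mu1 != mu2.
    """
    if mu1 == mu2:
        raise ValueError("the class means must differ")
    prior2 = 1.0 - prior1
    shift = sigma ** 2 * math.log(prior2 / prior1) / (mu1 - mu2)
    return (mu1 + mu2) / 2.0 + shift
```

With equal priors the boundary is the midpoint of the two means; with unequal priors it shifts toward the mean of the less probable class, enlarging the region assigned to the more probable one.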

On Optimizing Dissimilarity-Based Classifier Using Multi-level Fusion Strategies (다단계 퓨전기법을 이용한 비유사도 기반 식별기의 최적화)

  • Kim, Sang-Woon;Duin, Robert P. W.
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.15-24 / 2008
  • For high-dimensional classification tasks, such as face recognition, the number of samples is smaller than the dimensionality of the samples. In such cases, linear discriminant analysis-based methods for dimension reduction encounter what is known as the small sample size (SSS) problem. Recently, dissimilarity-based classification (DBC) has been investigated as a way to solve the SSS problem. In DBC, an object is represented by its dissimilarity measures to representatives extracted from the training samples instead of by the feature vector itself. In this paper, we propose a new method of optimizing DBCs using multi-level fusion strategies (MFS), in which fusion strategies are employed to represent features as well as to design classifiers. Our experimental results on benchmark face databases demonstrate that the proposed scheme further improves classification accuracy.
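
The dissimilarity representation described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance as the dissimilarity measure and a hand-picked list of prototypes; the paper's actual measures, prototype selection, and fusion strategies are not shown, and the function name is hypothetical.

```python
import math
from typing import List, Sequence

def dissimilarity_representation(x: Sequence[float],
                                 prototypes: Sequence[Sequence[float]]) -> List[float]:
    """Map x to the vector of its Euclidean distances to each prototype."""
    return [math.dist(x, p) for p in prototypes]

# A 2-D point described relative to two prototypes.
features = dissimilarity_representation([0.0, 0.0], [[3.0, 4.0], [0.0, 0.0]])
```

The length of the new representation equals the number of prototypes rather than the original dimensionality, which is why this approach is attractive when samples are few and dimensions are many.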

A New Hybrid Algorithm for Invariance and Improved Classification Performance in Image Recognition

  • Shi, Rui-Xia;Jeong, Dong-Gyu
    • International journal of advanced smart convergence / v.9 no.3 / pp.85-96 / 2020
  • It is important to extract the salient object image and to solve the invariance problem in image recognition. In this paper, we propose a new hybrid algorithm for invariance and improved classification performance in image recognition, which combines the FT (Frequency-tuned Salient Region Detection) algorithm, a guided filter, Zernike moments, and a simple artificial neural network (Multi-layer Perceptron). The conventional FT algorithm is used to extract the initial salient object image, guided filtering to preserve edge details, Zernike moments to solve the invariance problem, and a classifier to recognize the extracted image. A guided filter performs the guided filtering, and a Multi-layer Perceptron, a simple artificial neural network, is introduced for classification. Experimental results show that this algorithm achieves superior performance in extracting the salient object image and the invariant moment features. The results also show that the algorithm classifies the extracted object image with an improved recognition rate.