• Title/Summary/Keyword: Local feature selection


Improving of kNN-based Korean text classifier by using heuristic information (경험적 정보를 이용한 kNN 기반 한국어 문서 분류기의 개선)

  • Lim, Heui-Seok;Nam, Kichun
    • The Journal of Korean Association of Computer Education / v.5 no.3 / pp.37-44 / 2002
  • Automatic text classification is the task of assigning predefined categories to free-text documents. Its importance has grown with the need to organize and manage huge amounts of text data. There has been some research on automatic text classification based on machine learning techniques. While most of it has focused on proposing new machine learning methods and on cross-evaluation against other systems, thorough evaluation or optimization of a single method has rarely been done. In this paper, we propose a method for improving a kNN-based Korean text classification system using heuristic information about the decision function, the number of nearest neighbors, and the feature selection method. Experimental results showed that the system with a similarity-weighted decision function, the global method of considering neighbors, and DF/ICF feature selection was more accurate than a simple kNN-based classifier. We also found that the performance of the local method with a well-chosen k value was as high as that of the global method, which incurs a much higher computational cost.
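
To make the similarity-weighted decision function concrete, here is a minimal sketch, not the authors' implementation. It assumes documents are sparse term-weight dictionaries and uses cosine similarity; the names `cosine` and `knn_classify` and the toy data are hypothetical. Each of the k nearest neighbors votes for its label with a weight equal to its similarity, rather than with one vote each.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed representation)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3):
    """Classify `query` by a similarity-weighted vote over its k nearest neighbors.

    `training` is a list of (term-weight dict, label) pairs.
    """
    scored = sorted(
        ((cosine(query, doc), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        votes[label] += similarity  # weight each vote by similarity, not by count
    return max(votes, key=votes.get)


# Hypothetical toy data: one close neighbor of class "A", two distant ones of class "B".
training = [
    ({"x": 1.0}, "A"),
    ({"x": 1.0, "y": 3.0}, "B"),
    ({"x": 1.0, "z": 3.0}, "B"),
]
label = knn_classify({"x": 1.0}, training, k=3)
```

In this toy case a plain majority vote over the three neighbors would pick "B", while the similarity-weighted vote picks "A" because the single "A" neighbor is far closer to the query.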


Deriving Local Association Rules by User Segmentation (사용자 구분에 의한 지역적 연관규칙의 유도)

  • Park, Se-Il;Lee, Soo-Wun
    • Journal of KIISE:Software and Applications / v.29 no.1_2 / pp.53-64 / 2002
  • Association rule discovery is a method that detects associative relationships between items or attributes in transactions. It is one of the most widely studied problems in data mining because it offers useful insight into the types of dependencies that exist in a data set. However, most studies on association rule discovery share a drawback: they cannot discover association rules that hold only within groups of users who have common characteristics. To solve this problem, we segment the set of users into user subgroups by using feature selection and user segmentation, so that local association rules within each subgroup can be discovered. To evaluate whether the local association rules are more appropriate than the global association rules for each user subgroup, the derived local rules are compared with the global rules in terms of several evaluation measures.
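
The contrast between global and local rules can be illustrated with a small sketch. It assumes the user subgroups are already known (the paper derives them through feature selection and segmentation, which is not reproduced here); the segment names, items, and function names are hypothetical. The same rule is scored with the standard support and confidence measures over all transactions and then within each segment.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the given transactions."""
    support_a = support(transactions, antecedent)
    if support_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / support_a


# Hypothetical transactions, already grouped by user segment.
segments = {
    "students": [{"laptop", "backpack"}, {"laptop", "backpack"}, {"laptop", "backpack", "pen"}],
    "retirees": [{"laptop", "glasses"}, {"laptop"}, {"glasses"}],
}
all_transactions = [t for group in segments.values() for t in group]

# Rule under test: laptop -> backpack
global_conf = confidence(all_transactions, {"laptop"}, {"backpack"})
local_conf = {
    name: confidence(group, {"laptop"}, {"backpack"})
    for name, group in segments.items()
}
```

Here the rule has moderate confidence globally (about 0.6) but holds in every "students" transaction and in no "retirees" transaction, which is the kind of difference a local rule is meant to expose.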

Local Region Spectral Analysis for Performance Enhancement of Dementia Classification (인지증 판별 성능 향상을 위한 스펙트럼 국부 영역 분석 방법)

  • Park, Jun-Qyu;Baek, Seong-Joon
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.11 / pp.5150-5155 / 2011
  • Alzheimer's disease (AD) and vascular dementia (VD) are the most common forms of dementia. In this paper, we propose a region selection method for classifying AD, VD, and normal (NOR) samples based on micro-Raman spectra from platelets. The preprocessing step applies smoothing followed by background elimination to the original spectra. We then apply the min-max method for normalization. After inspecting the preprocessed spectra, we found that the 725-777, 1504-1592, and 1632-1700 $\mathrm{cm}^{-1}$ regions contain the most discriminative features among AD, VD, and NOR spectra. We applied feature transformation using PCA (principal component analysis) and NMF (nonnegative matrix factorization). MAP (maximum a posteriori probability) classification of 327 spectra, using the transformed features from the proposed local regions, showed an average true classification rate of about 92.8%.
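
The normalization and region-selection steps can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: smoothing, background elimination, PCA/NMF, and the MAP classifier are omitted, the 1 cm^-1 sampling grid is made up, and the function names are hypothetical. Only the three wavenumber ranges come from the abstract.

```python
# Wavenumber ranges (cm^-1) reported as most discriminative in the abstract.
REGIONS = [(725, 777), (1504, 1592), (1632, 1700)]


def minmax_normalize(values):
    """Rescale intensities linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # flat spectrum: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def select_regions(wavenumbers, intensities, regions=REGIONS):
    """Keep only the intensities whose wavenumber falls inside one of the regions."""
    return [
        y
        for x, y in zip(wavenumbers, intensities)
        if any(lo <= x <= hi for lo, hi in regions)
    ]


# Hypothetical spectrum sampled every 1 cm^-1 from 600 to 1800 cm^-1.
wavenumbers = list(range(600, 1801))
intensities = [float((x * 37) % 101) for x in wavenumbers]  # placeholder values

normalized = minmax_normalize(intensities)
features = select_regions(wavenumbers, normalized)
```

With this assumed grid the three regions contribute 53, 89, and 69 samples, so `features` has 211 entries per spectrum before any PCA or NMF transformation.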

Adaptive Cooperative Spectrum Sensing Based on SNR Estimation in Cognitive Radio Networks

  • Ni, Shuiping;Chang, Huigang;Xu, Yuping
    • Journal of Information Processing Systems / v.15 no.3 / pp.604-615 / 2019
  • Single-user spectrum sensing is susceptible to multipath effects, shadowing, hidden terminals, and other unfavorable factors, which lead to misjudged sensing results. To increase detection accuracy and reduce the cost of spectrum sensing, we propose an adaptive cooperative sensing strategy based on an estimated signal-to-noise ratio (SNR), which adaptively selects between sensing strategies during the local sensing phase. When the estimated SNR is higher than the selection threshold, an adaptive double-threshold energy detector (ED) is used; otherwise, a cyclostationary feature detector is used. Because only the better-suited sensing strategy is run in each period, detection accuracy improves under low SNR while complexity stays low. Each local sensing node transmits its sensing result through the control channel to the fusion center (FC), which uses a voting rule to make the hard decision, so transmission bandwidth is saved. Simulation results show that the proposed scheme effectively improves the system detection probability, shortens the average sensing time, and is more robust, without greatly increasing the cost of the sensing system.
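
The control flow described here can be sketched in a few lines. This is a simplified illustration, not the paper's scheme: the detectors themselves are not implemented, the threshold values are placeholders, and the k-out-of-N form of the voting rule (`min_votes`) is an assumption, since the abstract only says that a voting rule is used.

```python
def choose_detector(estimated_snr_db, selection_threshold_db):
    """Pick the local sensing strategy from the estimated SNR.

    Above the threshold the cheaper energy detector is used; otherwise the
    cyclostationary feature detector is used.
    """
    if estimated_snr_db > selection_threshold_db:
        return "energy"
    return "cyclostationary"


def fuse_by_voting(local_decisions, min_votes):
    """Hard-decision fusion at the fusion center.

    `local_decisions` holds one boolean per node (True = channel occupied).
    `min_votes` is a hypothetical k-out-of-N parameter.
    """
    return sum(bool(d) for d in local_decisions) >= min_votes


# Hypothetical SNR estimates (dB) at four sensing nodes and a 0 dB threshold.
snr_estimates = [5.0, -12.0, 1.5, -3.0]
detectors = [choose_detector(snr, 0.0) for snr in snr_estimates]

# Hypothetical one-bit results reported over the control channel.
occupied = fuse_by_voting([True, False, True, True], min_votes=2)
```

Sending a single bit per node, rather than raw measurements, is what keeps the control-channel bandwidth small in a hard-decision scheme like this one.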

A Study on the Multi-function Processor Unit Implementation for Binary Image Processing (이진영상처리를 위한 다기능 프로세서 장치구현에 관한 연구)

  • 기재조;허윤석;이대영
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.7 / pp.970-979 / 1993
  • In this paper, a multi-function processor unit is implemented for binary image processing. The unit consists of an address generator, a window pipeline register, a look-up table, a control unit, and two local memories. The merits of the multi-function processor unit are that it is simpler than the basic SAP and offers improved processing speed. A simple software selection offers a choice of image sizes, and the unit can perform smoothing, thinning, feature extraction, and edge detection, either selectively or sequentially.


Extraction of Geometric Primitives from Point Cloud Data

  • Kim, Sung-Il;Ahn, Sung-Joon
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2005.06a / pp.2010-2014 / 2005
  • Object detection and parameter estimation in point cloud data is a subject relevant to robotics, reverse engineering, computer vision, and sport mechanics. In this paper, software is presented for fully automatic object detection and parameter estimation in unordered, incomplete, and error-contaminated point clouds with a large number of data points. The software consists of three algorithmic modules, one each for object identification, point segmentation, and model fitting. The newly developed algorithms for orthogonal distance fitting (ODF) play a fundamental role in each of the three modules. The ODF algorithms estimate the model parameters by minimizing the sum of squares of the shortest distances between the model feature and the measurement points. Curvature analysis of local quadric surfaces fitted to small patches of the point cloud provides the necessary seed information for automatic model selection, point segmentation, and model fitting. The performance of the software on a variety of point cloud data will be demonstrated live.
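
As a small illustration of what orthogonal distance fitting minimizes, the sketch below fits a 2D line, the simplest case with a closed-form solution. It is not the authors' ODF algorithm, which handles general curves and surfaces in 3D; the function names are hypothetical. The fitted line passes through the centroid, and its direction is the principal axis of the point scatter, which minimizes the sum of squared perpendicular distances.

```python
import math


def fit_line_odf(points):
    """Fit a 2D line by minimizing the sum of squared orthogonal distances.

    Returns (centroid, unit_normal). The line is the set of points p with
    dot(p - centroid, unit_normal) == 0.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Angle of the principal axis of the 2x2 scatter matrix [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    normal = (-math.sin(theta), math.cos(theta))
    return (cx, cy), normal


def sum_squared_orthogonal(points, centroid, normal):
    """Sum of squared shortest (perpendicular) distances from the points to the line."""
    cx, cy = centroid
    nx, ny = normal
    return sum(((x - cx) * nx + (y - cy) * ny) ** 2 for x, y in points)


# A vertical line, which ordinary y-on-x least squares cannot represent.
vertical = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
centroid, normal = fit_line_odf(vertical)
residual = sum_squared_orthogonal(vertical, centroid, normal)
```

Because the residual is measured perpendicular to the model rather than along one coordinate axis, the fit does not depend on how the data are oriented, which is the property that makes the approach attractive for unordered point clouds.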


Rotation-Invariant Iris Recognition Method Based on Zernike Moments (Zernike 모멘트 기반의 회전 불변 홍채 인식)

  • Choi, Chang-Soo;Seo, Jeong-Man;Jun, Byoung-Min
    • Journal of the Korea Society of Computer and Information / v.17 no.2 / pp.31-40 / 2012
  • Iris recognition is a biometric technology that identifies a person using the iris pattern. It is important for an iris recognition system to extract features that are invariant to changes in iris patterns. Such changes can be caused by lighting, changes in pupil size, and head tilt. In this paper, we propose a novel method based on Zernike moments that is robust to rotations of iris patterns. For fast and effective recognition, we use a selection of Zernike moments, choosing globally optimal and locally optimal moments for the best matching of each iris class. The proposed method enables high-speed feature extraction and feature comparison because it requires no additional processing to obtain rotation invariance, and it shows performance comparable to well-known previous methods.

Optimization of Multiclass Support Vector Machine using Genetic Algorithm: Application to the Prediction of Corporate Credit Rating (유전자 알고리즘을 이용한 다분류 SVM의 최적화: 기업신용등급 예측에의 응용)

  • Ahn, Hyunchul
    • Information Systems Review / v.16 no.3 / pp.161-177 / 2014
  • Corporate credit rating assessment consists of complicated processes in which various factors describing a company are taken into consideration. Such assessment is known to be very expensive because domain experts must be employed to assess the ratings. As a result, data-driven corporate credit rating prediction using statistical and artificial intelligence (AI) techniques has received considerable attention from researchers and practitioners. In particular, statistical methods such as multiple discriminant analysis (MDA) and multinomial logistic regression analysis (MLOGIT), and AI methods including case-based reasoning (CBR), artificial neural network (ANN), and multiclass support vector machine (MSVM), have been applied to corporate credit rating. Among them, MSVM has recently become popular because of its robustness and high prediction accuracy. In this study, we propose a novel optimized MSVM model and apply it to corporate credit rating prediction in order to enhance accuracy. Our model, named 'GAMSVM (Genetic Algorithm-optimized Multiclass Support Vector Machine),' is designed to simultaneously optimize the kernel parameters and the feature subset selection. Prior studies such as Lorena and de Carvalho (2008) and Chatterjee (2013) show that proper kernel parameters may improve the performance of MSVMs. The results of studies such as Shieh and Yang (2008) and Chatterjee (2013) also imply that appropriate feature selection may lead to higher prediction accuracy. Based on these prior studies, we propose to apply GAMSVM to corporate credit rating prediction. As a tool for optimizing the kernel parameters and the feature subset selection, we suggest a genetic algorithm (GA). GA is known as an efficient and effective search method that attempts to simulate biological evolution. By applying genetic operations such as selection, crossover, and mutation, it is designed to gradually improve the search results. 
In particular, the mutation operator keeps GA from becoming trapped in local optima, so it can find the globally optimal or a near-optimal solution. GA has been widely applied to search for optimal parameters or feature subsets of AI techniques including MSVM. For these reasons, we also adopt GA as an optimization tool. To empirically validate the usefulness of GAMSVM, we applied it to a real-world case of credit rating in Korea. Our application is bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. The experimental dataset was collected from a large credit rating company in South Korea. It contained 39 financial ratios of 1,295 companies in the manufacturing industry, together with their credit ratings. Using various statistical methods including one-way ANOVA and stepwise MDA, we selected 14 financial ratios as the candidate independent variables. The dependent variable, i.e. credit rating, was labeled with four classes: 1(A1); 2(A2); 3(A3); 4(B and C). Eighty percent of the data for each class was used for training, and the remaining 20 percent was used for validation. To overcome the small sample size, we applied five-fold cross validation to our dataset. To examine the competitiveness of the proposed model, we also tested several comparative models including MDA, MLOGIT, CBR, ANN, and MSVM. For MSVM, we adopted the One-Against-One (OAO) and DAGSVM (Directed Acyclic Graph SVM) approaches because they are known to be the most accurate among the various MSVM approaches. GAMSVM was implemented using LIBSVM, an open-source software package, and Evolver 5.5, a commercial software package that provides GA. The other comparative models were built using various statistical and AI packages such as SPSS for Windows, Neuroshell, and Microsoft Excel VBA (Visual Basic for Applications). Experimental results showed that the proposed model, GAMSVM, outperformed all the competing models. 
In addition, the model was found to use fewer independent variables while showing higher accuracy. In our experiments, five variables, X7 (total debt), X9 (sales per employee), X13 (years after founded), X15 (accumulated earning to total asset), and X39 (the index related to the cash flows from operating activity), were found to be the most important factors in predicting corporate credit ratings. However, the values of the finally selected kernel parameters were found to be almost the same across the data subsets. To examine whether the predictive performance of GAMSVM was significantly greater than that of the other models, we used the McNemar test. As a result, we found that GAMSVM was better than MDA, MLOGIT, CBR, and ANN at the 1% significance level, and better than OAO and DAGSVM at the 5% significance level.
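
For readers unfamiliar with the McNemar test, the sketch below shows how two classifiers evaluated on the same cases can be compared. It uses the common continuity-corrected form of the statistic; whether the authors used this variant is an assumption, and the data and function name are hypothetical. The statistic depends only on the cases where exactly one of the two models is correct.

```python
def mcnemar_statistic(correct_a, correct_b):
    """Continuity-corrected McNemar chi-square statistic (1 degree of freedom).

    `correct_a` and `correct_b` are equal-length sequences of booleans saying
    whether model A and model B classified each test case correctly.
    """
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    if only_a + only_b == 0:
        return 0.0  # the models never disagree, so there is no evidence of a difference
    return (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)


# Hypothetical outcomes on 30 test cases:
# 12 cases only model A gets right, 2 only model B, 16 both.
correct_a = [True] * 12 + [False] * 2 + [True] * 16
correct_b = [False] * 12 + [True] * 2 + [True] * 16

statistic = mcnemar_statistic(correct_a, correct_b)

# Chi-square critical values for 1 degree of freedom.
significant_at_5_percent = statistic > 3.841
significant_at_1_percent = statistic > 6.635
```

With these made-up counts the statistic is 81/14, roughly 5.79, so the difference would be significant at the 5% level but not at the 1% level.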

Segmented Douglas-Peucker Algorithm Based on the Node Importance

  • Wang, Xiaofei;Yang, Wei;Liu, Yan;Sun, Rui;Hu, Jun;Yang, Longcheng;Hou, Boyang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.4 / pp.1562-1578 / 2020
  • Vector data compression algorithms can meet the requirements of different levels and scales by reducing the data volume of vector graphics, thereby reducing the transmission time, processing time, and storage overhead of the data. In the Douglas-Peucker vector data compression algorithm, a large threshold leads to comparatively large error, and the algorithm has difficulty preserving shape features and selecting a threshold. To address this, a segmented Douglas-Peucker algorithm based on node importance is proposed. First, the algorithm uses the vertical chord ratio as the main feature to detect and extract the critical points that contribute most to the shape of the curve, so as to preserve its basic shape. Then, combined with a radial distance constraint, it selects the maximum point as the critical point and introduces a scale-related threshold to merge and adjust the critical points, so that local feature extraction between two critical points meets the accuracy requirements. Finally, the improved algorithm is analyzed and evaluated qualitatively and quantitatively on a large number of different vector data sets. Experimental results indicate that the improved vector data compression algorithm is better than the Douglas-Peucker algorithm in shape retention, compression error, simplification results, and time efficiency.
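
For reference, the baseline that the paper improves on can be sketched as follows. This is the classic Douglas-Peucker algorithm only; the segmented variant with vertical chord ratio, radial distance constraint, and node importance is not reproduced here. The function names and the sample polyline are hypothetical.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (to a itself if a == b)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points farther than `epsilon` from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_distance = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_distance:
            index, max_distance = i, d
    if max_distance > epsilon:
        # Split at the farthest point and simplify each half recursively.
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split point
    return [first, last]


polyline = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(polyline, epsilon=0.5)
```

The single global `epsilon` is the threshold the abstract refers to: raising it removes more points but lets the simplified curve drift further from the original, which is the trade-off the segmented variant aims to manage.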

Automatic Object Recognition in 3D Measuring Data (3차원 측정점으로부터의 객체 자동인식)

  • Ahn, Sung-Joon
    • The KIPS Transactions:PartB / v.16B no.1 / pp.47-54 / 2009
  • Automatic object recognition in 3D measuring data is of great interest in many application fields, e.g. computer vision, reverse engineering, and the digital factory. In this paper, we present a software tool for fully automatic object detection and parameter estimation in unordered and noisy point clouds with a large number of data points. The software consists of three interactive modules, one each for model selection, point segmentation, and model fitting, in which orthogonal distance fitting (ODF) plays an important role. The ODF algorithms estimate model parameters by minimizing the sum of squares of the shortest distances between the model feature and the measurement points. The local quadric surface fitted through ODF to a randomly touched small initial patch of the point cloud provides the necessary initial information for the overall procedure of model selection, point segmentation, and model fitting. The performance of the presented software tool will be demonstrated by applying it to point clouds.