• Title/Summary/Keyword: Feature Set Selection

Exploring Feature Selection Methods for Effective Emotion Mining (효과적 이모션마이닝을 위한 속성선택 방법에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.3 / pp.107-117 / 2019
  • In the era of SNS, many people rely on it to express their emotions about various products and services. For companies eager to learn how their products and services are perceived in the market, emotion mining on SNS datasets has therefore become more important than ever. Emotion mining is a branch of sentiment analysis that is based on BOW (bag-of-words) and TF-IDF representations. However, few emotion mining studies adopt feature selection (FS) methods to look for an optimal set of features that ensures better results. This study therefore proposes FS methods for conducting emotion mining tasks more effectively. The experiments use Twitter and SemEval2007 datasets. We applied three FS methods: CFS (correlation-based FS), IG (information gain), and ReliefF. Emotion mining results were obtained by applying the selected features to nine classifiers. When DT (decision tree) is applied to the Twitter dataset, accuracy increases with the CFS, IG, and ReliefF methods. When LR (logistic regression) is applied to the SemEval2007 dataset, accuracy increases with the ReliefF method.
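
As an illustration of one of the FS methods named in this abstract, the following minimal sketch ranks bag-of-words terms by information gain (IG) and keeps the top-scoring ones. The toy corpus, the labels, and the function names are assumptions made for the example; the abstract does not describe the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Weighted entropy of the labels after splitting on the term
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


# Hypothetical toy corpus: each document is a set of tokens
docs = [{"love", "this", "phone"}, {"hate", "this", "phone"},
        {"love", "it"}, {"hate", "it"}]
labels = ["joy", "anger", "joy", "anger"]

vocab = sorted(set().union(*docs))
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
top_k = ranked[:2]  # terms that would be passed on to a classifier
```

In this toy corpus only "love" and "hate" separate the two emotions, so they receive the highest scores while "this", "phone", and "it" score zero.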

Extraction and classification of characteristic information of malicious code for an intelligent detection model (지능적 탐지 모델을 위한 악의적인 코드의 특징 정보 추출 및 분류)

  • Hwang, Yoon-Cheol
    • Journal of Industrial Convergence / v.20 no.5 / pp.61-68 / 2022
  • In recent years, malicious code has been produced using advancing information and communication technology, and existing detection systems are insufficient to detect it. To detect and respond to such intelligent malicious code accurately and efficiently, an intelligent detection model is required, and to maximize detection performance it is important to train the model on the main characteristic information set of the malicious code. In this paper, we propose a technique for designing an intelligent detection model and for generating the data required for model training as a set of key feature information through transformation, dimensionality reduction, and feature selection steps. Based on this, the main characteristic information was classified by malicious code. In addition, from the classified characteristic information we derived common characteristic information that can be used to analyze and detect modified or newly emerging malicious code. Because the proposed detection model learns from a limited amount of characteristic information, detection and response are fast, so damage can be greatly reduced. Although the performance evaluation results differ slightly depending on the learning algorithm, the evaluation showed that most malicious code can be detected.

Statistical Analysis for Feature Subset Selection Procedures.

  • Kim, In-Young;Lee, Sun-Ho;Kim, Sang-Cheol;Rha, Sun-Young;Chung, Hyun-Cheol;Kim, Byung-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.101-106 / 2003
  • In this paper, we propose using Hotelling's T² statistic to detect a set of differentially expressed (DE) genes in colorectal cancer, based on gene expression levels in tumor tissues compared with those in normal tissues, and we evaluate its predictivity, which lets us rank genes for the development of biomarkers for population screening of colorectal cancer. We compared the prediction rates based on the DE genes selected by Hotelling's T² statistic and by the univariate t statistic using various prediction methods, including regularized discriminant analysis and a support vector machine. The results show that the prediction rate based on T² is better than that of the univariate t. This implies that it may not be sufficient to look at each gene in isolation, and that evaluating combinations of genes reveals interesting information that would not be discovered otherwise.
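
For reference, the textbook two-sample form of Hotelling's statistic for a set of p genes is given below. The abstract does not say whether the authors used this form or a paired variant, so the notation (group sizes n_1 and n_2, mean expression vectors, pooled covariance S_p) is an assumption made for illustration.

```latex
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^{\top}
      \mathbf{S}_p^{-1}
      (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2),
\qquad
\mathbf{S}_p = \frac{(n_1 - 1)\,\mathbf{S}_1 + (n_2 - 1)\,\mathbf{S}_2}{n_1 + n_2 - 2}
```

For a single gene (p = 1) this reduces to the square of the pooled two-sample t statistic, which is why the multivariate statistic can pick up gene combinations that gene-by-gene t tests miss.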

Improvement of Set Partitioning Sorting Algorithm for Image Compression in Embedded System (임베디드 시스템의 영상압축을 위한 분할정렬 알고리즘의 개선)

  • Kim, Jin-Man;Ju, Dong-Hyun;Kim, Doo-Young
    • Journal of the Institute of Convergence Signal Processing / v.6 no.3 / pp.107-111 / 2005
  • With the increasing use of multimedia technologies, image compression requires higher performance as well as new functionality in the information society. In particular, for still image encoding in embedded systems, a new standard, JPEG2000, was developed to address various problems of JPEG. This paper proposes a method that reduces the quantity of data delivered in the EBCOT (Embedded Block Coding with Optimized Truncation) process by using the SPIHT (Set Partitioning in Hierarchical Trees) algorithm to optimize threshold selection from the features of the wavelet transform coefficients and to remove the sign bit in the LL area, increasing the compression efficiency of JPEG2000. The experimental results show that the proposed algorithm achieves an improved bit rate in embedded systems.
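
To make the role of the threshold concrete, here is a minimal sketch of the standard SPIHT rule: coding starts at bit plane n = floor(log2(max |c|)), and a coefficient is significant when its magnitude reaches 2**n. This is the textbook rule, not the improved threshold selection proposed in the paper, and the coefficient values are hypothetical.

```python
import math


def initial_bitplane(coeffs):
    """Standard SPIHT starting bit plane: n = floor(log2(max |c|)).

    Assumes at least one coefficient is nonzero.
    """
    peak = max(abs(c) for row in coeffs for c in row)
    return int(math.floor(math.log2(peak)))


def significant(coeffs, n):
    """Positions whose magnitude reaches the threshold 2**n."""
    threshold = 2 ** n
    return [(i, j)
            for i, row in enumerate(coeffs)
            for j, c in enumerate(row)
            if abs(c) >= threshold]


# Hypothetical 4x4 block of wavelet coefficients
coeffs = [[63, -34, 49, 10],
          [-31, 23, 14, -13],
          [15, 14, 3, -12],
          [-9, -7, -14, 8]]

n = initial_bitplane(coeffs)             # 5, because 32 <= 63 < 64
first_pass = significant(coeffs, n)      # coefficients with |c| >= 32
second_pass = significant(coeffs, n - 1) # threshold halves to 16
```

Each pass halves the threshold, so the choice of the starting threshold determines how many passes are spent before any coefficient becomes significant.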

A Saliency-Based Focusing Region Selection Method for Robust Auto-Focusing

  • Jeon, Jaehwan;Cho, Changhun;Paik, Joonki
    • IEIE Transactions on Smart Processing and Computing / v.1 no.3 / pp.133-142 / 2012
  • This paper presents a salient region detection algorithm for auto-focusing based on the characteristics of human visual attention. To describe the saliency at the local, regional, and global levels, this paper proposes a set of novel features including multi-scale local contrast, variance, center-surround entropy, and closeness to the center. Those features are then prioritized to produce a saliency map. The major advantage of the proposed approach is twofold: (i) robustness to changes in focus and (ii) low computational complexity. The experimental results showed that the proposed method outperforms the existing low-level feature-based methods in terms of both robustness and accuracy for auto-focusing.

A Study on the Multi-function Processor Unit Implementation for Binary Image Processing (이진영상처리를 위한 다기능 프로세서 장치구현에 관한 연구)

  • 기재조;허윤석;이대영
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.7 / pp.970-979 / 1993
  • In this paper, a multi-function processor unit is implemented for binary image processing. The unit consists of an address generator, a window pipeline register, a look-up table, a control unit, and two local memories. The merits of the multi-function processor unit are that it is simpler than a basic SAP and offers improved processing speed. A simple software selection provides a choice of image sizes, and the unit can perform smoothing, thinning, feature extraction, and edge detection, selectively or sequentially.

A Study on Feature Division using Sliced Information of STL Format (STL 포맷의 단면정보를 이용한 형상분할에 관한 연구)

  • Ban, Gab-Su
    • Journal of the Korean Society of Industry Convergence / v.5 no.2 / pp.141-146 / 2002
  • Stereolithography is the best-known rapid prototyping system. It uses STL format data generated from a CAD system. In this study, one of the main functions of the developed CAM system is shape modification, which divides a shape into two or more parts. The cross section of an STL part at a given z-level is composed of nested or single closed polygonal loops. In order to make the RP product, the closed loops must be filled with triangular facets from SSET, and the sliced triangular facets that are located in the normal direction to the cross-sectional plane must be recovered. The system was developed with the Visual C++ compiler on a Pentium PC running the Windows NT Workstation operating system from Microsoft.
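
The cross-section step can be sketched as follows: each triangular facet that straddles the plane Z = z contributes one line segment, and chaining those segments end to end yields the closed loops mentioned in the abstract. The function name and the sample facet are hypothetical, and the sketch deliberately ignores degenerate facets that touch the plane at a vertex or lie in it; the paper's own handling of those cases is not described.

```python
def slice_triangle(tri, z):
    """Return the 2D segment where the plane Z = z cuts a facet, or None.

    tri holds three (x, y, z) vertices. Facets with a vertex exactly on
    the plane are skipped in this simplified sketch.
    """
    points = []
    for k in range(3):
        (x1, y1, z1), (x2, y2, z2) = tri[k], tri[(k + 1) % 3]
        # An edge crosses the plane when its endpoints lie strictly on opposite sides
        if (z1 - z) * (z2 - z) < 0:
            t = (z - z1) / (z2 - z1)  # interpolation factor along the edge
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return tuple(points) if len(points) == 2 else None


# Hypothetical facet spanning z = 0 to z = 2
facet = [(0, 0, 0), (2, 0, 2), (0, 2, 2)]
segment = slice_triangle(facet, 1.0)  # one edge of a cross-section loop
missed = slice_triangle(facet, 3.0)   # plane above the facet: no intersection
```

Dividing a shape then amounts to classifying facets as above or below the cutting plane and capping the open loops with new facets.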

SINE TRIGONOMETRIC SPHERICAL FUZZY AGGREGATION OPERATORS AND THEIR APPLICATION IN DECISION SUPPORT SYSTEM, TOPSIS, VIKOR

  • Qiyas, Muhammad;Abdullah, Saleem
    • Korean Journal of Mathematics / v.29 no.1 / pp.137-167 / 2021
  • The spherical fuzzy set (SFS) is one of the fundamental concepts for addressing more uncertainty in decision problems than existing fuzzy set structures, and its implementation is therefore more substantial. The well-known sine trigonometric function maintains periodicity and symmetry about the origin, and thus satisfies the expectations of experts over multiple parameters. Taking this feature and the significance of SFSs into consideration, the main objective of the article is to describe some reliable sine trigonometric laws (STLs) for SFSs. Associated with these laws, we develop new average and geometric aggregation operators to aggregate spherical fuzzy numbers (SFNs). We then present a group decision-making (DM) strategy that addresses the multi-attribute group decision-making (MAGDM) problem using the developed aggregation operators. To verify the value of the defined operators, a MAGDM strategy is provided along with an application to the selection of a laptop. Moreover, a comparative study is performed to demonstrate the effectiveness of the developed approach.

Remaining useful life prediction for PMSM under radial load using particle filter

  • Lee, Younghun;Kim, Inhwan;Choi, Sikgyoung;Oh, Jaewook;Kim, Namsu
    • Smart Structures and Systems / v.29 no.6 / pp.799-805 / 2022
  • Permanent magnet synchronous motors (PMSMs) are widely used in systems requiring high control precision, efficiency, and reliability. Predicting the remaining useful life (RUL) with health monitoring of PMSMs prevents catastrophic failure and ensures reliable operation of the system. In this study, a model-based method for predicting the RUL of PMSMs using phase current and vibration signals is proposed. The proposed method includes feature selection and RUL prediction based on a particle filter with a degradation model. The Paris-Erdogan model describing micro fatigue crack propagation is used as the degradation model. An experimental set-up for conducting accelerated life tests, capable of monitoring various signals, was designed in this study. Phase current and vibration data obtained from an accelerated life test of the PMSMs were used to verify the proposed approach. Features extracted from the data were clustered based on monotonicity and correlation clustering, respectively. The results demonstrate the effectiveness of using the current data in predicting the RUL of PMSMs.
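
The monotonicity criterion mentioned above can be illustrated with a definition that is common in the prognostics literature: the absolute difference between the number of increases and decreases, divided by the number of steps. The abstract does not state which definition the authors used, so the formula, the feature names, the sample values, and the 0.5 cut-off are all assumptions for this sketch.

```python
def monotonicity(series):
    """|#increases - #decreases| / (n - 1); 1.0 for a strictly monotonic trend."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    up = sum(d > 0 for d in diffs)
    down = sum(d < 0 for d in diffs)
    return abs(up - down) / len(diffs)


# Hypothetical feature histories recorded over an accelerated life test
features = {
    "current_rms": [1.0, 1.1, 1.3, 1.2, 1.6, 1.9],
    "vibration_kurtosis": [3.0, 2.8, 3.1, 2.9, 3.2, 3.0],
}

scores = {name: monotonicity(values) for name, values in features.items()}
# Keep features whose trend is consistent enough to track degradation
selected = [name for name, score in scores.items() if score >= 0.5]
```

A feature that drifts steadily in one direction is easier to map onto a degradation model than one that oscillates, which is the motivation for screening on this score.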

Optimization of Support Vector Machines for Financial Forecasting (재무예측을 위한 Support Vector Machine의 최적화)

  • Kim, Kyoung-Jae;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.241-254 / 2011
  • Financial time-series forecasting is one of the most important issues because it is essential for the risk management of financial institutions. Researchers have therefore tried to forecast financial time series using various data mining techniques such as regression, artificial neural networks, decision trees, and k-nearest neighbor. Recently, support vector machines (SVMs) have become popular in this research area because they do not require huge training data and have a low possibility of overfitting. However, a user must determine several design factors heuristically in order to use an SVM. For example, the selection of an appropriate kernel function and its parameters and proper feature subset selection are major design factors of SVM. Beyond these factors, proper selection of an instance subset may also improve the forecasting performance of SVM by eliminating irrelevant and distorting training instances. Nonetheless, few studies have applied instance selection to SVM, especially in the domain of stock market prediction. Instance selection tries to choose proper instance subsets from the original training data. It may be considered a method of knowledge refinement, and it maintains the instance base. This study proposes a novel instance selection algorithm for SVMs. The proposed technique uses a genetic algorithm (GA) to optimize the instance selection process and the kernel parameters simultaneously. We call the model ISVM (SVM with instance selection). Experiments on stock market data are implemented using ISVM. The GA searches for optimal or near-optimal values of kernel parameters and relevant instances for SVMs, so each chromosome in the GA carries two sets of parameters: the codes for the kernel parameters and the codes for instance selection. For the controlling parameters of the GA search, the population size is set at 50 organisms, the crossover rate at 0.7, and the mutation rate at 0.1. As the stopping condition, 50 generations are permitted. The application data consist of technical indicators and the direction of change in the daily Korea stock price index (KOSPI). The total number of samples is 2218 trading days. We separate the whole data into three subsets: training, test, and hold-out. The numbers of samples in the subsets are 1056, 581, and 581, respectively. This study compares ISVM to several comparative models, including logistic regression (Logit), backpropagation neural networks (ANN), nearest neighbor (1-NN), conventional SVM (SVM), and SVM with optimized parameters (PSVM). In particular, PSVM uses kernel parameters optimized by the genetic algorithm. The experimental results show that ISVM outperforms 1-NN by 15.32%, ANN by 6.89%, Logit and SVM by 5.34%, and PSVM by 4.82% on the hold-out data. For ISVM, only 556 of the 1056 original training instances are used to produce the result. In addition, the two-sample test for proportions is used to examine whether ISVM significantly outperforms the other comparative models. The results indicate that ISVM outperforms ANN and 1-NN at the 1% statistical significance level, and performs better than Logit, SVM, and PSVM at the 5% statistical significance level.
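
The chromosome layout and GA settings described in this abstract can be sketched roughly as follows. The population size, number of generations, crossover rate, and mutation rate come from the abstract; everything else is an assumption for illustration: the 8-bit kernel-parameter code, tournament selection with elitism, one-point crossover, mutation applied once per chromosome, and above all the toy fitness function, which stands in for training an SVM on the selected instances and measuring its accuracy.

```python
import random

POP_SIZE, GENERATIONS = 50, 50            # values reported in the abstract
CROSSOVER_RATE, MUTATION_RATE = 0.7, 0.1  # values reported in the abstract
PARAM_BITS = 8     # assumed width of the kernel-parameter code
N_INSTANCES = 20   # toy size; the study used 1056 training instances


def decode(chrom):
    """Split a chromosome into (kernel-parameter code, instance mask)."""
    param_code = int("".join(map(str, chrom[:PARAM_BITS])), 2)
    return param_code, chrom[PARAM_BITS:]


def toy_fitness(chrom, noisy):
    """Placeholder score: counts instances that are correctly kept or dropped.

    A real fitness would train an SVM on the masked data and return its accuracy.
    """
    _, mask = decode(chrom)
    return sum(1 for i, keep in enumerate(mask) if keep != (i in noisy))


def evolve(fitness, length, rng):
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        nxt = [pop[0][:]]  # elitism: carry over the best chromosome (assumption)
        while len(nxt) < POP_SIZE:
            # Tournament selection of two parents (assumption)
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            child = a[:]
            if rng.random() < CROSSOVER_RATE:
                cut = rng.randrange(1, length)
                child = a[:cut] + b[cut:]  # one-point crossover
            if rng.random() < MUTATION_RATE:
                child[rng.randrange(length)] ^= 1  # flip one random bit
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)


rng = random.Random(0)
noisy = {2, 5, 11}  # hypothetical indices of distorting training instances
best = evolve(lambda c: toy_fitness(c, noisy), PARAM_BITS + N_INSTANCES, rng)
param_code, mask = decode(best)
```

Encoding both parts in one chromosome lets a single search trade off kernel settings against the choice of training instances, which is the simultaneous optimization the abstract describes.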