• Title/Summary/Keyword: redundant feature removal (중복 특징 제거)

Prediction of Unsatisfied Customers Using Machine Learning (기계학습을 이용한 불만족 고객의 예측)

  • Oh, Se-Chang;Choi, Min
    • Proceedings of the Korea Information Processing Society Conference / 2016.04a / pp.667-670 / 2016
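  • In many machine learning problems, feature selection is an important factor that determines overall performance. This is all the more pressing in problems that use a very large number of features, such as identifying unsatisfied customers. In this study, we applied several representative methods for finding important features and removing redundancy to the problem of identifying unsatisfied customers. The best results were obtained by first selecting meaningful features with an information gain measure and then using PCA to reduce the remaining redundancy.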
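
The first stage of the pipeline described above, ranking features by information gain, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy columns, and the `threshold` value are hypothetical, features are assumed to be discrete, and the subsequent PCA step is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Label entropy minus the label entropy that remains after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional

def select_by_information_gain(columns, labels, threshold):
    """Return the names of features whose information gain exceeds the (assumed) threshold."""
    return [
        name
        for name, values in columns.items()
        if information_gain(values, labels) > threshold
    ]

# Hypothetical toy data: 1 = unsatisfied customer, 0 = satisfied customer.
labels = [1, 1, 0, 0]
columns = {
    "complaints": ["yes", "yes", "no", "no"],  # separates the two classes completely
    "region": ["a", "b", "a", "b"],            # tells us nothing about the label
}
selected = select_by_information_gain(columns, labels, threshold=0.1)
print(selected)
```

Features that survive this filter may still be correlated with one another, which is the redundancy the abstract addresses with PCA in the second stage.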

An Efficient Face Recognition by Denoising and Principal Component Analysis (잡음제거와 주요성분분석에 의한 효과적인 얼굴인식)

  • Cho Yong-Hyun;Hong Seung-Jun
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.11a / pp.546-549 / 2005
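  • In this paper, we propose an effective face recognition technique that uses denoising and principal component analysis. Denoising combines filtering with a first-order moment translation to remove background that is unrelated to the feature information of the image, while principal component analysis efficiently extracts features, the principal components of the face image, from which the two-dimensional redundant components have been removed. Simulations on 24 AR face images of 768*576 pixels confirmed that the proposed face recognition outperforms conventional face recognition without denoising in compression performance as a function of the number of principal components, in feature extraction time, and in recognition under the city-block, Euclidean, and negative angle (cosine) distance measures.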
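
The three distance measures named above can be sketched as follows, assuming that each face has already been reduced to a short feature vector (for example, its PCA coefficients). The `gallery` contents and the `nearest` helper are hypothetical illustrations, not part of the cited paper.

```python
import math

def city_block(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_cosine(a, b):
    """Negative cosine of the angle between a and b; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return -dot / norms

def nearest(probe, gallery, distance):
    """Return the gallery identity whose feature vector is closest to the probe."""
    return min(gallery, key=lambda name: distance(probe, gallery[name]))

# Hypothetical two-dimensional feature vectors for two enrolled people.
gallery = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}
probe = [0.9, 0.1]
for distance in (city_block, euclidean, negative_cosine):
    print(distance.__name__, nearest(probe, gallery, distance))
```

Negating the cosine lets all three measures share the same "smaller is closer" convention, so one nearest-neighbor routine can be used for each.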

Parallel Rabin Fingerprinting on GPGPU for Efficient Data Deduplication (효율적인 데이터 중복제거를 위한 GPGPU 병렬 라빈 핑거프린팅)

  • Ma, Jeonghyeon;Park, Sejin;Park, Chanik
    • Journal of KIISE / v.41 no.9 / pp.611-616 / 2014
  • Rabin fingerprinting, which is used for chunking, requires the largest share of computation time in data deduplication. In this paper, we therefore propose parallel Rabin fingerprinting on GPGPU for efficient data deduplication. To parallelize Rabin fingerprinting efficiently, we consider four issues. First, when dividing the input data stream into data sections, we take into account the data located near the boundaries between sections so that the Rabin fingerprint can be calculated continuously. Second, we exploit the characteristics of Rabin fingerprinting for efficient operation. Third, we consider that chunk boundaries may change relative to sequential Rabin fingerprinting when parallel Rabin fingerprinting is adopted. Finally, we optimize GPGPU memory access. Parallel Rabin fingerprinting on GPGPU performs 16 times better than sequential Rabin fingerprinting on CPU and 5.3 times better than parallel Rabin fingerprinting on CPU. This throughput improvement in Rabin fingerprinting can improve the overall performance of data deduplication.
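
The sequential chunking step that the paper parallelizes can be sketched as follows. This is a simplified illustration, not the authors' code: true Rabin fingerprinting uses polynomial arithmetic over GF(2), whereas this sketch substitutes an integer polynomial rolling hash (Rabin-Karp style), and the `window`, `mask`, `base`, and `mod` values are arbitrary assumptions.

```python
def chunk_boundaries(data, window=16, mask=0x3F, base=257, mod=(1 << 31) - 1):
    """Return the end offsets of content-defined chunks of `data`.

    A boundary is declared wherever the rolling hash of the last `window`
    bytes has all bits of `mask` set, so a boundary depends only on local content.
    """
    high = pow(base, window - 1, mod)  # weight of the byte that leaves the window
    rolling = 0
    boundaries = []
    for i, byte in enumerate(data):
        if i >= window:
            # Drop the oldest byte from the hash before adding the new one.
            rolling = (rolling - data[i - window] * high) % mod
        rolling = (rolling * base + byte) % mod
        if i + 1 >= window and (rolling & mask) == mask:
            boundaries.append(i + 1)
    # The tail after the last detected boundary forms the final chunk.
    if data and (not boundaries or boundaries[-1] != len(data)):
        boundaries.append(len(data))
    return boundaries

def split_chunks(data, **kwargs):
    """Split `data` into chunks at the boundaries found above."""
    chunks, start = [], 0
    for end in chunk_boundaries(data, **kwargs):
        chunks.append(data[start:end])
        start = end
    return chunks

# Deterministic sample input, used only for demonstration.
sample = bytes((i * i * 31 + i * 7) % 251 for i in range(4096))
print(len(split_chunks(sample)), "chunks")
```

Because each boundary decision depends on the preceding `window` bytes, a worker that processes one section of the stream needs the last `window - 1` bytes of the previous section, which is one way to read the first issue listed in the abstract.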

Broadcast Redundancy Reduction Algorithm for Enhanced Wireless Sensor Network Lifetime (무선 센서 네트워크의 수명 향상을 위한 브로드캐스트 중복 제거 알고리즘)

  • Park, Cheol-Min;Kim, Young-Chan
    • Journal of Internet Computing and Services / v.8 no.4 / pp.71-79 / 2007
  • Communication in Wireless Sensor Networks (WSNs) can be characterized by two types of behavior: routing and broadcasting. Broadcasting is used for effective route discovery and packet delivery. However, broadcasting shortens the network lifetime because redundant transmissions consume excess energy. In this paper, we propose an algorithm that removes redundant forward nodes, based on the Dominant Pruning method using 2-hop neighbor knowledge. Simulation results show that the proposed algorithm achieves superior performance with respect to the number of forward nodes and the network lifetime.
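
For orientation, the baseline Dominant Pruning idea can be sketched as a greedy set cover over 2-hop neighbors. This shows only the general technique under simplifying assumptions, not the improved algorithm proposed in the paper; the `graph` layout and the `forward_nodes` name are hypothetical.

```python
def forward_nodes(graph, sender, node):
    """Greedily pick the neighbors of `node` that must rebroadcast.

    `graph` maps each node to the set of its 1-hop neighbors. `node` has just
    received a broadcast from `sender` and selects forwarders that together
    cover every 2-hop neighbor not already reached by `sender` or `node`.
    """
    near = graph[node]
    two_hop = set().union(*(graph[n] for n in near))
    uncovered = two_hop - near - graph[sender] - {node, sender}
    candidates = near - graph[sender] - {sender}
    chosen = []
    while uncovered and candidates:
        # Pick the candidate that covers the most still-uncovered nodes
        # (sorted first so that ties are broken deterministically).
        best = max(sorted(candidates), key=lambda c: len(graph[c] & uncovered))
        gain = graph[best] & uncovered
        if not gain:
            break  # the remaining nodes cannot be reached through any candidate
        chosen.append(best)
        uncovered -= gain
        candidates.discard(best)
    return chosen

# Hypothetical topology: s - v, v - {a, b, c}, a - {x, y}, b - y, c - z.
graph = {
    "s": {"v"},
    "v": {"s", "a", "b", "c"},
    "a": {"v", "x", "y"},
    "b": {"v", "y"},
    "c": {"v", "z"},
    "x": {"a"},
    "y": {"a", "b"},
    "z": {"c"},
}
print(forward_nodes(graph, "s", "v"))
```

In this topology node `b` is not selected, because its only 2-hop contribution (`y`) is already covered by `a`; skipping such redundant forwarders is what saves transmissions.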

Survival network based Android Authorship Attribution considering overlapping tolerance (중복 허용 범위를 고려한 서바이벌 네트워크 기반 안드로이드 저자 식별)

  • Hwang, Cheol-hun;Shin, Gun-Yoon;Kim, Dong-Wook;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.21 no.6 / pp.13-21 / 2020
  • Android author identification can be viewed narrowly as a way to reveal the source of an app, and more broadly as a way to gain insight into identifying similar works from known ones. A problem in this line of study is that some code is important to the Android system yet meaningless for identifying an author, which makes it difficult to find the author's important features. As a result, legitimate code or behavior has also been incorrectly classified as malicious. To address this, we introduce the concept of a survival network, which removes features found across many Android apps so that only the unique features that define each author survive. We conducted an experiment comparing the proposed framework with a previous study. In experiments on the identified apps of 440 authors, we obtained a classification accuracy of up to 92.10%, a difference of up to 3.47% from the previous study. Although only a small amount of training data was used, we attribute the difference to the use of unique, non-duplicated features for each author. In addition, in comparative experiments organized by feature definition method, the same accuracy was reached with a smaller number of features, which suggests that continuously overlapping meaningless features can be managed through the survival network concept.
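
The abstract does not describe how the survival network is built. The underlying idea of discarding features shared by too many authors can nevertheless be illustrated with a simple frequency filter. Everything in this sketch is an assumption made for illustration: the `surviving_features` name, the meaning given to `tolerance` (the maximum number of authors that may share a feature), and the sample feature strings.

```python
from collections import Counter

def surviving_features(features_by_author, tolerance=1):
    """Keep, for each author, only the features shared by at most `tolerance` authors."""
    author_counts = Counter(
        feature
        for features in features_by_author.values()
        for feature in set(features)
    )
    return {
        author: {f for f in features if author_counts[f] <= tolerance}
        for author, features in features_by_author.items()
    }

# Hypothetical feature sets extracted from each author's apps.
features_by_author = {
    "author_a": {"android.app.Activity", "com.a.util.Crypto"},
    "author_b": {"android.app.Activity", "com.b.net.Client"},
    "author_c": {"android.app.Activity", "com.b.net.Client", "com.c.ui.Theme"},
}
unique = surviving_features(features_by_author, tolerance=1)
shared_ok = surviving_features(features_by_author, tolerance=2)
print(unique)
print(shared_ok)
```

Raising `tolerance` lets features shared by a few authors survive, while features that appear for every author (here `android.app.Activity`) are still removed.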

Removing Non-informative Features by Robust Feature Wrapping Method for Microarray Gene Expression Data (유전자 알고리즘과 Feature Wrapping을 통한 마이크로어레이 데이타 중복 특징 소거법)

  • Lee, Jae-Sung;Kim, Dae-Won
    • Journal of KIISE:Software and Applications / v.35 no.8 / pp.463-478 / 2008
  • Because of the high dimensionality of microarray gene expression datasets, machine learning algorithms have typically relied on feature selection techniques to classify them effectively. However, the large number of features relative to the number of samples makes feature selection computationally prohibitive and prone to errors. One traditional approach is feature filtering, which evaluates one gene at a time. Feature filtering is therefore a univariate approach that cannot account for multivariate correlations. In this paper, we propose a function that measures both class separability and correlations. With this approach, we address the problem inherent in feature filtering.

Classification of Epilepsy Using Distance-Based Feature Selection (거리 기반의 특징 선택을 이용한 간질 분류)

  • Lee, Sang-Hong
    • Journal of Digital Convergence / v.12 no.8 / pp.321-327 / 2014
  • Feature selection is a technique that improves classification performance by removing irrelevant and redundant features to obtain a minimal feature set. To improve classification performance, this study proposes a new feature selection method that uses the distance between the centers of gravity of the bounded sum of weighted fuzzy membership functions (BSWFMs) provided by the neural network with weighted fuzzy membership functions (NEWFM). The distance-based feature selection removes, one at a time from the 24 initial features, the worst feature, namely the one with the shortest distance between the centers of gravity of its BSWFMs, and selects the 22 features that give the highest performance. With these 22 features, the proposed methodology achieves a sensitivity, specificity, and accuracy of 97.7%, 99.7%, and 98.7%, respectively.

Topic-Based Multi-Document Summarization using Semantic Features of Documents (문서의 의미특징을 이용한 주제 기반의 다중문서 요약)

  • Park, Sun;An, Dong Un;Kim, Chul-Won
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.715-716 / 2009
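  • The growth of the Internet has produced massive amounts of information, and within such large collections similar information is reused or repeated, which creates an information redundancy problem. The need for information summarization, which lets users quickly find the information they want among redundant information, is steadily increasing. This paper proposes a new method for topic-based multi-document summarization that uses the semantic features of documents obtained by non-negative matrix factorization (NMF). The method can improve summary quality by exploiting the inherent structure among the documents in a multi-document set, and it has the advantage of easily removing redundant information when summarizing sentences by considering the similarity and diversity between the topic and the sentences.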

An Enhanced Feature Selection Method Based on the Impurity of Words Considering Unbalanced Distribution of Documents (문서의 불균등 분포를 고려한 단어 불순도 기반 특징 선택 방법)

  • Kang, Jin-Beom;Yang, Jae-Young;Choi, Joong-Min
    • Journal of KIISE:Software and Applications / v.34 no.9 / pp.804-816 / 2007
  • Sample training data for machine learning often contain irrelevant information or redundant concepts. The original data may also include noise. If the information collected for constructing a learning model is not reliable, it is difficult to obtain accurate results. The system therefore attempts to find relations or regularities between features and categories in the learning phase. Feature selection removes irrelevant or redundant information before the learning model is constructed in order to improve its performance. Existing feature selection methods assume that the distribution of documents is balanced in terms of the number of documents in each class and the length of each document. In practice, however, it is difficult not only to prepare a set of documents of almost equal length, but also to define classes with a fixed number of documents. In this paper, we propose a new feature selection method that considers the impurity of words and the unbalanced distribution of documents across categories. We obtain feature candidates using word impurity and then select the final features by taking the unbalanced distribution of documents into account. Experiments show that our method performs better than existing methods.
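
The abstract does not give the impurity formula, so the following is only one plausible reading: a Gini impurity computed over per-class document rates rather than raw counts, so that a large class does not dominate a small one. The `word_impurity` name, the normalization, and the toy corpus are assumptions made for illustration.

```python
def word_impurity(word, docs_by_class):
    """Gini impurity of a word's class distribution, using per-class document rates.

    `docs_by_class` maps a class label to a list of documents, each given as a
    set of words. Dividing by the class size compensates for unbalanced classes.
    Returns None if the word occurs in no document.
    """
    rates = {
        label: sum(1 for doc in docs if word in doc) / len(docs)
        for label, docs in docs_by_class.items()
    }
    total = sum(rates.values())
    if total == 0:
        return None
    return 1.0 - sum((rate / total) ** 2 for rate in rates.values())

# Hypothetical unbalanced corpus: four sports documents, one politics document.
docs_by_class = {
    "sports": [{"goal", "team"}, {"team", "coach"}, {"goal", "the"}, {"team", "the"}],
    "politics": [{"vote", "the"}],
}
for word in ("team", "vote", "the"):
    print(word, word_impurity(word, docs_by_class))
```

Under this reading, a word with impurity near 0 appears in essentially one class and is a strong feature candidate, while a word spread across classes scores higher and is a candidate for removal.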

Texture-Spatial Separation based Feature Distillation Network for Single Image Super Resolution (단일 영상 초해상도를 위한 질감-공간 분리 기반의 특징 분류 네트워크)

  • Hyun Ho Han
    • Journal of Digital Policy / v.2 no.3 / pp.1-7 / 2023
  • In this paper, I propose a method for performing single image super resolution by separating the texture and spatial domains and then classifying features based on detailed information. In CNN (Convolutional Neural Network) based super resolution, the complex procedures and the generation of redundant feature information in the feature estimation process used to enhance details can degrade the quality of the result. The proposed method reduces procedural complexity and minimizes the generation of redundant feature information by splitting the input image into two channels: texture and spatial. In the texture channel, a feature refinement process with step-wise skip connections is applied to restore details, while in the spatial channel, a method is introduced to preserve the structural features of the image. Experimental results show that the proposed method improves PSNR and SSIM compared to existing super resolution methods, confirming the enhancement in quality.