• Title/Abstract/Keywords: data classification


Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks

  • Huynh, Phuoc-Hai;Nguyen, Van Hoa;Do, Thanh-Nghi
    • Journal of information and communication convergence engineering
    • /
    • Vol. 17, No. 1
    • /
    • pp.14-20
    • /
    • 2019
  • Microarray gene expression data are currently used to classify cancers effectively, which helps address problems related to cancer causes and treatment regimens. However, the sample size of gene expression datasets is often small because microarray technology is expensive to use in human studies. We propose enhancing the gene expression classification of support vector machines with generative adversarial networks (GAN-SVMs). A GAN that generates new data from the original training datasets was implemented and used in conjunction with nonlinear SVMs, which classify gene expression data efficiently. Numerical test results on 20 low-sample-size, very high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and ArrayExpress repositories indicate that the model is more accurate than state-of-the-art classification models.

Enhancement of Text Classification Method

  • 신광성;신성윤
    • Korea Institute of Information and Communication Engineering: Conference Proceedings
    • /
    • Korea Institute of Information and Communication Engineering 2019 Spring Conference
    • /
    • pp.155-156
    • /
    • 2019
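  • Existing machine-learning-based sentiment analysis methods such as Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor classification (kNN) have shown limited accuracy. This paper proposes an improved kNN classification method. Higher accuracy is achieved through the improved method together with data normalization. The three existing classification algorithms and the improved algorithm were then compared on experimental data.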
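The abstract does not say what the improved kNN changes, so the sketch below shows only the baseline it builds on: min-max normalization of every feature followed by k-nearest-neighbor majority voting. The function names, the Euclidean distance, and the toy data are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; returns the scaled rows and a scaler for new points."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]

    def scale(row):
        # Constant columns carry no information, so they map to 0.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]

    return [scale(r) for r in rows], scale


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query (Euclidean distance)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data whose second feature has a much larger scale.
raw_x = [[1.0, 200.0], [2.0, 220.0], [9.0, 900.0], [8.0, 880.0]]
raw_y = ["neg", "neg", "pos", "pos"]
norm_x, scale = min_max_normalize(raw_x)
print(knn_predict(norm_x, raw_y, scale([8.5, 850.0]), k=3))
```

Normalizing first keeps the large-valued second feature from dominating the distance, which is the usual reason normalization helps kNN.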


One-dimensional CNN Model of Network Traffic Classification based on Transfer Learning

  • Lingyun Yang;Yuning Dong;Zaijian Wang;Feifei Gao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 2
    • /
    • pp.420-437
    • /
    • 2024
  • Network traffic classification (NTC) faces problems such as complicated statistical features and insufficient training samples, which can lead to poor classification results. An NTC architecture based on a one-dimensional convolutional neural network (CNN) and transfer learning is proposed to tackle these problems and improve fine-grained classification performance. The key points of the proposed architecture are: (1) model classification, in which a normalized rate feature set extracted from the original data is combined with existing statistical features to optimize the CNN NTC model; and (2) the application of transfer learning to improve NTC performance. We collected two typical sets of network flow data, from Youku and YouTube, and verified the proposed method through extensive experiments. Compared with existing methods, our method improved classification accuracy by about 3 to 5% for Youku and by about 7 to 27% for YouTube.
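As a rough illustration of the two ingredients named above, the sketch below uses a frozen 1D convolution, ReLU, and global-max-pool feature extractor and retrains only a logistic-regression head on target-domain flows. This is the feature-extraction style of transfer learning; the abstract does not say whether the authors freeze or fine-tune layers. The kernels stand in for weights learned on a source domain and, like the flow sequences (hypothetical normalized per-interval rates) and every function name, are assumptions.

```python
import math


def conv1d(signal, kernel):
    """'Valid' 1D cross-correlation, which is what a CNN convolution layer computes."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


def extract_features(signal, kernels):
    """Frozen feature extractor: convolution, ReLU, then global max pooling per kernel."""
    return [max(0.0, max(conv1d(signal, k))) for k in kernels]


def train_head(features, labels, lr=1.0, epochs=3000):
    """Logistic-regression head fitted by batch gradient descent; the kernels are not updated."""
    n = len(labels)
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid output minus target
            grad_b += err
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def predict(weights, bias, x):
    return int(bias + sum(w * v for w, v in zip(weights, x)) > 0)


# Hypothetical kernels standing in for filters learned on a source domain:
# a "rise" detector and a 3-step moving average.
FROZEN_KERNELS = [[-1.0, 1.0], [1 / 3, 1 / 3, 1 / 3]]

# Hypothetical target-domain flows (normalized per-interval rates); 0 = steady, 1 = bursty.
target_flows = [
    [0.5, 0.5, 0.5, 0.5, 0.5],
    [0.4, 0.4, 0.5, 0.4, 0.4],
    [0.1, 0.9, 0.1, 0.9, 0.1],
    [0.2, 0.2, 1.0, 0.2, 0.2],
]
target_labels = [0, 0, 1, 1]
feats = [extract_features(f, FROZEN_KERNELS) for f in target_flows]
weights, bias = train_head(feats, target_labels)
print([predict(weights, bias, x) for x in feats])
```

Freezing the extractor means only a handful of head parameters are fitted, which is one way transfer learning copes with the small target-domain sample sizes the abstract mentions.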

One-class Classification based Fault Classification for Semiconductor Process Cyclic Signal

  • 조민영;백준걸
    • IE Interfaces
    • /
    • Vol. 25, No. 2
    • /
    • pp.170-177
    • /
    • 2012
  • Process control is essential for operating the semiconductor process efficiently. This paper considers fault classification of semiconductor processes based on cyclic signals for process control. In general, process signals take different patterns depending on the cause of the fault. If faults can be classified by cause, process control can be improved through definite and rapid diagnosis. One of the most important requirements in fault classification is reaching a definite diagnosis, even when a signal is classified several times. This paper proposes a method in which one-class classifiers classify fault causes, treating each cause as its own class. The Hotelling T² chart, kNNDD (k-Nearest Neighbor Data Description), and Distance-based Novelty Detection are used as the one-class classifiers. PCA (Principal Component Analysis) is also used to reduce the data dimension, because process signals are generally very long. In the experiments, data are generated from real signal patterns of a semiconductor process, and the proposed method is compared with SVM (Support Vector Machine). Most of the experimental results show that the proposed method using Distance-based Novelty Detection performs well in both classification and diagnosis.
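A minimal sketch of the kNNDD idea, assuming the common formulation: a test point's distance to its k-th nearest training sample is divided by that sample's own k-th-neighbor distance, and the point is accepted when the ratio is at most a threshold (1.0 here). One description is fitted per fault cause, and a new signal goes to the class with the lowest ratio. The decision rule, the class names, and the toy 2-D data (standing in for PCA-reduced signals) are assumptions; the PCA step itself is omitted.

```python
import math


class KNNDD:
    """k-nearest-neighbour data description for one fault class (one-class classifier)."""

    def __init__(self, samples, k=1):
        if len(samples) <= k:
            raise ValueError("need more than k training samples")
        self.samples = [list(s) for s in samples]
        self.k = k

    def _kth_neighbour(self, point, exclude=None):
        """Return (distance, index) of the k-th nearest training sample to point."""
        ranked = sorted(
            (math.dist(point, s), i)
            for i, s in enumerate(self.samples)
            if i != exclude
        )
        return ranked[self.k - 1]

    def ratio(self, z):
        """Distance from z to its k-th neighbour over that neighbour's own k-th neighbour distance."""
        d_z, idx = self._kth_neighbour(z)
        d_nn, _ = self._kth_neighbour(self.samples[idx], exclude=idx)
        if d_nn == 0:
            return 0.0 if d_z == 0 else math.inf
        return d_z / d_nn


def classify_fault(descriptions, z, threshold=1.0):
    """Pick the class whose description accepts z most strongly; 'unknown' if none accepts it."""
    scores = {label: desc.ratio(z) for label, desc in descriptions.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] <= threshold else "unknown"


# Hypothetical fault classes described by PCA-reduced 2-D signal features.
descriptions = {
    "drift": KNNDD([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]),
    "spike": KNNDD([[5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.3]]),
}
print(classify_fault(descriptions, [0.15, 0.2]))
print(classify_fault(descriptions, [2.5, 2.5]))
```

Because every class has its own description, a signal unlike any known cause is reported as "unknown" instead of being forced into the nearest class, which a single multi-class classifier such as an SVM would do.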

A Study of Landscape Construction Work Classification for System Instruction of New Estimation System based on Historical Construction Data: With Regard to Housing Landscape Construction

  • 박원규;김두하;안동만
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • Vol. 25, No. 1
    • /
    • pp.82-99
    • /
    • 1997
  • The purpose of this study is to establish a work classification system for landscape construction that can serve as the basis of a new estimation system for public landscape construction. The new estimation system is based on historical construction data. Applying this system requires a standard work classification system, because extensive cost data must be accumulated under a unified construction work classification system. In the study of the new estimation system carried out by KICT (Korea Institute of Construction Technology), landscaping works are classified under the earthwork of civil engineering. This classification appears unreasonable, because landscape architecture has its own specialties and professional domain. In this study, information classification systems in the construction industry and various landscaping works in housing developments are analysed. As a result, a standard work classification system for housing landscape construction is proposed in section VI-3. This standard work classification structure consists of three levels of division (i.e., large, middle, and small work divisions). Housing landscape construction works are divided into four large works and twenty-six middle works. Depending on work attributes, middle and small work divisions can be further subdivided.


Construction of Customer Appeal Classification Model Based on Speech Recognition

  • Sheng Cao;Yaling Zhang;Shengping Yan;Xiaoxuan Qi;Yuling Li
    • Journal of Information Processing Systems
    • /
    • Vol. 19, No. 2
    • /
    • pp.258-266
    • /
    • 2023
  • Aiming at the problems of poor customer satisfaction and poor accuracy of customer classification, this paper proposes a customer classification model based on speech recognition. First, the paper analyzes the temporal characteristics of customer demand data, identifies the factors influencing customer demand behavior, and determines the feature extraction process for customer voice signals. Then, emotional association rules for customer demands are designed, and a classification model of customer demands is constructed through cluster analysis. Next, the Euclidean distance method is used to preprocess customer behavior data, and the fuzzy clustering characteristics of customer demands are obtained with a fuzzy clustering method. Finally, a customer demand classification model based on speech recognition is built on the naive Bayesian algorithm. Experimental results show that the proposed method raises the accuracy of customer demand classification to more than 80% and customer satisfaction to more than 90%. It addresses the poor customer satisfaction and low classification accuracy of existing classification methods and has practical application value.
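The final step rests on a naive Bayes classifier. The abstract does not describe the feature types, so this sketch assumes continuous acoustic features and uses a Gaussian naive Bayes; the two features (mean pitch, speech rate), the class names, and the data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(xs, ys, var_floor=1e-9):
    """Class priors plus per-class, per-feature mean and variance (features assumed independent)."""
    grouped = defaultdict(list)
    for x, y in zip(xs, ys):
        grouped[y].append(x)
    model = {}
    for label, rows in grouped.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [
            max(sum((v - m) ** 2 for v in c) / len(c), var_floor)
            for c, m in zip(cols, means)
        ]
        model[label] = (len(rows) / len(xs), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label maximizing log prior plus the sum of per-feature Gaussian log likelihoods."""
    def log_posterior(label):
        prior, means, variances = model[label]
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            total -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
        return total

    return max(model, key=log_posterior)


# Hypothetical per-call features: [mean pitch (Hz), speech rate (syllables/s)].
xs = [[220, 5.5], [240, 6.0], [230, 5.8], [150, 3.9], [140, 4.2], [160, 4.0]]
ys = ["complaint", "complaint", "complaint", "inquiry", "inquiry", "inquiry"]
model = fit_gaussian_nb(xs, ys)
print(predict_gaussian_nb(model, [235, 5.9]))
```

Working in log space avoids underflow when many feature likelihoods are multiplied, and the variance floor guards against a feature that is constant within one class.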

Statistical Approach to Sentiment Classification using MapReduce

  • 강문수;백승희;최영식
    • Science of Emotion and Sensibility
    • /
    • Vol. 15, No. 4
    • /
    • pp.425-440
    • /
    • 2012
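  • As the Internet has grown, the amount of subjective data has increased, creating a need to classify such data automatically. Sentiment classification divides data into several sentiment categories. Research on sentiment classification has focused mainly on natural language processing and on building sentiment lexicons. Previous studies suffer from two problems: morphological analysis is often inaccurate during natural language processing, and when building a sentiment lexicon it is hard to set clear criteria for selecting the words to register and for deciding each word's degree of sentiment. To address these difficulties, we propose combining large-scale data with a statistical approach to sentiment classification. Instead of determining the meaning of words, the proposed method judges sentiment from statistics of the expressions that appear across large amounts of data. Unlike previous studies that relied on natural language processing algorithms, this approach focuses on the data. Hadoop and MapReduce are used to process the large-scale data.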
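The abstract does not give the exact statistic, so the following is only a sketch under stated assumptions: it simulates the map, shuffle, and reduce phases in plain Python (not on Hadoop) to count (token, label) pairs, then scores a new text by summing add-one-smoothed log ratios of positive to negative token frequencies. Whitespace tokenization, the scoring rule, the labels "pos"/"neg", and the sample documents are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(document):
    """Mapper: emit ((token, label), 1) for every whitespace-separated token."""
    text, label = document
    for token in text.lower().split():
        yield (token, label), 1


def reduce_phase(pairs):
    """Shuffle (sort by key), then reduce: sum the counts for each (token, label) key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {key: sum(c for _, c in group) for key, group in groupby(ordered, key=itemgetter(0))}


def train(documents):
    counts = reduce_phase(pair for doc in documents for pair in map_phase(doc))
    totals = defaultdict(int)
    for (_, label), c in counts.items():
        totals[label] += c
    return counts, totals


def sentiment_score(model, text, pos="pos", neg="neg"):
    """Sum of smoothed log frequency ratios; > 0 leans positive, < 0 leans negative."""
    counts, totals = model
    vocab_size = len({token for token, _ in counts})
    score = 0.0
    for token in text.lower().split():
        p_pos = (counts.get((token, pos), 0) + 1) / (totals[pos] + vocab_size)
        p_neg = (counts.get((token, neg), 0) + 1) / (totals[neg] + vocab_size)
        score += math.log(p_pos / p_neg)
    return score


docs = [
    ("great movie great acting", "pos"),
    ("loved it", "pos"),
    ("boring movie", "neg"),
    ("terrible acting boring plot", "neg"),
]
model = train(docs)
print(sentiment_score(model, "great plot"))
```

In a real Hadoop job the mapper and reducer would run as separate tasks over distributed input, but the key-value contract is the same: the mapper emits one pair per token occurrence and the reducer sums per key.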


Analysis of Network Traffic using Classification and Association Rule

  • 이창언;김응모
    • Journal of the Korea Society for Simulation
    • /
    • Vol. 11, No. 4
    • /
    • pp.15-23
    • /
    • 2002
  • As network environments and application services have recently become more complex and diverse, there has. In this paper, we introduce a scheme that extracts useful information for network management by analyzing traffic data in user login files. For this purpose, we use classification and association rules based on the episode concept in data mining. Since login data are inherently time series, conventional data mining algorithms cannot be applied to them directly. We generate virtual transactions, classify the transactions above a threshold value within a time window, and simulate the classification algorithm.
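One way to read "virtual transactions" is sketched below, assuming fixed, non-overlapping time windows: each window of login events becomes one transaction, and pairwise association rules are kept when their support and confidence clear thresholds. This simplification treats each window as an unordered itemset, whereas episode mining also tracks event order. The window size, thresholds, event names, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_transactions(events, window):
    """Group (timestamp_seconds, item) events into fixed, non-overlapping time windows."""
    buckets = {}
    for ts, item in events:
        buckets.setdefault(ts // window, set()).add(item)
    return [buckets[k] for k in sorted(buckets)]


def pair_rules(transactions, min_support, min_confidence):
    """Return sorted (antecedent, consequent, support, confidence) rules over item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical login-log events: (seconds since start, event).
events = [
    (0, "login:alice"), (10, "ftp"),
    (65, "login:alice"), (70, "ftp"), (80, "http"),
    (130, "http"), (140, "login:bob"),
]
transactions = build_transactions(events, window=60)
for rule in pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    print(rule)
```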


Surface Sediments Classification in Tidal Flats using Multivariate Kriging and KOMPSAT-2 Imagery

  • 이상원;박노욱;장동호;유희영;임효숙
    • Journal of the Korean Geomorphological Association
    • /
    • Vol. 19, No. 3
    • /
    • pp.37-49
    • /
    • 2012
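  • The purpose of this paper is to propose a methodology, based on multivariate kriging, that combines high-resolution remote sensing data with field survey data to classify the surface sedimentary facies of tidal flats. Unlike existing methods, which classify remote sensing data using sediment data categorized in advance by sediment composition, the proposed method first produces a distribution map for each sediment component from field survey and remote sensing data and categorizes them only in the final step. To combine the field survey data and the remote sensing data when producing the component distribution maps, regression kriging, a multivariate kriging technique, was used. First, for each of the sand, silt, and clay components of the field survey data, a regression against the spectral information of the high-resolution remote sensing data is performed to extract the trend component of each constituent. Residuals are then computed at the field survey locations, and kriging is applied to the residuals to obtain a residual distribution map. The trend and residual components of each constituent are then summed to produce a proportion map for each constituent, and sedimentary facies classification is performed in the final step. To evaluate the applicability of the proposed technique, a case study was conducted on the Baramarae tidal flat using high-resolution KOMPSAT-2 data. The case study showed that the proposed technique achieved relatively higher classification accuracy than the existing classification method and was particularly better at classifying fine-grained sediments. The proposed technique is therefore expected to be useful for classifying the surface sedimentary facies of tidal flats from remote sensing data.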
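The regression kriging steps above (regression trend, residuals at survey sites, kriged residual, trend plus residual) can be sketched for one sediment fraction and one spectral band. Assumptions: ordinary least squares for the trend, simple kriging of the residuals with a known zero mean (OLS residuals with an intercept average to zero), and an exponential covariance whose sill and range are fixed by hand rather than fitted to an empirical variogram. The site coordinates, band values, and sand percentages are made up. In the described workflow this would be repeated for sand, silt, and clay before the final facies classification.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (the regression trend)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a @ w = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def regression_kriging(sites, band, values, target_site, target_band, sill=1.0, corr_range=50.0):
    """Trend from the spectral band plus the simple-kriged regression residual at target_site."""
    b0, b1 = fit_line(band, values)
    residuals = [v - (b0 + b1 * x) for v, x in zip(values, band)]

    def cov(p, q):
        # Exponential covariance; sill and corr_range are assumed, not fitted.
        return sill * math.exp(-math.dist(p, q) / corr_range)

    k_matrix = [[cov(p, q) for q in sites] for p in sites]
    k_vector = [cov(p, target_site) for p in sites]
    weights = solve(k_matrix, k_vector)
    return b0 + b1 * target_band + sum(w * r for w, r in zip(weights, residuals))


# Hypothetical survey sites (m), band reflectance at each site, and measured sand (%).
sites = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0), (50.0, 50.0)]
band = [0.10, 0.20, 0.15, 0.30]
sand = [60.0, 45.0, 55.0, 30.0]
print(regression_kriging(sites, band, sand, (10.0, 10.0), 0.12))
```

Two properties follow from the construction: at a surveyed site the kriging weights reduce to that site alone, so the prediction reproduces the measurement exactly; far from all sites the covariances vanish and the prediction falls back to the regression trend.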

Detection of Abnormal Heartbeat using Hierarchical Classification in ECG

  • 이도훈;조백환;박관수;송수화;이종실;지영준;김인영;김선일
    • Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research
    • /
    • Vol. 29, No. 6
    • /
    • pp.466-476
    • /
    • 2008
  • As ambulatory electrocardiograms (ECGs) are increasingly used for arrhythmia detection, more researchers are reporting automatic classification algorithms. Most previous studies do not consider the unbalanced data distribution: even in patients, 24-hour recordings contain many more normal beats than abnormal beats. To solve this problem, hierarchical classification using 21 features was adopted for detecting abnormal arrhythmia beats. The features include R-R intervals and data describing the morphology of the wave. To validate the algorithm, 44 non-pacemaker recordings from PhysioNet were used. A two-stage hierarchical classification model based on domain knowledge was constructed. The suggested method improved abnormal beat classification performance over the conventional multi-class classification method. In conclusion, domain-knowledge-based hierarchical classification is useful for ECG beat classification with unbalanced data distributions.
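A minimal sketch of a two-stage hierarchy, assuming nearest-centroid classifiers at each stage: stage 1 separates normal from abnormal beats, and stage 2, trained only on abnormal beats, assigns a subtype, so the abundant normal beats never crowd the subtype decision. The paper uses 21 features; this sketch uses only three R-R interval features, and the feature definitions, the classifier choice, the labels "N"/"S"/"V", and all data are illustrative assumptions.

```python
import math


def rr_features(r_peaks, i, local=3):
    """Pre-RR, post-RR, and pre-RR divided by the mean of the last `local` RR intervals (incl. pre-RR)."""
    pre = r_peaks[i] - r_peaks[i - 1]
    post = r_peaks[i + 1] - r_peaks[i]
    recent = [r_peaks[j] - r_peaks[j - 1] for j in range(max(1, i - local + 1), i + 1)]
    return [pre, post, pre / (sum(recent) / len(recent))]


def centroids(xs, ys):
    """Mean feature vector for each label."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: [sum(c) / len(c) for c in zip(*rows)] for y, rows in groups.items()}


def nearest(cents, x):
    return min(cents, key=lambda y: math.dist(cents[y], x))


class TwoStageClassifier:
    """Stage 1: normal vs abnormal. Stage 2: abnormal subtype, trained on abnormal beats only."""

    def __init__(self, xs, ys, normal="N"):
        self.normal = normal
        self.stage1 = centroids(xs, [y if y == normal else "abnormal" for y in ys])
        abnormal = [(x, y) for x, y in zip(xs, ys) if y != normal]
        self.stage2 = centroids([x for x, _ in abnormal], [y for _, y in abnormal])

    def predict(self, x):
        if nearest(self.stage1, x) == self.normal:
            return self.normal
        return nearest(self.stage2, x)


# Hypothetical training beats as [pre-RR (s), post-RR (s), relative pre-RR].
xs = [[0.80, 0.80, 1.00], [0.82, 0.78, 1.01], [0.78, 0.81, 0.98],
      [0.50, 0.95, 0.63], [0.52, 0.90, 0.65],
      [0.48, 1.20, 0.60], [0.50, 1.15, 0.62]]
ys = ["N", "N", "N", "S", "S", "V", "V"]
clf = TwoStageClassifier(xs, ys)
print(rr_features([0.0, 0.8, 1.6, 2.4, 2.9, 4.0], 4))
print(clf.predict([0.49, 1.18, 0.61]))
```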