• Title/Abstract/Keywords: Supervised learning

Search results: 747 items (processing time: 0.032 s)

A Two-Stage Document Page Segmentation Method using Morphological Distance Map and RBF Network

  • 신현경
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 35, No. 9 / pp. 547-553 / 2008
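  • This paper proposes a two-stage document layout segmentation method. The first stage is a top-down region extraction that uses a morphology-based distance function to segment the given image data into rectangular regions. The preliminary result obtained from the distance map function serves as the input to the second stage, which improves performance. The second stage applies machine learning theory. We chose an RBF neural network, which follows a statistical model, and designed its hidden layer with a data clustering technique that exploits the self-organizing property of the Kohonen network. As our research result, we show that a neural network trained on region data extracted from 300 images improves the preliminary results produced by the first stage.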
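The second stage described in this abstract (cluster the data to place the hidden units, then train the RBF network) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: plain k-means stands in for the Kohonen-based clustering, the output weights are fitted by least squares, and the two-dimensional toy features, `k`, and `gamma` are all hypothetical.

```python
import numpy as np

def kmeans_centers(X, k, iters=50, seed=0):
    """Plain k-means; an assumed stand-in for the Kohonen-based clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        nearest = dist.argmin(axis=1)
        for j in range(k):
            if np.any(nearest == j):  # leave a center in place if its cluster is empty
                centers[j] = X[nearest == j].mean(axis=0)
    return centers

def design_matrix(X, centers, gamma):
    """Gaussian RBF activations of the hidden layer, plus a bias column."""
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.exp(-gamma * dist), np.ones((len(X), 1))])

def fit_rbf(X, y, k=4, gamma=1.0):
    """Place hidden units by clustering, then solve for linear output weights."""
    centers = kmeans_centers(X, k)
    weights, *_ = np.linalg.lstsq(design_matrix(X, centers, gamma), y, rcond=None)
    return centers, weights

def predict_rbf(X, centers, weights, gamma=1.0):
    """Threshold the network output at 0.5 to get a binary label."""
    return (design_matrix(X, centers, gamma) @ weights > 0.5).astype(int)

# Hypothetical toy data standing in for region features of two region types.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(3.0, 0.5, (100, 2))])
y = np.repeat([0, 1], 100)

centers, weights = fit_rbf(X, y.astype(float))
accuracy = (predict_rbf(X, centers, weights) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Fixing the centers by clustering first leaves only a linear problem for the output weights, which is why a single least-squares solve is enough in this sketch.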

Analysis of Extraction Performance according to the Expanding of Applied Character in Hangul Stroke Element Extraction

  • 전자연;임순범
    • 한국멀티미디어학회논문지 / Vol. 23, No. 11 / pp. 1361-1371 / 2020
  • Fonts have developed as a visual element, and their influence has rapidly increased around the world. Research on font automation has been conducted mainly for English, because Hangul characters are composed of combined elements and have a complicated structure. A previous study addressed this problem by automatically extracting the stroke elements of a character using component-wise object detection. However, that study focused only on similarity: it was tested on various print-style fonts but not on other characters. To extract the stroke elements of all characters and fonts, we analyzed extraction performance as the set of characters used in Hangul stroke element extraction training was expanded. Extraction success rates were high overall. In the font expansion experiments, the success rate was high whether or not the font had been used in training. In the character expansion experiments, the success rate for trained characters was slightly higher than for untrained characters. To further improve the Hangul stroke element extraction model, we plan to introduce semi-supervised learning to increase the amount of training data.

DeepCleanNet: Training Deep Convolutional Neural Network with Extremely Noisy Labels

  • Olimov, Bekhzod;Kim, Jeonghong
    • 한국멀티미디어학회논문지 / Vol. 23, No. 11 / pp. 1349-1360 / 2020
  • In recent years, Convolutional Neural Networks (CNNs) have been applied successfully to a range of computer vision tasks. Because CNN models are supervised learning algorithms, they require a large amount of data to train the classifiers, and obtaining data with correct labels is imperative for attaining state-of-the-art performance. However, labelling datasets is a tedious and expensive process, so real-life datasets often contain incorrect labels. Although the issue of poorly labelled datasets has been studied before, we have noticed that the existing methods are very complex and hard to reproduce. In this work we therefore propose DeepCleanNet, a comparatively simple system that achieves competitive results when compared with the existing methods. We use the K-means clustering algorithm to select data with correct labels and train a deep CNN model on the resulting dataset. The technique achieves competitive results in both the training and validation stages. In experiments on the MNIST database of handwritten digits with 50% corrupted labels, we achieved increases of up to 10% and 20% in training and validation accuracy, respectively.
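The abstract does not say how K-means is used to pick out correctly labelled samples, so the following is only one plausible reading, offered as a sketch. It assumes that samples are clustered in feature space and that a sample is kept when its given label agrees with the majority label of its cluster. The function names, the toy two-dimensional data, and the 40% corruption rate are all hypothetical.

```python
import numpy as np

def kmeans_assign(X, k, iters=50, seed=0):
    """Plain k-means; returns the cluster index of every sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

def select_clean(X, noisy_y, k):
    """Assumed rule: keep samples whose label matches their cluster's majority label."""
    assign = kmeans_assign(X, k)
    keep = np.zeros(len(X), dtype=bool)
    for j in range(k):
        members = assign == j
        if members.any():
            majority = np.bincount(noisy_y[members]).argmax()
            keep |= members & (noisy_y == majority)
    return keep

# Hypothetical toy data: three well-separated classes with 40% of labels corrupted.
rng = np.random.default_rng(2)
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
true_y = np.repeat([0, 1, 2], 100)
X = means[true_y] + rng.normal(0.0, 0.5, (300, 2))
noisy_y = true_y.copy()
flipped = rng.choice(300, size=120, replace=False)
noisy_y[flipped] = (true_y[flipped] + rng.integers(1, 3, size=120)) % 3  # always a wrong class

keep = select_clean(X, noisy_y, k=3)
noisy_acc = (noisy_y == true_y).mean()
clean_acc = (noisy_y[keep] == true_y[keep]).mean()
print(f"label accuracy before: {noisy_acc:.2f}, after selection: {clean_acc:.2f}")
```

The selected subset (`X[keep]`, `noisy_y[keep]`) would then be used to train the classifier. On image data the clustering would presumably run on learned or flattened features rather than raw two-dimensional points.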

Rule Weight-Based Fuzzy Classification Model for Analyzing Admission-Discharge of Dyspnea Patients

  • 손창식;신아미;이영동;박형섭;박희준;김윤년
    • 대한의용생체공학회:의공학회지 / Vol. 31, No. 1 / pp. 40-49 / 2010
  • A rule-weight-based fuzzy classification model is proposed to analyze patterns of patient admission and discharge, as a preliminary study toward the differential diagnosis of dyspnea. The model is generated automatically from a labeled data set (a supervised learning strategy) in three steps: i) select fuzzy partition regions from the spatial distribution of the data; ii) generate fuzzy membership functions from the selected partition regions; and iii) extract a set of candidate rules and resolve conflicts among them. Its effectiveness was demonstrated on a dyspnea patient data set with 11 features, selected by clinicians from 55, by comparing its results with those of conventional classification methods such as a standard fuzzy classifier without rule weights, C4.5, QDA, kNN, and SVMs.

Detecting Cyber Threats Domains Based on DNS Traffic

  • 임선희;김종현;이병길
    • 한국통신학회논문지 / Vol. 37B, No. 11 / pp. 1082-1089 / 2012
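  • In cyberspace, botnets are now formed to carry out large-scale cyber attacks, posing not only economic threats such as asset loss but also national-level threats such as Stuxnet. Evolved botnets abuse the DNS (Domain Name System) as a means of communication between C&C servers and zombies. DNS is core Internet infrastructure, and DNS traffic keeps growing with the spread of wireless Internet access. At the same time, attacks that exploit domain addresses are also increasing. This paper studies a technique for detecting cyber threat domains from DNS traffic using supervised-learning-based data classification. In addition, the developed detection system collects and analyzes large volumes of DNS data and classifies domains as normal or abnormal.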

Slow Feature Analysis for Mitotic Event Recognition

  • Chu, Jinghui;Liang, Hailan;Tong, Zheng;Lu, Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 3 / pp. 1670-1683 / 2017
  • Mitotic event recognition is a crucial and challenging task in biomedical applications. In this paper, we introduce slow feature analysis (SFA) and propose a fully automated mitotic event recognition method for cell populations imaged with time-lapse phase contrast microscopy. The method has three steps. First, a candidate sequence extraction method excludes most of the sequences that do not contain mitosis. Next, slow features are learned from the candidate sequences using slow feature analysis. Finally, a hidden conditional random field (HCRF) model classifies the sequences. We use a supervised SFA learning strategy to learn the slow feature function because it combines image content with discriminative information to obtain a better encoding. In addition, the HCRF model is better suited to describing the temporal structure of image sequences than non-sequential SVM approaches. In our experiment, the proposed method achieved an area under the curve (AUC) of 0.93 and 91% accuracy on a very challenging phase contrast microscopy dataset named C2C12.
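For readers unfamiliar with slow feature analysis, a minimal linear version is sketched below. It is the plain unsupervised form (whiten the signal, then keep the directions whose temporal differences have the smallest variance), not the supervised variant the abstract refers to. The synthetic two-channel signal is an assumption made for illustration.

```python
import numpy as np

def linear_sfa(X, n_components=1):
    """Linear SFA on a time series X of shape (T, d).

    Returns a projection matrix W of shape (d, n_components) and the data mean;
    the slow features are (X - mean) @ W.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Whitening: rotate and rescale so the covariance becomes the identity.
    evals, evecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    whiten = evecs / np.sqrt(evals)
    Z = Xc @ whiten
    # Slowness: minimize the variance of the temporal differences.
    dZ = np.diff(Z, axis=0)
    _, dvecs = np.linalg.eigh(dZ.T @ dZ / len(dZ))  # eigenvalues in ascending order
    return whiten @ dvecs[:, :n_components], mean

# Synthetic example: a slow and a fast sinusoid, linearly mixed into two channels.
t = np.linspace(0.0, 2.0 * np.pi, 500)
slow, fast = np.sin(t), np.sin(11.0 * t)
X = np.column_stack([slow + 0.5 * fast, 0.3 * slow + fast])

W, mean = linear_sfa(X, n_components=1)
slowest = ((X - mean) @ W)[:, 0]
corr = abs(np.corrcoef(slowest, slow)[0, 1])
print(f"|correlation| between extracted feature and slow source: {corr:.3f}")
```

The sign of an SFA feature is arbitrary, which is why the check uses the absolute correlation.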

A Study on a Stress Measurement Algorithm Based on ECG Analysis of NUI-applied Tangible Game Users

  • 이현주;신동일;신동규
    • 한국게임학회 논문지 / Vol. 13, No. 5 / pp. 73-80 / 2013
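  • NUI (Natural User Interface) is a technology that lets users control nearby digital devices with their own voice or body, without separate input/output devices. This paper studies users playing a tangible game, in which the body is used directly, in a smart space equipped with NUI. To determine whether a player experienced stress, an electrocardiogram (ECG) was recorded for 60 seconds before and 60 seconds after the game, and the measured signals were analyzed with an improved Random Forest algorithm. For the supervised learning experiment, users separately entered and saved whether they had experienced stress. In the experiments, the improved algorithm showed 1.04% higher accuracy than the existing algorithm.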

Adaptive Intrusion Detection System Based on SVM and Clustering

  • 이한성;임영희;박주영;박대희
    • 한국지능시스템학회논문지 / Vol. 13, No. 2 / pp. 237-242 / 2003
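  • This paper proposes Kernel-ART, a new clustering-based intrusion detection algorithm. Kernel-ART combines concept vectors and the Mercer kernel of the SVM (support vector machine) with ART (adaptive resonance theory), an online clustering algorithm. It not only overcomes the shortcomings of supervised-learning-based intrusion detection systems but also satisfies all the evaluation criteria required of clustering-based intrusion detection systems. Because it generates clusters incrementally, the proposed algorithm can detect a wide variety of intrusion types in real time.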

Two Dimensional Slow Feature Discriminant Analysis via L2,1 Norm Minimization for Feature Extraction

  • Gu, Xingjian;Shu, Xiangbo;Ren, Shougang;Xu, Huanliang
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 7 / pp. 3194-3216 / 2018
  • Slow Feature Discriminant Analysis (SFDA) is a supervised feature extraction method inspired by a biological mechanism. In this paper, a novel method called Two Dimensional Slow Feature Discriminant Analysis via $L_{2,1}$ norm minimization ($2DSFDA-L_{2,1}$) is proposed. $2DSFDA-L_{2,1}$ integrates $L_{2,1}$ norm regularization and a 2D statistically uncorrelated constraint to extract discriminant features. First, $L_{2,1}$ norm regularization promotes row sparsity in the projection matrix, so that feature selection and subspace learning are performed simultaneously. Second, uncorrelated features with minimum redundancy are effective for classification; we define a 2D statistically uncorrelated model in which the rows (or columns) are independent. Third, we provide a feasible solution by transforming the proposed nonlinear $L_{2,1}$ model into a linear regression form. Additionally, $2DSFDA-L_{2,1}$ is extended to a bilateral projection version called $BSFDA-L_{2,1}$, whose advantage is that an image can be represented with far fewer coefficients. Experimental results on three face databases demonstrate that the proposed $2DSFDA-L_{2,1}/BSFDA-L_{2,1}$ can obtain competitive performance.
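The row-sparsity claim rests on how the $L_{2,1}$ norm is usually defined: the sum of the Euclidean norms of the rows, $\|W\|_{2,1} = \sum_i \|w_i\|_2$. The short sketch below is a standalone illustration, not code from the paper, and the two example matrices are hypothetical. It shows that, at equal Frobenius norm, a matrix whose energy sits in one row has a smaller $L_{2,1}$ norm than one whose energy is spread over several rows.

```python
import numpy as np

def l21_norm(W):
    """L2,1 norm: sum over rows of each row's Euclidean norm."""
    return float(np.linalg.norm(W, axis=1).sum())

def frobenius_norm(W):
    """Frobenius norm, for comparison (default matrix norm in NumPy)."""
    return float(np.linalg.norm(W))

# Same entries, arranged in one row versus spread over two rows.
row_sparse = np.array([[3.0, 4.0],
                       [0.0, 0.0]])
spread_out = np.array([[3.0, 0.0],
                       [0.0, 4.0]])

for name, W in [("row_sparse", row_sparse), ("spread_out", spread_out)]:
    print(name, "L2,1 =", l21_norm(W), "Frobenius =", frobenius_norm(W))
```

Minimizing this norm therefore pushes whole rows of the projection matrix to zero, and each zeroed row corresponds to a discarded input feature.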

Classification Methods for Automated Prediction of Power Load Patterns

  • ;박진형;이헌규;류근호
    • 한국정보과학회:학술대회논문집 / 한국정보과학회 2008년도 한국컴퓨터종합학술대회논문집 Vol.35 No.1 (C) / pp. 26-30 / 2008
  • This paper presents an automated methodology, based on data mining techniques, for predicting customer load patterns in long-duration load profiles. The proposed approach consists of three stages: (i) data pre-processing, in which noise and outliers are removed and continuous-valued features are transformed to discrete values; (ii) cluster analysis, in which k-means clustering is used to create load pattern classes and a representative load profile for each class; and (iii) classification, in which we evaluate several supervised learning methods in order to select a suitable prediction method. Power load measured by an AMR (automatic meter reading) system, together with customer indexes, was used as the input for clustering, and the output of clustering was the set of representative load profiles (classes). To evaluate the load pattern forecasts, several classification methods were applied to a set of high-voltage customers of the Korean power system, with the class labels derived from clustering and other features used as input to build the classifiers. Finally, the results of our experiments are presented.
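The three stages in this abstract can be walked through on synthetic data as follows. This is a rough sketch under several assumptions that do not come from the paper: the 24-hour load shapes are invented, outliers are handled by percentile clipping, discretization uses five equal-width levels, and a 1-nearest-neighbour rule stands in for the supervised methods the authors compared.

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24)

# Hypothetical daily load profiles: 40 day-peaking and 40 evening-peaking customers.
day_shape = np.exp(-((hours - 13) ** 2) / 18.0)
evening_shape = np.exp(-((hours - 20) ** 2) / 8.0)
profiles = np.vstack([day_shape + 0.05 * rng.normal(size=(40, 24)),
                      evening_shape + 0.05 * rng.normal(size=(40, 24))])

# Stage (i): clip outliers, then discretize to five equal-width levels (0..4).
lo, hi = np.percentile(profiles, [1, 99])
clipped = np.clip(profiles, lo, hi)
inner_edges = np.linspace(lo, hi, 6)[1:-1]
levels = np.digitize(clipped, inner_edges).astype(float)

# Stage (ii): k-means with k=2 to create load-pattern classes.
centers = levels[[0, -1]].copy()  # deterministic start: first and last profile
for _ in range(20):
    dist = ((levels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    cluster = dist.argmin(axis=1)
    centers = np.vstack([levels[cluster == j].mean(axis=0) for j in range(2)])

# Stage (iii): use the cluster index as the class label and test a classifier
# (1-nearest neighbour here) on held-out customers.
order = rng.permutation(len(levels))
train, test = order[:60], order[60:]
d = ((levels[test][:, None, :] - levels[train][None, :, :]) ** 2).sum(axis=2)
predicted = cluster[train][d.argmin(axis=1)]
accuracy = (predicted == cluster[test]).mean()
print(f"held-out accuracy against cluster-derived labels: {accuracy:.2f}")
```

The rows of `centers` play the role of the representative load profiles. In the setting the abstract describes, the classifier would also receive customer indexes and other features, which this sketch omits.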
