• Title/Summary/Keyword: Supervised learning


A Two-Stage Document Page Segmentation Method using Morphological Distance Map and RBF Network (거리 사상 함수 및 RBF 네트워크의 2단계 알고리즘을 적용한 서류 레이아웃 분할 방법)

  • Shin, Hyun-Kyung
    • Journal of KIISE:Software and Applications / v.35 no.9 / pp.547-553 / 2008
  • We propose a two-stage document layout segmentation method. In the first stage, a top-down segmentation step, a morphological distance map algorithm extracts a collection of rectangular regions from the input image. This preliminary result serves as the input to the second stage. In the second stage, a machine-learning algorithm is applied: an RBF network, a neural network based on a statistical model. To construct the hidden layer of the RBF network, a data clustering technique based on the self-organizing property of the Kohonen network is used. We present results showing that the supervised neural network, trained on 300 samples, improves the preliminary results of the first stage.
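The second stage described in this abstract can be illustrated with a minimal sketch of an RBF network: Gaussian hidden units placed at cluster centres, followed by a linear output layer fitted by least squares. This is not the authors' implementation. The function names, the `width` parameter, and the toy XOR data are assumptions, and the centres are simply given here rather than found by a Kohonen network.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Gaussian activation of every hidden unit (column) for every sample (row)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def fit_output_weights(x, y, centers, width):
    """Least-squares fit of the linear output layer on top of fixed RBF units."""
    phi = rbf_features(x, centers, width)
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return weights

def predict(x, centers, width, weights):
    """Network output: weighted sum of the hidden-unit activations."""
    return rbf_features(x, centers, width) @ weights

# Toy data (hypothetical): XOR, with one hidden unit centred on each sample.
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
centers = x.copy()  # stand-in for centres that a clustering step would supply
weights = fit_output_weights(x, y, centers, width=0.5)
print(np.round(predict(x, centers, 0.5, weights), 3))
```

In a real pipeline the centres would come from the clustering step and there would be far fewer centres than training samples.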

Analysis of Extraction Performance according to the Expanding of Applied Character in Hangul Stroke Element Extraction (한글 획요소 추출 학습에서 적용 글자의 확장에 따른 추출 성능 분석)

  • Jeon, Ja-Yeon;Lim, Soon-Bum
    • Journal of Korea Multimedia Society / v.23 no.11 / pp.1361-1371 / 2020
  • Fonts have developed as a visual element, and their influence has rapidly increased around the world. Research on font automation has been conducted mainly for English, because Hangul is a combinational script whose characters have a complicated structure. A previous study addressed this problem by automatically extracting the stroke elements of a character using component-wise object detection. However, that study focused only on similarity: it was tested on various print-style fonts but not on other characters. To extract the stroke elements of all characters and fonts, we carried out a performance analysis experiment on expanding the set of characters used in Hangul stroke element extraction training. The extraction results were high overall. In particular, in the font expansion type, the extraction success rate was high regardless of whether the font had been used in training. In the character expansion type, the extraction success rate of trained characters was slightly higher than that of untrained characters. In conclusion, to move toward a complete Hangul stroke element extraction model, we will introduce semi-supervised learning to increase the amount of data and strengthen the model.

DeepCleanNet: Training Deep Convolutional Neural Network with Extremely Noisy Labels

  • Olimov, Bekhzod;Kim, Jeonghong
    • Journal of Korea Multimedia Society / v.23 no.11 / pp.1349-1360 / 2020
  • In recent years, Convolutional Neural Networks (CNNs) have been successfully applied to a variety of computer vision tasks. Because CNN models are supervised learning algorithms, they demand large amounts of data to train their classifiers. Obtaining data with correct labels is therefore imperative for CNN models to attain state-of-the-art performance. However, labelling datasets is a tedious and expensive process, so real-life datasets often contain incorrect labels. Although the issue of poorly labelled datasets has been studied before, we have noticed that the existing methods are very complex and hard to reproduce. In this work, we therefore propose DeepCleanNet, a considerably simpler system that achieves competitive results compared with the existing methods. We use the K-means clustering algorithm to select data with correct labels and train a deep CNN model on the resulting dataset. The technique achieves competitive results in both the training and validation stages. We conducted experiments on the MNIST database of handwritten digits with 50% corrupted labels and achieved increases of up to 10% and 20% in training and validation accuracy, respectively.
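As a rough illustration of using K-means to select samples whose labels are likely correct, the sketch below clusters the features, takes the majority label in each cluster, and keeps only the samples that agree with it. The abstract does not state the selection rule, so the majority-vote rule, the farthest-point initialisation, and all function names here are assumptions rather than the DeepCleanNet procedure.

```python
import numpy as np

def kmeans_assign(x, k, n_iter=20):
    """Plain K-means; returns the cluster index of every sample."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[dist.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return assign

def select_consistent(x, noisy_labels, k):
    """Boolean mask of samples whose label matches their cluster's majority label."""
    assign = kmeans_assign(x, k)
    keep = np.zeros(len(x), dtype=bool)
    for j in range(k):
        members = assign == j
        if not members.any():
            continue
        # Labels are assumed to be non-negative integers.
        majority = np.bincount(noisy_labels[members]).argmax()
        keep |= members & (noisy_labels == majority)
    return keep
```

The retained subset, `x[keep]` with `noisy_labels[keep]`, would then be used to train the classifier. This rule only works while the correct label remains the most frequent one within each cluster.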

Rule Weight-Based Fuzzy Classification Model for Analyzing Admission-Discharge of Dyspnea Patients (호흡곤란환자의 입-퇴원 분석을 위한 규칙가중치 기반 퍼지 분류모델)

  • Son, Chang-Sik;Shin, A-Mi;Lee, Young-Dong;Park, Hyoung-Seob;Park, Hee-Joon;Kim, Yoon-Nyun
    • Journal of Biomedical Engineering Research / v.31 no.1 / pp.40-49 / 2010
  • A rule weight-based fuzzy classification model is proposed to analyze the admission-discharge patterns of patients, as preliminary research toward the differential diagnosis of dyspnea. The proposed model is generated automatically from a labeled data set, that is, by a supervised learning strategy, using a three-step procedure: i) select fuzzy partition regions from the spatial distribution of the data; ii) generate fuzzy membership functions from the selected partition regions; and iii) extract a set of candidate rules and resolve conflicts among them. The effectiveness of the proposed model was demonstrated on a dyspnea patients' data set with 11 features selected by clinicians from 55 features, by comparing its experimental results with those of conventional classification methods such as a standard fuzzy classifier without rule weights, C4.5, QDA, kNN, and SVMs.

Detecting Cyber Threats Domains Based on DNS Traffic (DNS 트래픽 기반의 사이버 위협 도메인 탐지)

  • Lim, Sun-Hee;Kim, Jong-Hyun;Lee, Byung-Gil
    • The Journal of Korean Institute of Communications and Information Sciences / v.37B no.11 / pp.1082-1089 / 2012
  • Recent malicious activity in cyberspace aims both to pose national-level threats, as with Stuxnet, and to obtain financial benefits through large pools of compromised hosts forming botnets. Evolved botnets use the Domain Name System (DNS) for communication between the C&C server and zombies. DNS is one of the core components of the Internet, and DNS traffic is continually increasing with the popularity of wireless Internet service. At the same time, domain names are widely used for malicious purposes. This paper studies DNS-based detection of cyber threat domains through data classification based on supervised learning. Furthermore, the developed cyber threat domain detection system, which uses DNS traffic analysis, provides collection, analysis, and normal/abnormal domain classification of huge amounts of DNS data.

Slow Feature Analysis for Mitotic Event Recognition

  • Chu, Jinghui;Liang, Hailan;Tong, Zheng;Lu, Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.3 / pp.1670-1683 / 2017
  • Mitotic event recognition is a crucial and challenging task in biomedical applications. In this paper, we introduce slow feature analysis (SFA) and propose a fully automated mitotic event recognition method for cell populations imaged with time-lapse phase contrast microscopy. The method has three steps. First, a candidate sequence extraction method excludes most of the sequences that do not contain mitosis. Next, slow features are learned from the candidate sequences using slow feature analysis. Finally, a hidden conditional random field (HCRF) model classifies the sequences. We use a supervised SFA learning strategy to learn the slow feature function because it brings image content and discriminative information together to obtain a better encoding. In addition, the HCRF model is better suited to describing the temporal structure of image sequences than non-sequential SVM approaches. In our experiment, the proposed recognition method achieved an area under the curve (AUC) of 0.93 and 91% accuracy on a very challenging phase contrast microscopy dataset named C2C12.
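For orientation, the core idea of slow feature analysis can be sketched in its basic linear, unsupervised form: whiten the signal, then pick the directions along which the time derivative has the smallest variance. This is not the supervised SFA strategy used in the paper. The function name and the toy two-channel signal are assumptions made for illustration.

```python
import numpy as np

def linear_sfa(x, n_components=1):
    """Linear SFA on a time series x of shape (T, d); returns (T, n_components)."""
    x = x - x.mean(axis=0)
    # Whiten so that every projection of z has unit variance.
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    z = x @ (eigvec / np.sqrt(eigval))  # scales column j by 1/sqrt(eigval[j])
    # Slowness objective: minimise the variance of the temporal difference.
    dz = np.diff(z, axis=0)
    _, dvec = np.linalg.eigh(np.cov(dz, rowvar=False))  # eigenvalues ascending
    return z @ dvec[:, :n_components]

# Toy signal (hypothetical): a slow and a fast sinusoid mixed into two channels.
t = np.linspace(0.0, 2.0 * np.pi, 500)
slow, fast = np.sin(t), np.sin(25.0 * t)
mixed = np.column_stack([slow + 0.5 * fast, 0.3 * slow - fast])
slowest = linear_sfa(mixed)[:, 0]
```

On this toy input the extracted feature should track the slow sinusoid up to sign and scale. The whitening step assumes the covariance matrix has full rank.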

A Study on a Stress Measurement Algorithm Based on ECG Analysis of NUI-applied Tangible Game Users (NUI가 적용된 체감형 게임의 사용자 심전도 분석에 의한 스트레스 측정 알고리즘 연구)

  • Lee, Hyun-Ju;Shin, Dong-Il;Shin, Dong-Kyoo
    • Journal of Korea Game Society / v.13 no.5 / pp.73-80 / 2013
  • NUI (Natural User Interface) allows users to interact directly with surrounding digital devices using their voices or body motions, without additional input/output interface devices. Our study was carried out on users who play a tangible game with body motions in an NUI-applied smart space. ECG was measured for 60 seconds before and after playing the game to determine user stress levels, and the measured signals were analyzed with an improved Random Forest algorithm. To enable supervised learning, users additionally reported whether or not they felt stress. The improved algorithm showed 1.04% higher accuracy than the existing algorithm.

Adaptive Intrusion Detection System Based on SVM and Clustering (SVM과 클러스터링 기반 적응형 침입탐지 시스템)

  • Lee, Han-Sung;Im, Young-Hee;Park, Joo-Young;Park, Dai-Hee
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.2 / pp.237-242 / 2003
  • In this paper, we propose a new adaptive intrusion detection algorithm based on clustering: Kernel-ART, which combines the on-line clustering algorithm ART (adaptive resonance theory) with a Mercer kernel and the concept vector. Kernel-ART not only satisfies all the desirable characteristics of a clustering-based IDS but also alleviates drawbacks associated with supervised learning IDSs. It can detect various types of intrusions in real time by generating clusters incrementally.

Two Dimensional Slow Feature Discriminant Analysis via L2,1 Norm Minimization for Feature Extraction

  • Gu, Xingjian;Shu, Xiangbo;Ren, Shougang;Xu, Huanliang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.7 / pp.3194-3216 / 2018
  • Slow Feature Discriminant Analysis (SFDA) is a supervised feature extraction method inspired by a biological mechanism. In this paper, a novel method called Two Dimensional Slow Feature Discriminant Analysis via $L_{2,1}$ norm minimization ($2DSFDA-L_{2,1}$) is proposed. $2DSFDA-L_{2,1}$ integrates $L_{2,1}$ norm regularization and a 2D statistically uncorrelated constraint to extract discriminant features. First, $L_{2,1}$ norm regularization promotes row-sparsity of the projection matrix, so that feature selection and subspace learning are performed simultaneously. Second, uncorrelated features with minimum redundancy are effective for classification, so we define a 2D statistically uncorrelated model in which the rows (or columns) are independent. Third, we provide a feasible solution by transforming the proposed nonlinear $L_{2,1}$ model into a linear regression form. Additionally, $2DSFDA-L_{2,1}$ is extended to a bilateral projection version called $BSFDA-L_{2,1}$. The advantage of $BSFDA-L_{2,1}$ is that an image can be represented with far fewer coefficients. Experimental results on three face databases demonstrate that the proposed $2DSFDA-L_{2,1}$ and $BSFDA-L_{2,1}$ obtain competitive performance.
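For readers unfamiliar with the regularizer, the $L_{2,1}$ norm of a matrix is conventionally defined as the sum of the Euclidean norms of its rows. A minimal sketch of that standard definition follows; the function name and the example matrix are hypothetical, and this is not the paper's optimisation procedure.

```python
import math

def l21_norm(matrix):
    """Sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)

# Hypothetical projection matrix: the all-zero row contributes nothing.
W = [[3.0, 4.0],
     [0.0, 0.0],
     [1.0, 0.0]]
print(l21_norm(W))  # 5.0 + 0.0 + 1.0 = 6.0
```

Because the penalty is paid once per non-zero row, minimising it tends to drive entire rows to zero, which is why it acts as a feature selector on the rows of a projection matrix.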

Classification Methods for Automated Prediction of Power Load Patterns (전력 부하 패턴 자동 예측을 위한 분류 기법)

  • Minghao, Piao;Park, Jin-Hyung;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2008.06c / pp.26-30 / 2008
  • We present an automated methodology based on data mining techniques for predicting customer load patterns in long-duration load profiles. The proposed approach consists of three stages: (i) data pre-processing, in which noise and outliers are removed and continuous attribute-valued features are transformed to discrete values; (ii) cluster analysis, in which k-means clustering is used to create load pattern classes and the representative load profile for each class; and (iii) classification, in which we evaluate several supervised learning methods in order to select a suitable prediction method. In this methodology, power load measured by an AMR (automatic meter reading) system, together with customer indexes, is used as input for clustering. The output of clustering is the set of representative load profiles (or classes). To evaluate load pattern forecasting, several classification methods were applied to a set of high-voltage customers of the Korean power system, with the class labels derived from clustering and other features used as input to build the classifiers. Finally, the results of our experiments are presented.
