• Title/Summary/Keyword: Supervised Classification

A Study on the User-Based Small Fishing Boat Collision Alarm Classification Model Using Semi-supervised Learning (준지도 학습을 활용한 사용자 기반 소형 어선 충돌 경보 분류모델에대한 연구)

  • Ho-June Seok;Seung Sim;Jeong-Hun Woo;Jun-Rae Cho;Jaeyong Jung;DeukJae Cho;Jong-Hwa Baek
    • Journal of Navigation and Port Research / v.47 no.6 / pp.358-366 / 2023
  • This study aimed to improve the ship collision alert of the 'accident vulnerable ship monitoring service', one of the 'intelligent marine traffic information system' services of the Ministry of Oceans and Fisheries. The current ship collision alert uses a supervised learning (SL) model with survey labels based on data from large ships and their operators. Consequently, small-ship data and the opinions of small-ship operators are not reflected in the current supervised collision model, and the alert is of limited use because it is issued at a longer distance than the one at which small-ship operators themselves sense a collision risk. In addition, the SL method requires a large amount of labeled data, and labeling requires considerable resources and time. To overcome these limitations, this paper studied a classification model of collision alerts for small ships that uses unlabeled data with semi-supervised learning (SSL) algorithms (Label Propagation and TabNet). Real-time experiments with small-ship operators using the classification model showed that operator satisfaction increased.
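The label propagation idea named in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the two features (think of them as a scaled distance and a scaled relative speed), the `alert`/`no_alert` labels, the RBF kernel, and the `gamma` value are all assumptions made for illustration. Labeled points keep their labels, and unlabeled points repeatedly take the affinity-weighted average of their neighbors' label distributions.

```python
import math

def label_propagation(points, labels, gamma=1.0, iterations=100):
    """Spread labels from labeled points (label given) to unlabeled ones (None)."""
    n = len(points)
    classes = sorted({l for l in labels if l is not None})
    k = len(classes)

    def rbf(a, b):
        # RBF affinity: close points get a weight near 1, distant points near 0.
        return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

    def one_hot(label):
        return [1.0 if c == label else 0.0 for c in classes]

    # Row-normalised affinity matrix, used as a transition matrix.
    weights = [[rbf(points[i], points[j]) for j in range(n)] for i in range(n)]
    transition = [[w / sum(row) for w in row] for row in weights]

    # Labeled points start one-hot; unlabeled points start uniform.
    dist = [one_hot(l) if l is not None else [1.0 / k] * k for l in labels]

    for _ in range(iterations):
        spread = [
            [sum(transition[i][j] * dist[j][c] for j in range(n)) for c in range(k)]
            for i in range(n)
        ]
        # Clamp: labeled points are reset to their known label each round.
        dist = [
            one_hot(labels[i]) if labels[i] is not None else spread[i]
            for i in range(n)
        ]

    return [classes[max(range(k), key=lambda c: row[c])] for row in dist]

# Hypothetical toy data: two clusters, one labeled point in each.
points = [[0.2, 0.1], [0.3, 0.2], [0.25, 0.15],
          [2.0, 2.1], [2.1, 1.9], [1.9, 2.0]]
labels = ["alert", None, None, "no_alert", None, None]
predicted = label_propagation(points, labels)
print(predicted)
```

Only two of the six points carry a label here, which is the situation the abstract describes: labeling is expensive, so the unlabeled majority inherits labels from nearby labeled examples.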

Development of Supervised Machine Learning based Catalog Entry Classification and Recommendation System (지도학습 머신러닝 기반 카테고리 목록 분류 및 추천 시스템 구현)

  • Lee, Hyung-Woo
    • Journal of Internet Computing and Services / v.20 no.1 / pp.57-65 / 2019
  • The Domeggook B2B online shopping mall has a market share of over 70%, with more than 2 million members and 800,000 items sold per day. However, because the same or similar items are registered under different catalog entries, buyers have difficulty searching for items, and managing a large B2B shopping mall also becomes problematic. Therefore, in this study, we developed a system that automatically classifies and recommends catalog entries for products, using a semi-supervised machine learning method based on the mall's large volume of past purchase information. Specifically, when the seller enters item registration information in natural language, KoNLPy morphological analysis is performed and the Naïve Bayes classification method is applied to automatically recommend the most suitable catalog information for the item. As a result, it was possible to improve both the search speed and the total sales of the shopping mall by efficiently improving the accuracy of catalog entries.
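The Naïve Bayes step described above can be sketched as follows. This is an illustration under stated assumptions, not the system from the paper: tokens are produced by plain whitespace splitting as a stand-in for KoNLPy morphological analysis, the product names and the `audio`/`accessories` categories are invented, and Laplace (add-one) smoothing is a choice made here.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (tokens, category) pairs from already-registered items."""
    class_counts = Counter(category for _, category in docs)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, category in docs:
        token_counts[category].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def recommend(tokens, model):
    """Return categories ranked from most to least probable for the tokens."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for category, n_docs in class_counts.items():
        total_tokens = sum(token_counts[category].values())
        # Log prior plus the sum of log likelihoods (multinomial Naive Bayes).
        log_prob = math.log(n_docs / total_docs)
        for token in tokens:
            # Add-one smoothing so unseen tokens do not zero out a category.
            count = token_counts[category][token] + 1
            log_prob += math.log(count / (total_tokens + len(vocab)))
        scores[category] = log_prob
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical registered items; whitespace split stands in for KoNLPy.
docs = [
    ("wireless bluetooth earphones".split(), "audio"),
    ("usb charging cable".split(), "accessories"),
    ("bluetooth speaker portable".split(), "audio"),
    ("phone case cover".split(), "accessories"),
]
model = train(docs)
ranking = recommend("portable bluetooth earphones".split(), model)
print(ranking)
```

Returning a ranked list rather than a single category matches the recommendation use case: the seller can be shown the top few catalog entries and pick one.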

Design of a Classifier Based on Supervised Learning Using Fuzzy Membership Function and Weighted Average (퍼지 소속도 함수와 가중치 평균을 이용한 지도 학습 기반 분류기 설계)

  • Woo, Young Woon
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.4 / pp.508-514 / 2021
  • This paper proposes a classifier based on supervised learning, together with three types of fuzzy membership functions that determine the membership degree of each feature of the classification data. In addition, it suggests that classifier performance can be improved by replacing the simple arithmetic average of the per-feature membership degrees, used when deriving the classification result, with a weighted average using various weights. Three standard data sets were used to test the proposed methods: Iris, Ecoli, and Yeast. The experiments confirmed that consistently good classification performance can be obtained across data sets with different characteristics, and that better classification performance is possible by improving the fuzzy membership functions and the weighted average methods.
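A minimal sketch of the general idea, per-feature fuzzy membership degrees combined by a weighted average, is given below. The abstract does not specify the three membership functions or the weights, so the triangular function, the use of class means as centers, the use of the feature range as the half-width, and the example weights are all assumptions made for illustration.

```python
def triangular(x, center, half_width):
    """Membership is 1 at the center and falls linearly to 0 at center +/- half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)

def fit(samples, labels):
    """Learn one triangle center per class and feature, plus one width per feature."""
    centers = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centers[label] = [sum(col) / len(rows) for col in zip(*rows)]
    # Half-width: the feature's range over all training data (1.0 if constant).
    half_widths = [(max(col) - min(col)) or 1.0 for col in zip(*samples)]
    return centers, half_widths

def predict(x, centers, half_widths, weights):
    """Score each class by the weighted average of its per-feature memberships."""
    total = sum(weights)
    scores = {}
    for label, center in centers.items():
        degrees = [triangular(v, c, w) for v, c, w in zip(x, center, half_widths)]
        scores[label] = sum(wt * d for wt, d in zip(weights, degrees)) / total
    return max(scores, key=scores.get), scores

# Hypothetical two-feature data: feature 0 separates the classes, feature 1 does not.
samples = [[1.0, 5.0], [1.2, 5.5], [3.0, 5.2], [3.2, 4.8]]
labels = ["A", "A", "B", "B"]
centers, half_widths = fit(samples, labels)

# A larger weight on the more discriminative feature widens the score gap.
label, scores = predict([1.3, 5.0], centers, half_widths, weights=[0.8, 0.2])
print(label, scores)
```

Setting `weights=[0.5, 0.5]` reduces this to the simple arithmetic average, which makes it easy to compare the two averaging schemes on the same data.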

Accuracy Evaluation of Supervised Classification about IKONOS Imagery using Mixed Pixels (혼합화소를 이용한 IKONOS 영상의 감독분류정확도 평가)

  • Lee, Jong-Sin;Kim, Min-Gyu;Park, Joon-Kyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.6 / pp.2751-2756 / 2012
  • The selection of the training set influences classification accuracy in supervised classification of satellite imagery. Generally, overall accuracy is high when pure pixels, whose class is clear, are selected as the training set, whereas accuracy decreases when mixed pixels are selected because of low-resolution imagery or unclear class boundaries. In practice, however, it is very difficult to choose only pure pixels as the training set. Accordingly, this study aimed to suggest a suitable classification method for the case in which mixed pixels are chosen. To this end, a few pure pixels were chosen as a training set, and the resulting classification accuracy was compared with that obtained using an equal number of mixed pixels. As a result, SVM had the highest accuracy among the classification methods using mixed pixels, and its accuracy differed relatively little from that of classification using pure pixels. Therefore, image classification using SVM is the most suitable for areas where construction and green space are mixed, because mixed pixels are highly likely to be chosen as the training set there.

An Empirical Study on the Land Cover Classification Method using IKONOS Image (IKONOS 영상의 토지피복분류 방법에 관한 실증 연구)

  • Sakong, Hosang;Im, Jungho
    • Journal of the Korean Association of Geographic Information Studies / v.6 no.3 / pp.107-116 / 2003
  • This study investigated how appropriate classification methods based on conventional spectral characteristics are for high resolution imagery. A supervised classification mixing parametric and non-parametric rules, a method in which fuzzy theory is applied to such classification, and an unsupervised method were performed and compared with each other for accuracy. In addition, by comparing the screen-digitized interpretation result with the classification result based on spectral characteristics, this study analyzed the conformity of both methods. Although the supervised classification to which fuzzy theory was applied showed the best performance, the application of conventional classification techniques to high resolution imagery had some limitations due to too much information unnecessary for classification, shadows, and a lack of spectral information. Consequently, more advanced techniques, including integration with other advanced remote sensing technologies such as lidar and the application of filtering or template techniques, are required to classify land cover/use or to extract useful information from high resolution imagery.

Feature Selection with PCA based on DNS Query for Malicious Domain Classification (비정상도메인 분류를 위한 DNS 쿼리 기반의 주성분 분석을 이용한 성분추출)

  • Lim, Sun-Hee;Cho, Jaeik;Kim, Jong-Hyun;Lee, Byung Gil
    • KIPS Transactions on Computer and Communication Systems / v.1 no.1 / pp.55-60 / 2012
  • Recent botnets widely use DNS services when connecting to C&C servers in order to evade botnet detection. It is necessary to study DNS analysis in order to counteract anomaly-based techniques that use DNS. This paper studies the collection of DNS traffic as experimental data and supervised learning for DNS traffic-based malicious domain classification, such as queries from zombies for the domain names of C&C servers. In particular, this paper aims to determine the significant features of a DNS-based classification system for malicious domain extraction using Principal Component Analysis (PCA).

Mapping Categories of Heterogeneous Sources Using Text Analytics (텍스트 분석을 통한 이종 매체 카테고리 다중 매핑 방법론)

  • Kim, Dasom;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.193-215 / 2016
  • In recent years, the proliferation of diverse social networking services has led users to use many mediums simultaneously depending on their individual purpose and taste. Besides, while collecting information about particular themes, they usually employ various mediums such as social networking services, Internet news, and blogs. However, in terms of management, each document circulated through diverse mediums is placed in different categories on the basis of each source's policy and standards, hindering any attempt to conduct research on a specific category across different kinds of sources. For example, documents containing content on "Application for a foreign travel" can be classified into "Information Technology," "Travel," or "Life and Culture" according to the peculiar standard of each source. Likewise, with different viewpoints of definition and levels of specification for each source, similar categories can be named and structured differently in accordance with each source. To overcome these limitations, this study proposes a plan for conducting category mapping between different sources with various mediums while maintaining the existing category system of the medium as it is. Specifically, by re-classifying individual documents from the viewpoint of diverse sources and storing the result of such a classification as extra attributes, this study proposes a logical layer by which users can search for a specific document from multiple heterogeneous sources with different category names as if they belong to the same source. Besides, by collecting 6,000 articles of news from two Internet news portals, experiments were conducted to compare accuracy among sources, supervised learning and semi-supervised learning, and homogeneous and heterogeneous learning data. 
It is particularly interesting that in some categories, the classification accuracy of semi-supervised learning using heterogeneous learning data proved to be higher than that of supervised learning and semi-supervised learning using homogeneous learning data. This study is significant in the following respects. First, it proposes a logical plan for establishing a system to integrate and manage heterogeneous mediums with different classification systems while maintaining the existing physical classification system as it is. In particular, this study's results exhibit very different classification accuracies depending on the heterogeneity of the learning data; this is expected to spur further studies on enhancing the performance of the proposed methodology through the analysis of characteristics by category. In addition, with increasing demand for the search, collection, and analysis of documents from diverse mediums, the scope of Internet search is no longer restricted to one medium. However, since each medium has a different categorical structure and naming, it is very difficult to search for a specific category across heterogeneous mediums. The proposed methodology is also significant in presenting a plan that queries all documents according to the category standards of the site the user selects, while maintaining the existing site's characteristics and structure as they are. The proposed methodology needs to be further complemented in the following aspects. First, since only an indirect comparison and evaluation of the performance of the proposed methodology was made, future studies need to conduct more direct tests of its accuracy. That is, after re-classifying documents of the object source on the basis of the categorical system of the existing source, the accuracy of the classification needs to be verified through evaluation by actual users. 
In addition, classification accuracy needs to be increased by making the methodology more sophisticated. Furthermore, understanding the characteristics of the categories in which heterogeneous semi-supervised learning showed higher classification accuracy than supervised learning might help in obtaining heterogeneous documents from diverse mediums and in finding ways to use them to enhance the accuracy of document classification.

A Study on Improving the predict accuracy rate of Hybrid Model Technique Using Error Pattern Modeling : Using Logistic Regression and Discriminant Analysis

  • Cho, Yong-Jun;Hur, Joon
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.269-278 / 2006
  • This paper presents a new hybrid data mining technique that uses error pattern modeling to improve classification accuracy. The proposed method improves classification accuracy by combining two different supervised learning methods. The main algorithm generates an error pattern model between the two supervised learning methods (e.g., neural networks, decision trees, logistic regression). The proposed modeling method was applied to a simulation of 10,000 data sets generated from normal and exponential random distributions. The simulation results show that the proposed method outperforms existing methods such as logistic regression and discriminant analysis.

Landsat Images Applied for Analyzing Spatial Flow and Water Quality Patterns in a Korea Estuary Dam

  • Park, S.W.;Torii, K.;Aoyama, S.;Cho, B. J.
    • Proceedings of the KSRS Conference / 2003.11a / pp.1239-1241 / 2003
  • This paper presents the results of Landsat-TM imagery applications for detecting spatial variations of the water environments in the Saemankeum (STLR) project areas. The simulated tidal flow patterns from a two-dimensional hydrodynamic model and water quality data from the STLR project were related to the satellite data. Unsupervised classification of the tidal water body reflects the overall flow patterns at a flooding tide. Regression equations for water quality parameters were derived and used for supervised classifications. The results were found to be useful for synoptically evaluating the water environments during the construction stages of the STLR project.

Support Vector Machine and Spectral Angle Mapper Classifications of High Resolution Hyper Spectral Aerial Image

  • Enkhbaatar, Lkhagva;Jayakumar, S.;Heo, Joon
    • Korean Journal of Remote Sensing / v.25 no.3 / pp.233-242 / 2009
  • This paper presents two supervised classifiers: support vector machine (SVM) and spectral angle mapper (SAM). A Compact Airborne Spectrographic Imager (CASI) high resolution aerial image was classified with these two classifiers into eight land use/land cover classes. Accuracy assessment and Kappa statistics were estimated for SVM and SAM separately. The overall classification accuracy and Kappa statistic of SAM were 69.0% and 0.62, respectively, which were higher than those of SVM (62.5%, 0.54).
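For readers unfamiliar with the two quantities reported here, the sketch below shows how a spectral angle mapper assigns a class and how Cohen's kappa is computed from true and predicted labels. It is a generic illustration, not the study's processing chain: the three-band reference spectra, the class names, and the sample labels are invented, and real SAM implementations usually also apply a maximum-angle threshold, which is omitted here.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_angle)

def sam_classify(pixel, references):
    """Return the class whose reference spectrum makes the smallest angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))

def cohen_kappa(y_true, y_pred):
    """Agreement between two label lists, corrected for chance agreement."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    return (observed - expected) / (1 - expected)

# Hypothetical three-band reference spectra.
references = {
    "water": [0.05, 0.04, 0.02],
    "vegetation": [0.04, 0.08, 0.45],
}

# A pixel twice as bright as the vegetation reference has the same spectral shape.
pixel = [0.08, 0.16, 0.90]
predicted = sam_classify(pixel, references)

# Hypothetical ground truth and predictions for the kappa example.
kappa = cohen_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"])
print(predicted, round(kappa, 2))
```

Because the angle depends only on the direction of the spectrum vector, SAM is insensitive to overall brightness, which is why the brighter pixel above still maps to the vegetation reference.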