• Title/Summary/Keyword: Unsupervised Classification

A Study on Automatic Classification of Newspaper Articles Based on Unsupervised Learning by Departments (비지도학습 기반의 행정부서별 신문기사 자동분류 연구)

  • Kim, Hyun-Jong;Ryu, Seung-Eui;Lee, Chul-Ho;Nam, Kwang Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.9
    • /
    • pp.345-351
    • /
    • 2020
  • Administrative agencies today are paying keen attention to big data analysis to improve their policy responsiveness. Among big data sources, news articles can be used to understand public opinion regarding policy and policy issues. The amount of news output has increased rapidly because of the emergence of new online media outlets, which calls for the use of automated bots or automatic document classification tools. There are, however, limits to the automatic collection of news articles related to specific agencies or departments based on the existing news article categories and keyword search queries. Thus, this paper proposes a method to process articles using classification glossaries that take into account each agency's different work features. To this end, classification glossaries were developed by extracting the work features of different departments using Word2Vec and topic modeling techniques from news articles related to different agencies. As a result, the automatic classification of newspaper articles for each department yielded approximately 71% accuracy. This study makes academic and practical contributions by presenting a method for extracting the work features of each department and an unsupervised learning-based method for automatically classifying news articles relevant to each agency.

Comparison and Analysis of Subject Classification for Domestic Research Data (국내 학술논문 주제 분류 알고리즘 비교 및 분석)

  • Choi, Wonjun;Sul, Jaewook;Jeong, Heeseok;Yoon, Hwamook
    • The Journal of the Korea Contents Association
    • /
    • v.18 no.8
    • /
    • pp.178-186
    • /
    • 2018
  • Article-level subject classification is essential for serving scholarly information. To date, however, subject classification has mostly been journal-based, and few article-level subject classification services exist. For domestic academic papers, subject classification is especially important because it allows a service to cover a broader area and to be scoped to a specified range. However, classifying subjects by field requires experts from many fields, and various verification methods are needed to increase accuracy. In this paper, we classify topics using unsupervised learning algorithms, which operate without known correct answers, and compare the results of the subject classification algorithms using coherence and perplexity. The unsupervised learning algorithms compared are the well-known Hierarchical Dirichlet Process (HDP), Latent Dirichlet Allocation (LDA), and Latent Semantic Indexing (LSI).

A New Clustering Method for Minimum Classification Error (분류 오류 최소화를 위한 클러스터링 기법)

  • Heo, Gyeong-Yong;Kim, Seong-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.7
    • /
    • pp.1-8
    • /
    • 2014
  • Clustering is one of the most popular unsupervised learning methods and is widely used to form clusters of homogeneous data. Clustering has been used to extract contexts corresponding to clusters, with a classification method then applied to each context or cluster individually. However, it is difficult to say that unsupervised clustering is the best context-forming method from the viewpoint of classification. In this paper, a new clustering method that takes classification into account is proposed. The proposed method tries to minimize the classification error in each cluster when a classification method is applied to each context locally. For this purpose, it adds constraints that force two data points belonging to the same class to have a small distance, and two data points belonging to different classes to have a large distance, within each cluster, as in linear discriminant analysis. The usefulness of the proposed method is confirmed by experimental results.

Landform Classification using Geomorphons (지형패턴(Geomorphons)을 이용한 새로운 지형분류방법)

  • KIM, Dong-Eun;SEONG, Yeong Bae;SOHN, Hak Gi;CHOI, Kwang Hee
    • Journal of The Geomorphological Association of Korea
    • /
    • v.19 no.4
    • /
    • pp.139-155
    • /
    • 2012
  • Most previous landform classification methods using DEMs compare the values of the center cell and the surrounding cells, so their results depend greatly on the analysis scale. To overcome this scale dependency, a new classification scheme called "Geomorphons" has been developed. Unlike traditional DEM-based approaches, Geomorphons compares the level of other cells against a reference cell. As a pilot study, we classify the landforms of Pyeongchang-Gun in Korea and compare the result with other methods such as the Topographic Position Index. Through systematic analysis, we obtain the following findings. First, Geomorphons can reduce the time required for landform classification because it uses unsupervised classification. Second, Geomorphons depends little on changes in scale, so it can serve as a pilot tool for reconnaissance studies covering large areas.
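
    The idea of comparing surrounding cells against a reference cell can be illustrated with a minimal Python sketch. This is a simplification made for illustration, not the method from the paper: it looks only at the eight immediate neighbours and uses a height threshold, whereas the published Geomorphons approach works over a search distance. The function names, the `flat_threshold` parameter, and the coarse four-way labelling are all hypothetical.

```python
# Simplified illustration only: compares the 8 immediate neighbours of a cell
# with the cell itself. The names and the threshold are assumptions.

# Neighbour offsets (row, col), clockwise starting from north.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def ternary_pattern(dem, row, col, flat_threshold=1.0):
    """Return 8 symbols for the neighbours of dem[row][col]:
    +1 if the neighbour is higher, -1 if lower, 0 if about level."""
    center = dem[row][col]
    pattern = []
    for d_row, d_col in OFFSETS:
        diff = dem[row + d_row][col + d_col] - center
        if diff > flat_threshold:
            pattern.append(1)
        elif diff < -flat_threshold:
            pattern.append(-1)
        else:
            pattern.append(0)
    return pattern


def coarse_label(pattern):
    """Map a ternary pattern to a very coarse landform label (hypothetical)."""
    higher = pattern.count(1)
    lower = pattern.count(-1)
    if higher == 0 and lower == 0:
        return "flat"
    if lower == 8:
        return "peak"   # every neighbour is lower than the reference cell
    if higher == 8:
        return "pit"    # every neighbour is higher than the reference cell
    return "other"


# A 3x3 elevation grid with a high centre cell.
dem = [
    [1.0, 1.0, 1.0],
    [1.0, 5.0, 1.0],
    [1.0, 1.0, 1.0],
]
pattern = ternary_pattern(dem, 1, 1)
label = coarse_label(pattern)
```

    No training data is needed here: the label follows directly from the pattern, which is the sense in which such a scheme is unsupervised.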

Design of Pattern Classification Rule based on Local Linear Discriminant Analysis Classifier by using Differential Evolutionary Algorithm (차분진화 알고리즘을 이용한 지역 Linear Discriminant Analysis Classifier 기반 패턴 분류 규칙 설계)

  • Roh, Seok-Beom;Hwang, Eun-Jin;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.22 no.1
    • /
    • pp.81-86
    • /
    • 2012
  • In this paper, we propose a new design methodology for a pattern classification rule based on local linear discriminant analysis, an extension of generic linear discriminant analysis that is applied within local areas partitioned from the whole input space. Two methods are used to partition the whole input space into several local areas: k-Means clustering and the differential evolutionary algorithm. k-Means clustering is an unsupervised clustering method, and the differential evolutionary algorithm is an optimization algorithm. In addition, the experimental application includes a comparative analysis with several commonly used existing methods.

A New Unsupervised Learning Network and Competitive Learning Algorithm Using Relative Similarity (상대유사도를 이용한 새로운 무감독학습 신경망 및 경쟁학습 알고리즘)

  • 류영재;임영철
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.10 no.3
    • /
    • pp.203-210
    • /
    • 2000
  • In this paper, we propose a new unsupervised learning network and competitive learning algorithm for pattern classification. The proposed network is based on relative similarity, a similarity measure between input data and a cluster group. Accordingly, the proposed network and algorithm are called the relative similarity network (RSN) and its learning algorithm. The structure of the RSN is designed according to the definition of similarity and the learning rule, and pseudocode for the algorithm is described. In general pattern classification, the RSN, despite having no learning rate, achieved performance identical to that of WTA and SOM. For patterns whose cluster groups have unclear boundaries, or whose cluster groups differ in density and size, the RSN produced more effective classification than the other networks.
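
    For context, the winner-take-all (WTA) baseline mentioned in the abstract can be sketched in a few lines of Python. This is a generic textbook-style WTA competitive learning loop, not the RSN algorithm from the paper; the function names, the `learning_rate` default, and the toy data are assumptions chosen for illustration.

```python
import random


def nearest_unit(weights, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(weights)),
        key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)),
    )


def train_wta(data, n_units, learning_rate=0.1, epochs=20, seed=0):
    """Generic winner-take-all competitive learning (illustrative sketch).

    Units are initialised from randomly chosen data points. For each input,
    only the winning (nearest) unit is moved toward that input.
    """
    rng = random.Random(seed)
    weights = [list(point) for point in rng.sample(data, n_units)]
    for _ in range(epochs):
        for x in data:
            winner = nearest_unit(weights, x)
            weights[winner] = [
                w + learning_rate * (xi - w) for w, xi in zip(weights[winner], x)
            ]
    return weights


# Two well-separated toy clusters (made-up data).
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
weights = train_wta(cluster_a + cluster_b, n_units=2)

units_a = {nearest_unit(weights, p) for p in cluster_a}
units_b = {nearest_unit(weights, p) for p in cluster_b}
```

    The explicit `learning_rate` in this loop is the parameter that, according to the abstract, the RSN does without.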

Application of the Rule-Based Image Classification Method to Jeju Island (규칙기반 영상분류 방법의 제주도 지역의 적용)

  • Lee, Jin-A;Lee, Sung-Soon
    • Spatial Information Research
    • /
    • v.21 no.1
    • /
    • pp.63-73
    • /
    • 2013
  • Geographic features are reflected in satellite images, which contain characteristic elements. Information on changes can be obtained through a comparison of images taken at different times. If multi-temporal images can be classified through the use of an unsupervised method, this is likely to improve the accuracy of image classification and contribute to various applications. A rule-based image classification algorithm for automatic processing without human involvement has been developed, but it must be verified that its results are not affected by imperfect elements. In this study, Landsat images of Jeju Island were used to carry out a rule-based image classification. The application results were examined for complex cases, including the presence of clouds in the images, different acquisition times, and the type of target area, such as city, mountain, or field. The presence of clouds did not affect calculations, and appropriate classification rules were applied depending on the acquisition time. The expansion of the urban areas of Jeju and the increase of facilities such as vinyl greenhouses in Seoguipo were identified. Furthermore, spatial information changes and accurate classifications for Jeju Island were obtained. With the goal of performing high-quality unsupervised classifications, we also sought measures to generalize and improve the methods employed. The findings of this study could be used in time-series analyses of images for various applications, including urban development and environmental change monitoring.

Image Fusion for Improving Classification

  • Lee, Dong-Cheon;Kim, Jeong-Woo;Kwon, Jay-Hyoun;Kim, Chung;Park, Ki-Surk
    • Proceedings of the KSRS Conference
    • /
    • 2003.11a
    • /
    • pp.1464-1466
    • /
    • 2003
  • Classification of satellite images provides information about land cover and/or land use. The quality of the classification result depends mainly on the spatial and spectral resolutions of the images. In this study, image fusion, in the form of resolution merging and band integration of multi-source satellite images (Landsat ETM+ and Ikonos), was carried out to improve classification. Resolution merging and band integration can generate high-resolution imagery with more spectral bands. Precise image co-registration is required to remove geometric distortion between images from different sources. A combination of unsupervised and supervised classification of the fused imagery was implemented to improve classification. Combining a DEM with the classification result made a 3D display of the results possible, improving interpretability.

A Comparison Study of Classification Algorithms in Data Mining

  • Lee, Seung-Joo;Jun, Sung-Rae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.8 no.1
    • /
    • pp.1-5
    • /
    • 2008
  • In general, the analytical tools of data mining use two types of learning: supervised and unsupervised. Classification and prediction are the main analysis tools for supervised learning. In this paper, we perform a comparison study of classification algorithms in data mining. We compare popular classification algorithms: LDA, QDA, the kernel method, K-nearest neighbor, naive Bayesian, SVM, and CART. We use almost all of the classification data sets in the UCI Machine Learning Repository for our experiments. Based on our results, we are able to select appropriate algorithms for given classification data sets.

Kansas Vegetation Mapping Using Multi-Temporal Remote Sensing Data: A Hybrid Approach (계절별 위성자료를 이용한 미국 캔자스주 식생 분류 - 하이브리드 접근방식의 적용 -)

  • ;Stephen Egbert;Dana Peterson;Aimee Stewart;Chris Lauver;Kevin Price;Clayton Blodgett;Jack Cully, Jr.;Glennis Kaufman
    • Journal of the Korean Geographical Society
    • /
    • v.38 no.5
    • /
    • pp.667-685
    • /
    • 2003
  • To address the requirements of gap analysis for species protection, as well as the needs of state and federal agencies for detailed digital land cover, a 43-class map at the vegetation alliance level was created for the state of Kansas using multi-temporal Thematic Mapper imagery. The mapping approach included the use of three-date multi-seasonal imagery, a two-stage classification approach that first masked out cropland areas using unsupervised classification and then mapped natural vegetation with supervised classification, visualization techniques utilizing a map of small multiples and field experts, and extensive use of ancillary data in post-hoc processing. Accuracy assessment was conducted at three levels of generalization (Anderson Level I, vegetation formation, and vegetation alliance) and three cross-tabulation approaches. Overall accuracy ranged from 51.7% to 89.4%, depending on level of generalization, while accuracy figures for individual alliance classes varied by area covered and level of sampling.