• Title/Summary/Keyword: automatic classification

Performance Comparison of Automatic Classification Using Word Embeddings of Book Titles (단행본 서명의 단어 임베딩에 따른 자동분류의 성능 비교)

  • Yong-Gu Lee
    • Journal of the Korean Society for Information Management / v.40 no.4 / pp.307-327 / 2023
  • To analyze the impact of word embedding on book titles, this study utilized word embedding models (Word2vec, GloVe, fastText) to generate embedding vectors from book titles. These vectors were then used as classification features for automatic classification. The classifier utilized the k-nearest neighbors (kNN) algorithm, with the categories for automatic classification based on the DDC (Dewey Decimal Classification) main class 300 assigned by libraries to books. In the automatic classification experiment applying word embeddings to book titles, the Skip-gram architectures of Word2vec and fastText yielded better automatic classification performance with the kNN classifier than TF-IDF features. In the optimization of various hyperparameters across the three models, the Skip-gram architecture of the fastText model demonstrated overall good performance. Specifically, better performance was observed when using hierarchical softmax and larger embedding dimensions as hyperparameters in this model. From a performance perspective, fastText can generate embeddings for substrings or subwords using the n-gram method, which has been shown to increase recall. The Skip-gram architecture of the Word2vec model generally showed good performance at low dimensions (size 300) and with small negative sampling sizes (3 or 5).
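
The pipeline described in this abstract (title words mapped to embedding vectors, then classified with kNN) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the 3-dimensional `WORD_VECTORS` are hand-made toy values standing in for a trained Word2vec, GloVe, or fastText model, averaging word vectors is one common way to build a title vector (the abstract does not say how the study combined them), and the tiny `TRAINING` set with DDC labels 330, 340, and 370 is hypothetical.

```python
import math
from collections import Counter

# Toy 3-dimensional word vectors (hypothetical values); a real experiment
# would load vectors from a trained Word2vec, GloVe, or fastText model.
WORD_VECTORS = {
    "law": [0.9, 0.1, 0.0],
    "court": [0.8, 0.2, 0.1],
    "economy": [0.1, 0.9, 0.0],
    "market": [0.2, 0.8, 0.1],
    "school": [0.0, 0.1, 0.9],
    "teaching": [0.1, 0.0, 0.8],
}

# Hypothetical training titles labeled with DDC divisions inside class 300.
TRAINING = [
    ("law and court", "340"),
    ("court law", "340"),
    ("economy market", "330"),
    ("market economy", "330"),
    ("school teaching", "370"),
    ("teaching school", "370"),
]


def title_vector(title):
    """Average the vectors of the known words in a title (assumed pooling)."""
    vectors = [WORD_VECTORS[w] for w in title.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # no known word, so no embedding can be built
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(title, training, k=3):
    """Return the majority label among the k most similar training titles."""
    query = title_vector(title)
    if query is None:
        return None
    scored = []
    for text, label in training:
        vec = title_vector(text)
        if vec is not None:
            scored.append((cosine(query, vec), label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


print(knn_predict("the court", TRAINING))
print(knn_predict("market trends", TRAINING))
```

A title made up only of out-of-vocabulary words returns `None` in this sketch. Subword models such as fastText can still produce a vector in that case, which is consistent with the recall gain the abstract reports.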

An Analytical Study on Automatic Classification of Domestic Journal articles Based on Machine Learning (기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.37-62 / 2018
  • This study examined the factors affecting the performance of machine learning-based automatic classification of domestic journal articles in the field of LIS. In particular, with respect to the performance of automatically assigning class labels to articles in the "Journal of the Korean Society for Information Management", I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, label assigning methods) through a variety of experiments. The results show that it is effective to apply each factor appropriately according to the classification environment and the characteristics of the document set, and that fairly good performance can be obtained with a simpler model. In addition, the classification of domestic journal articles can be regarded as multi-label classification, which assigns more than one category to an article. Therefore, I proposed an optimal classification model that uses a simple and fast classification algorithm and a small training set in consideration of this environment.

Development of XML based HACCP Diet Automatic Classification System (XML 기반 HACCP 식단 자동 분류 시스템 개발)

  • Cha, Kyung-Ae;Yeo, Sun-Dong;Hong, Won-Kee
    • Journal of Korea Multimedia Society / v.19 no.1 / pp.86-95 / 2016
  • The main objective of the HACCP (Hazard Analysis and Critical Control Points) system is to provide a systematic preventive approach to controlling risks in the food production process. In practice, the diet classification process, performed as one of the first steps of the HACCP system, plays an important role in determining food safety risks and how to control them at every control point according to the risk level of the diet. In this paper, we propose an automatic diet classification method for the HACCP system using XML (eXtensible Markup Language). To guarantee diet classification accuracy, we design an XML schema and attributes that represent the relationship between each diet and its ingredients by analyzing the HACCP diet classification rules. Based on the XML schema and document generation method, we develop the proposed system as a client-server model in which the client implements the XML-based HACCP diet information generation module and the server implements the integrated HACCP information management modules. Moreover, we show the efficiency of the proposed system with experimental results that describe school meal diet information as XML documents and parse the diet information.

Facial Features Extraction for Sasang Constitution Classification (사상채질 분류를 위한 안면부내 특징 요소 추출)

  • Bae, Na-Yeong;An, Taek-Won;Jo, Dong-Uk;Lee, Hwa-Seop
    • Journal of Sasang Constitutional Medicine / v.17 no.2 / pp.46-51 / 2005
  • 1. Objectives: The purpose of this study is to make the diagnosis of Sasang Constitution more objective. The methods of this study are expected to improve Sasang Constitution classification. 2. Methods: 1) Automatic feature extraction of human frontal faces for Sasang Constitution classification. 2) Color feature extraction of human frontal faces: (1) Erosion filtering (skin-white, the other-black) (2) Median median. 3. Results and Conclusions: Observing a person's shape has been the major method for Sasang Constitution classification, which to date has usually depended on the doctor's intuition. We are developing an automatic system that provides objective basic data for Sasang Constitution classification. To this end, in this paper, signal processing techniques are first applied to automatic feature extraction of human frontal faces for Sasang Constitution classification. An experiment is conducted to verify the effectiveness of the proposed system.

Design Classification and Development of Pattern Searching Algorithm Based on Pattern Design Elements - With focus on Automatic Pattern Design System for Baseball Uniforms Manufactured under Custom-MTM System - (패턴설계요소기반의 디자인 분류 및 패턴탐색 알고리즘개발 - 맞춤양산형 야구복 자동패턴 설계시스템을 위한 -)

  • Kang, In-Ae;Choi, Kueng-Mi;Jun, Jung-Ill
    • Fashion & Textile Research Journal / v.13 no.5 / pp.734-742 / 2011
  • This study was undertaken as basic research on automatic pattern design for baseball uniforms manufactured under a custom-MTM system. It proposes a system in which various partial patterns are combined under an automatic design system, and develops a multi-combination pattern searching algorithm that allows a variety of designs to be developed. As a result, the type classification based on pattern design elements includes side, open, collar, facing, and panel types. Designs were divided into a coarse classification ranging from level 1 to level 7 according to pattern design elements, based on a design distribution chart. Of these 7 levels, the 3 major types that determine a design, namely level 1 sleeve type, level 2 open type, and level 3 collar type, were combined to determine a total of 12 types used as design classification codes. The names of styles and patterns were coded using letters and numerals. Finally, a multi-combination pattern searching algorithm was developed whereby the combination of patterns belonging to a specific style can be retrieved automatically once that style name is specified in the automatic pattern design system.

A Feature Vector Extraction Method For the Automatic Classification of Power Quality Disturbances (전력 외란 자동 식별을 위한 특징 벡터 추출 기법)

  • Lee, Chul-Ho;Lee, Jae-Sang;Cho, Kwan-Young;Chung, Ji-Hyun;Nam, Sang-Won
    • Proceedings of the KIEE Conference / 1996.11a / pp.404-406 / 1996
  • The objective of this paper is to present a new feature-vector extraction method for the automatic detection and classification of power quality (PQ) disturbances, where FFT, DWT (Discrete Wavelet Transform), and data compression are utilized to extract an appropriate feature vector. In particular, the proposed classifier consists of three parts: (i) automatic detection of PQ disturbances, where the wavelet transform and a signal power estimation method are utilized to detect each disturbance, (ii) feature vector extraction from the detected disturbance, and (iii) automatic classification, where a Multi-Layer Perceptron (MLP) is used to classify each disturbance from the corresponding extracted feature vector. To demonstrate the performance and applicability of the proposed classification algorithm, test results obtained by analyzing 7-class power quality disturbances generated by the EMTP are also provided.
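
To make the feature-extraction step (ii) more concrete, the following is a minimal sketch of one possible DWT-based feature vector: the energy of the detail coefficients at each decomposition level, followed by the energy of the final approximation. The choice of the Haar wavelet, the energy-per-level feature, and the function names `haar_step` and `wavelet_energy_features` are assumptions made for illustration; the abstract does not specify the wavelet or the exact feature definition, and the FFT and data compression parts are not shown.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_energy_features(signal, levels=3):
    """Detail energy at each level, then the final approximation energy."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in approx))
    return features


# A flat signal has no detail energy; a short spike shows up at level 1.
flat = [1.0] * 8
spike = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
print(wavelet_energy_features(flat))
print(wavelet_energy_features(spike))
```

Because the Haar transform used here is orthonormal, the features sum to the total signal energy, so the vector describes how that energy is distributed across scales. A vector of this kind could then be passed to a classifier such as the MLP mentioned in the abstract.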

Development of Deep Learning-based Automatic Classification of Architectural Objects in Point Clouds for BIM Application in Renovating Aging Buildings (딥러닝 기반 노후 건축물 리모델링 시 BIM 적용을 위한 포인트 클라우드의 건축 객체 자동 분류 기술 개발)

  • Kim, Tae-Hoon;Gu, Hyeong-Mo;Hong, Soon-Min;Choo, Seoung-Yeon
    • Journal of KIBIM / v.13 no.4 / pp.96-105 / 2023
  • This study focuses on developing a building object recognition technology for efficient use in the remodeling of buildings constructed without drawings. In the era of the 4th industrial revolution, smart technologies are being developed. This research contributes to the architectural field by introducing a deep learning-based method for automatic object classification and recognition, utilizing point cloud data. We use a TD3D network with voxels, optimizing its performance through adjustments in voxel size and number of blocks. This technology enables the classification of building objects such as walls, floors, and roofs from 3D scanning data, labeling them in polygonal forms to minimize boundary ambiguities. However, challenges in object boundary classifications were observed. The model facilitates the automatic classification of non-building objects, thereby reducing manual effort in data matching processes. It also distinguishes between elements to be demolished or retained during remodeling. The study minimized data set loss space by labeling using the extremities of the x, y, and z coordinates. The research aims to enhance the efficiency of building object classification and improve the quality of architectural plans by reducing manpower and time during remodeling. The study aligns with its goal of developing an efficient classification technology. Future work can extend to creating classified objects using parametric tools with polygon-labeled datasets, offering meaningful numerical analysis for remodeling processes. Continued research in this direction is anticipated to significantly advance the efficiency of building remodeling techniques.

A Study on Automatic Classification of Newspaper Articles Based on Unsupervised Learning by Departments (비지도학습 기반의 행정부서별 신문기사 자동분류 연구)

  • Kim, Hyun-Jong;Ryu, Seung-Eui;Lee, Chul-Ho;Nam, Kwang Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.9 / pp.345-351 / 2020
  • Administrative agencies today are paying keen attention to big data analysis to improve their policy responsiveness. Of all the big data, news articles can be used to understand public opinion regarding policy and policy issues. The amount of news output has increased rapidly because of the emergence of new online media outlets, which calls for the use of automated bots or automatic document classification tools. There are, however, limits to the automatic collection of news articles related to specific agencies or departments based on the existing news article categories and keyword search queries. Thus, this paper proposes a method to process articles using classification glossaries that take into account each agency's different work features. To this end, classification glossaries were developed by extracting the work features of different departments using Word2Vec and topic modeling techniques from news articles related to different agencies. As a result, the automatic classification of newspaper articles for each department yielded approximately 71% accuracy. This study is meaningful in making academic and practical contributions because it presents a method of extracting the work features for each department, and it is an unsupervised learning-based automatic classification method for automatically classifying news articles relevant to each agency.

Classification of e-mail Using Dynamic Category Hierarchy and Automatic category generation (자동 카테고리 생성과 동적 분류 체계를 사용한 이메일 분류)

  • Ahn Chan Min;Park Sang Ho;Lee Ju-Hong;Choi Bum-Ghi;Park Sun
    • Journal of Intelligence and Information Systems / v.10 no.2 / pp.79-89 / 2004
  • As the volume of e-mail messages has increased, a new technique for efficient e-mail classification is needed. E-mail classification methods fall into two classes: binary classification and multi-classification. Current binary classification methods are mostly spam mail classification methods based on rules, Bayesian methods, SVM, etc. Current multi-classification methods are based on clustering, which groups e-mails by similarity. In this paper, we propose a novel method for e-mail classification. It combines an automatic category generation method based on the vector model with a dynamic category hierarchy construction method. This method can multi-classify e-mail automatically and manage a large amount of e-mail efficiently. In addition, this method increases search accuracy through dynamic reclassification of e-mails.

An Analytical Study on Performance Factors of Automatic Classification based on Machine Learning (기계학습에 기초한 자동분류의 성능 요소에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.33 no.2 / pp.33-59 / 2016
  • This study examined the factors affecting the performance of machine learning-based automatic classification of domestic conference papers. In particular, with respect to the performance of automatically assigning class labels to papers in the Proceedings of the Conference of Korean Society for Information Management using the Rocchio algorithm, I investigated the characteristics of the key factors (classifier formation methods, training set size, weighting schemes, label assigning methods) through a variety of experiments. The results show that it is more effective to apply appropriate parameters ($\beta$, $\lambda$) and training set size (more than 5 years) according to the classification environment and the properties of the document set. I also found that, when performance is equivalent, using simpler methods (single weighting schemes) is very efficient. Furthermore, because the classification of domestic papers corresponds to multi-label classification, which assigns more than one label to an article, it is necessary to develop an optimal classification model based on the characteristics of the key factors in consideration of this environment.
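
As a rough illustration of Rocchio-style classification with multi-label assignment, the sketch below builds one prototype per label as the centroid of the positive documents minus `beta` times the centroid of the negative documents, then assigns every label whose cosine similarity passes a threshold. This is a generic textbook-style variant, not the study's configuration: the abstract does not explain the roles of its β and λ parameters, so `beta` here is simply the weight on the negative centroid, and `threshold`, the fallback to the top-scoring label, the toy 3-term vectors, and the label names are all hypothetical.

```python
import math


def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(training, beta=0.25):
    """One Rocchio prototype per label: pos centroid - beta * neg centroid."""
    dim = len(training[0][0])
    labels = sorted({label for _, doc_labels in training for label in doc_labels})
    prototypes = {}
    for label in labels:
        pos = centroid([v for v, ls in training if label in ls], dim)
        neg = centroid([v for v, ls in training if label not in ls], dim)
        prototypes[label] = [p - beta * n for p, n in zip(pos, neg)]
    return prototypes


def assign_labels(doc, prototypes, threshold=0.5):
    """Multi-label assignment: every label scoring at or above the threshold.

    Falls back to the single best label when none passes (assumed policy).
    """
    scores = {label: cosine(doc, proto) for label, proto in prototypes.items()}
    chosen = [label for label, score in scores.items() if score >= threshold]
    if not chosen:
        chosen = [max(scores, key=scores.get)]
    return sorted(chosen)


# Hypothetical term-weight vectors over three terms, with label sets.
TRAINING = [
    ([1.0, 0.0, 0.0], {"retrieval"}),
    ([0.9, 0.1, 0.0], {"retrieval"}),
    ([0.0, 1.0, 0.0], {"classification"}),
    ([0.1, 0.9, 0.0], {"classification"}),
    ([0.0, 0.0, 1.0], {"libraries"}),
]

PROTOTYPES = build_prototypes(TRAINING)
print(assign_labels([0.7, 0.7, 0.0], PROTOTYPES))
print(assign_labels([1.0, 0.0, 0.0], PROTOTYPES))
```

A document that mixes the vocabulary of two categories receives both labels, which is the multi-label behavior the abstract describes. Raising `threshold` moves the sketch toward single-label assignment.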