• Title/Summary/Keyword: Data classification

Classification Index and Grade Levels for Energy Efficiency Classification of Agricultural Heaters in Korea

  • Shin, Chang Seop;Jang, Ji Hoon;Kim, Young Tae;Kim, Kyeong Uk
    • Journal of Biosystems Engineering / v.38 no.4 / pp.264-269 / 2013
  • Purpose: This study was carried out to develop a classification index and grade levels for rating the energy efficiency of agricultural heaters. Methods: The classification index was developed mainly with simplicity of calculation and easy access to the relevant data in mind. The grade levels were developed on the basis of a 5-grade classification system in which the graded heaters are normally distributed over the grades. The values of the grade levels were determined from the classification index values calculated using the published performance data of agricultural heaters tested at the FACT in Korea over the past 12 years. Results: The thermal efficiency of agricultural heaters based on the enthalpy method was proposed as a reasonable classification index. The grade levels were proposed in equation form for three types of agricultural heaters: fossil fuel heaters, wood pellet heaters and wood pellet boilers. A reasonable energy efficiency classification of agricultural heaters could be performed using the proposed classification index and grade levels. Conclusions: It is expected that energy saving programs will be extended to agricultural machines in the near future. The classification index and grade levels for rating the energy efficiency of agricultural heaters were developed and proposed in anticipation of that extension.

Bands Classification of Multispectral Image Data using Indiscernibility Relations in Rough Sets (러프 집합에서의 식별 불능 관계를 이용한 다중 분광 이미지 데이터의 밴드 분류)

  • Won Sung-Hyun
    • Management & Information Systems Review / v.1 / pp.401-412 / 1997
  • Traditionally, the classification of remotely sensed image data is one of the important tasks in image data analysis. Many researchers have therefore devoted their efforts to increasing the accuracy of analysis, and many classification algorithms have been proposed. In this paper, we propose a new band selection method for multispectral remotely sensed image data that uses rough set theory. Using indiscernibility relations in rough sets, we show that the efficient bands of multispectral image data can be selected automatically.
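
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the underlying idea: two pixels are indiscernible on a set of bands if they share the same values on all of those bands, and a band subset is a candidate for selection if it induces the same partition of the pixels as the full band set. The function names, the brute-force search, and the toy quantized pixel values are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def partition(rows, bands):
    """Group row indices that are indiscernible on the given bands."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(row[b] for b in bands)
        classes[key].append(idx)
    # Sort so that two partitions can be compared with ==.
    return sorted(classes.values())


def smallest_equivalent_band_sets(rows, all_bands):
    """Return the smallest band subsets that induce the same partition as all bands."""
    full = partition(rows, all_bands)
    for size in range(1, len(all_bands) + 1):
        hits = [subset for subset in combinations(all_bands, size)
                if partition(rows, subset) == full]
        if hits:
            return hits
    return []


# Toy pixels with quantized values per band (made-up data).
pixels = [
    {"b1": 0, "b2": 1, "b3": 0},
    {"b1": 0, "b2": 1, "b3": 1},
    {"b1": 1, "b2": 0, "b3": 1},
    {"b1": 1, "b2": 0, "b3": 0},
]

# b1 and b2 carry the same information here, so one of them can be dropped.
print(smallest_equivalent_band_sets(pixels, ["b1", "b2", "b3"]))
```

The exhaustive search over subsets is exponential in the number of bands, so it is only practical for a handful of bands; it is used here because it keeps the definition of indiscernibility visible.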

Dimension reduction for high-dimensional data via mixtures of common factor analyzers-an application to tumor classification

  • Baek, Jang-Sun
    • Journal of the Korean Data and Information Science Society / v.19 no.3 / pp.751-759 / 2008
  • Mixtures of factor analyzers (MFA) are useful for modeling the distribution of high-dimensional data in a much lower-dimensional space where the number of observations is very large relative to their dimension. Mixtures of common factor analyzers (MCFA) can further reduce the number of parameters in the specification of the component covariance matrices when the number of classes is not small. Moreover, the factor scores of MCFA can be displayed in a low-dimensional space to distinguish the groups. We propose the factor scores of MCFA as new low-dimensional features for the classification of high-dimensional data. Compared with conventional dimension reduction methods such as principal component analysis (PCA) and canonical covariates (CV), the proposed factor scores were shown to have higher correct classification rates for three real data sets when used in parametric and nonparametric classifiers.

An MILP Approach to a Nonlinear Pattern Classification of Data (혼합정수 선형계획법 기반의 비선형 패턴 분류 기법)

  • Kim, Kwangsoo;Ryoo, Hong Seo
    • Journal of Korean Institute of Industrial Engineers / v.32 no.2 / pp.74-81 / 2006
  • In this paper, we deal with the separation of data by concurrently determined, piecewise nonlinear discriminant functions. Toward this end, we develop a new $l_1$-distance norm error metric and cast the problem as a mixed 0-1 integer and linear programming (MILP) model. Given a finite number of discriminant functions as input, the proposed model considers the synergy as well as the individual roles of the functions involved and implements the simplest nonlinear decision surface that best separates the data at hand. Hence, by exploiting powerful MILP solvers, the model efficiently analyzes any given data set for its piecewise nonlinear separability. The classification of four sets of artificial data demonstrates the aforementioned strength of the proposed model. Classification results on five machine learning benchmark databases show that data separation via the proposed MILP model is an effective supervised learning methodology that compares quite favorably to well-established learning methodologies.

Classification Method of Congestion Change Type for Efficient Traffic Management (효율적인 교통관리를 위한 혼잡상황변화 유형 분류기법 개발)

  • Shim, Sangwoo;Lee, Hwanpil;Lee, Kyujin;Choi, Keechoo
    • International Journal of Highway Engineering / v.16 no.4 / pp.127-134 / 2014
  • PURPOSES: To operate a more efficient traffic management system, it is of the utmost importance to detect changes in the congestion level on a freeway segment rapidly and reliably. This study aims to develop a classification method for congestion change types. METHODS: This research proposes two classification methods that capture the change in congestion level on freeway segments using dedicated short range communication (DSRC) data and vehicle detection system (VDS) data. To develop the classification methods, decision tree models were employed in which the dependent variable is the change in congestion level and the covariates are the DSRC and VDS data collected from freeway segments in Korea. RESULTS: The comparison shows that the decision tree model with DSRC data is better than the decision tree model with VDS data. Specifically, the better-fitting decision tree model using DSRC data shows approximately 95% accuracy. CONCLUSIONS: It is expected that the congestion change types classified using the decision tree models could play an important role in future freeway traffic management strategies.

Color & Texture Attribute Classification System of Fashion Item Image for Standardizing Learning Data in Fashion AI (패션 AI의 학습 데이터 표준화를 위한 패션 아이템 이미지의 색채와 소재 속성 분류 체계)

  • Park, Nanghee;Choi, Yoonmi
    • Journal of the Korean Society of Clothing and Textiles / v.44 no.2 / pp.354-368 / 2020
  • Accurate and versatile image datasets, built on a systematic attribute classification system, are essential for fashion AI research and AI-based fashion businesses. This study constructs a hierarchical classification system for color and texture attributes by collecting fashion item images and analyzing the metadata of fashion items described by consumers. Essential dimensions for explaining color and texture attributes were extracted; in addition, attribute values for each dimension were constructed based on the metadata and previous studies. This hierarchical classification system satisfies consistency, exclusiveness, inclusiveness, and flexibility. Image tagging carried out to confirm the usefulness of the proposed classification system indicated that the attributes assigned to the same image differ depending on the annotator, which calls for a clear standard for distinguishing between the properties. This classification system will improve the reliability of training data for machine learning by providing standardized criteria for tasks such as tagging and annotating fashion items.

Classification Analysis for Unbalanced Data (불균형 자료에 대한 분류분석)

  • Kim, Dongah;Kang, Suyeon;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.3 / pp.495-509 / 2015
  • We study the unbalanced classification problem, in which the proportions of the two groups differ significantly. It is usually more difficult to classify classes accurately in unbalanced data than in balanced data. If we apply classification methods to unbalanced data, most observations are likely to be classified into the bigger group because doing so minimizes the misclassification loss. However, misclassifying the smaller group as the larger group can cause a bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up and down sampling). We also check the total loss of the different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC and AUC (area under the curve) for the performance comparison.
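
To illustrate two of the ingredients named in this abstract, here is a minimal sketch of the G-mean (the geometric mean of sensitivity and specificity) and of random up sampling of the minority class. It is not the authors' code; the function names, the fixed seed, and the toy labels are assumptions for illustration.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def up_sample(samples, labels, minority, seed=0):
    """Randomly duplicate minority samples until both classes have equal size."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    if not minority_idx:
        raise ValueError("no samples with the minority label")
    rng = random.Random(seed)
    majority_n = len(labels) - len(minority_idx)
    extra = [rng.choice(minority_idx)
             for _ in range(majority_n - len(minority_idx))]
    keep = list(range(len(labels))) + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Nine majority observations and one minority observation (made-up data).
y_true = [0] * 9 + [1]
y_pred = [0] * 10  # a classifier that always predicts the bigger group

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, g_mean(y_true, y_pred))  # accuracy is 0.9, G-mean is 0.0

X = [[float(i)] for i in range(10)]
X_up, y_up = up_sample(X, y_true, minority=1)
print(y_up.count(0), y_up.count(1))
```

The always-majority classifier looks good on the misclassification rate but scores zero on the G-mean, which is why the G-mean is a natural companion metric for unbalanced data.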

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
  • Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and dataset characteristics are also diverse. To secure their competitiveness, companies need to improve their decision-making capacity using classification algorithms. However, most of them do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether the meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, the meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among these, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration index, to replace the IR (Imbalanced Ratio). We also developed a new index called the Reverse ReLU Silhouette Score and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. The classes of each dataset were predicted using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross validation; 10% to 100% oversampling was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. The F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study has a significant effect on classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. The results of the analysis by classification algorithm were also consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike for the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and the Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. There are also two practical contributions. (1) The results can be utilized in developing a system that recommends classification algorithms according to the characteristics of datasets. (2) The results can help reduce the excessive waste of hardware, cost, time, and manpower that occurs when data scientists, faced with data of differing characteristics, repeatedly test and adjust algorithm parameters to find the optimal algorithm for a situation. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
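
As a rough illustration of how HHI can serve as a class-imbalance meta-feature, the sketch below applies the standard market-concentration formula (the sum of squared shares) to class proportions, alongside the Shannon entropy of the class distribution. The abstract does not give the exact definitions used in the paper, so treating "Entropy" as class-label entropy, as well as the function names and toy labels, are assumptions.

```python
import math
from collections import Counter


def class_shares(labels):
    """Return the proportion of observations in each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    n = len(labels)
    return {cls: k / n for cls, k in counts.items()}


def hhi(labels):
    """Herfindahl-Hirschman Index of the class distribution: sum of squared shares."""
    return sum(s * s for s in class_shares(labels).values())


def class_entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    return -sum(s * math.log2(s) for s in class_shares(labels).values())


balanced = ["a", "b", "c"] * 4            # three equally sized classes
skewed = ["a"] * 8 + ["b"] + ["c"]        # one dominant class

print(hhi(balanced), hhi(skewed))
print(class_entropy(balanced), class_entropy(skewed))
```

For k classes, HHI ranges from 1/k (perfectly balanced) to 1 (a single class), so unlike a two-class imbalance ratio it summarizes the whole multi-class distribution in one number.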

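Classification Knowledge Discovery in Data Mining Using Rough Sets and a Hierarchical Classification Structure (러프집합과 계층적 분류구조를 이용한 데이터마이닝에서 분류지식발견)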

  • Lee, Chul-Heui;Seo, Seon-Hak
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.3 / pp.202-209 / 2002
  • This paper deals with the simplification of classification rules for data mining and of rule bases for control systems. Data mining, which extracts useful information from large amounts of data, is an important issue. There are various classification methodologies for data mining, such as decision trees and neural networks, but the result should be explicit and understandable, and the classification rules should be short and clear. Rough set theory is an effective technique for extracting knowledge from incomplete and inconsistent data and provides a good solution for classification and approximation by using various attributes effectively. This paper investigates the granularity of knowledge for reasoning about uncertain concepts by using rough set approximations, and uses a hierarchical classification structure, a more effective technique for classification, by applying the core to the upper level. The proposed classification methodology makes the analysis of an information system easy and generates minimal classification rules.
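
The rough set approximations mentioned in this abstract can be sketched as follows. This shows only the textbook lower and upper approximations of a concept, not the hierarchical structure or the core computation proposed in the paper; the information table, attribute names, and target concept are made up for illustration.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Partition objects into classes that agree on all given attributes."""
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[tuple(values[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) rough set approximations of the target set."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the concept
            lower |= block
        if block & target:    # block overlaps the concept
            upper |= block
    return lower, upper


# Toy information system; x1 and x2 look identical but are labeled differently.
table = {
    "x1": {"temp": "high", "cough": "yes"},
    "x2": {"temp": "high", "cough": "yes"},
    "x3": {"temp": "normal", "cough": "no"},
    "x4": {"temp": "high", "cough": "no"},
}
flu = {"x1", "x4"}  # objects that belong to the target concept

lower, upper = approximations(table, ["temp", "cough"], flu)
print(sorted(lower), sorted(upper), sorted(upper - lower))
```

Objects in the lower approximation certainly belong to the concept, those in the upper approximation possibly belong to it, and the difference between the two is the boundary region caused by inconsistent data such as x1 and x2.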

Improving Classification Accuracy in Hierarchical Trees via Greedy Node Expansion

  • Byungjin Lim;Jong Wook Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.6 / pp.113-120 / 2024
  • With the advancement of information and communication technology, we can easily generate various forms of data in our daily lives. To efficiently manage such a large amount of data, systematic classification into categories is essential. For effective search and navigation, data is organized into a tree-like hierarchical structure known as a category tree, which is commonly seen in news websites and Wikipedia. As a result, various techniques have been proposed to classify large volumes of documents into the terminal nodes of category trees. However, document classification methods using category trees face a problem: as the height of the tree increases, the number of terminal nodes multiplies exponentially, which increases the probability of misclassification and ultimately leads to a reduction in classification accuracy. Therefore, in this paper, we propose a new node expansion-based classification algorithm that satisfies the classification accuracy required by the application, while enabling detailed categorization. The proposed method uses a greedy approach to prioritize the expansion of nodes with high classification accuracy, thereby maximizing the overall classification accuracy of the category tree. Experimental results on real data show that the proposed technique provides improved performance over naive methods.
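
The abstract does not describe the algorithm in detail, so the following is only a guess at its general shape, built on a hypothetical accuracy model: each internal node has an estimated accuracy for splitting documents among its children (`split_acc`), each node has a share of the documents (`weight`), and the accuracy of reaching a node is the product of the split accuracies along its path. Starting from the root, the sketch greedily expands the node with the highest split accuracy as long as the overall accuracy stays at or above the required level. All names and numbers are assumptions.

```python
def overall_accuracy(frontier, weight):
    """Document-weighted accuracy over the current terminal nodes."""
    return sum(weight[node] * acc for node, acc in frontier.items())


def greedy_expand(children, weight, split_acc, root, required):
    """Greedily expand nodes while the overall accuracy stays >= required."""
    frontier = {root: 1.0}  # terminal node -> accuracy of reaching it
    while True:
        # Expandable terminal nodes, most accurate split first.
        candidates = sorted((n for n in frontier if children.get(n)),
                            key=lambda n: split_acc[n], reverse=True)
        for node in candidates:
            trial = dict(frontier)
            path_acc = trial.pop(node) * split_acc[node]
            for child in children[node]:
                trial[child] = path_acc
            if overall_accuracy(trial, weight) >= required:
                frontier = trial
                break
        else:
            # No expansion keeps the required accuracy: stop.
            return frontier, overall_accuracy(frontier, weight)


# Hypothetical category tree, document shares, and split accuracies.
children = {
    "root": ["news", "sports"],
    "news": ["politics", "economy"],
    "sports": ["soccer", "baseball"],
}
weight = {"root": 1.0, "news": 0.6, "sports": 0.4,
          "politics": 0.3, "economy": 0.3, "soccer": 0.2, "baseball": 0.2}
split_acc = {"root": 0.95, "news": 0.80, "sports": 0.90}

leaves, accuracy = greedy_expand(children, weight, split_acc, "root", required=0.85)
print(sorted(leaves), round(accuracy, 3))
```

In this toy run the "sports" subtree is expanded into detailed categories while "news" stays coarse, because expanding it would push the estimated overall accuracy below the required level.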