• Title/Summary/Keyword: data classification

Finding a plan to improve recognition rate using classification analysis

  • Kim, SeungJae;Kim, SungHwan
    • International journal of advanced smart convergence / v.9 no.4 / pp.184-191 / 2020
  • With the emergence of the 4th Industrial Revolution, its core technologies, such as AI (artificial intelligence), big data, and the Internet of Things (IoT), have become a focus of public attention. In particular, there are growing attempts to present future visions by applying these technologies to big data analysis: discovering new models from data collected in a specific field, and using those models to infer and predict new values. To obtain reliable and refined statistics from big data analysis, it is necessary to analyze the meaning of each variable, the correlation between variables, and multicollinearity. If the data is classified in a way that does not match the hypothesis being tested from the beginning, the results will be unreliable even if the analysis itself is performed well. In other words, prior to big data analysis, it is necessary to ensure that the data is well classified according to the purpose of the analysis. Therefore, in this study, data is classified using the decision tree and random forest techniques, two classification analysis methods from machine learning. By evaluating how well the data is classified, we try to find a way to improve the classification and analysis rate of the data.

A Theoretical Study on Land Cover Classification - Focused on Natural Environment Management - (토지피복분류에 관한 이론적 연구 - 자연환경관리를 중심으로 -)

  • Jeon, Seong-Woo;Kim, Kwi-Gon;Park, Chong-Hwa;Lee, Dong-Kun
    • Journal of the Korean Society of Environmental Restoration Technology / v.2 no.1 / pp.29-37 / 1999
  • Land cover classification provides essential basic information for natural environment management; however, land cover classification studies in Korea have not yet progressed to a sufficient level. At present, only a limited number of studies have been conducted, and they cover only specific urban areas. Furthermore, there is almost no research on land cover classification schemes that could accurately classify Korea's land cover conditions. This study focuses on the land cover classification scheme, which is the most urgent priority for classifying and mapping Korean land cover conditions. To develop the most suitable scheme, many foreign land cover classification cases and ongoing projects were reviewed in depth. The scheme this study proposes comprises 3 levels: the first level consists of 7 classes, the second level of 22 classes, and the third level of 50 classes. The land cover classification map will serve many important roles in natural environment management, such as inferring natural habitats and estimating the oxygen production or carbon dioxide absorption capacity of a forest. In water pollution modelling, land cover classification data can be used to estimate and locate non-point sources of water pollution. Applied to a watershed, such modelling makes it possible to estimate the total amount of pollution from non-point sources in the watershed. In air pollution modelling, land cover classification data can also serve as indicator data for determining the diffusion of air pollutants.

The Classifications using by the Merged Imagery from SPOT and LANDSAT

  • Kang, In-Joon;Choi, Hyun;Kim, Hong-Tae;Lee, Jun-Seok;Choi, Chul-Ung
    • Proceedings of the KSRS Conference / 1999.11a / pp.262-266 / 1999
  • Several commercial companies that plan to provide improved panchromatic and/or multi-spectral remote sensor data in the near future are suggesting that merged datasets will be of significant value. This study evaluated the utility of one major merging process: principal components analysis and its inverse. The 6 bands of 30$\times$30m Landsat TM data and the 10$\times$10m SPOT panchromatic data were used to create a new 10$\times$10m merged data file. For image classification, six bands (bands 1, 2, 3, 4, 5, and 7) may be used with supervised classification algorithms; band 6 is excluded. Band 6 records thermal IR energy and is rarely used, except in thermal mapping, because of its coarse spatial resolution (120m). Because of its high resolution, the 10$\times$10m SPOT panchromatic data can be used for detailed classification. SPOT, like Landsat, has acquired hundreds of thousands of images in digital format that are commercially available and are used by scientists in different fields. After merging, classification was performed with supervised classification and a neural network. Supervised classification methods include parallelepiped, minimum distance, and MLC (Maximum Likelihood Classification); back-propagation in a multi-layer perceptron is one neural network method. The methods used in this paper are MLC for supervised classification and back-propagation for the neural network. Later in this research, SPOT systems and images are compared using these classifications. A comparative analysis of the classifications from the TM and merged SPOT/TM datasets leads to some conclusions.

Data-Adaptive ECOC for Multicategory Classification

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.19 no.1 / pp.25-36 / 2008
  • Error Correcting Output Codes (ECOC) can improve generalization performance when applied to multicategory classification problems. In this study we propose a new criterion for selecting the hyperparameters of an ECOC scheme. Instead of the margins of the data, we propose to use the probability of misclassification error, since it makes the criterion simple. Using this, we obtain an upper bound on the leave-one-out error of the OVA (one vs. all) method. Our experiments on real and synthetic data indicate that the bound leads to good estimates of the parameters.
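
As a rough illustration of the ECOC idea mentioned in this abstract (not the paper's hyperparameter criterion), the sketch below decodes the outputs of several binary classifiers by picking the class whose codeword is nearest in Hamming distance. The code matrix, the class names, and the function names `hamming` and `ecoc_decode` are assumptions made for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code_matrix, outputs):
    """Return the class whose codeword is closest to the binary outputs."""
    return min(code_matrix, key=lambda label: hamming(code_matrix[label], outputs))


# Hypothetical 7-bit code for 4 classes; every pair of codewords differs in
# 4 positions, so a single wrong binary classifier can still be corrected.
CODE_MATRIX = {
    "class0": (1, 1, 1, 1, 1, 1, 1),
    "class1": (0, 0, 0, 0, 1, 1, 1),
    "class2": (0, 0, 1, 1, 0, 0, 1),
    "class3": (0, 1, 0, 1, 0, 1, 0),
}

# Outputs of the 7 binary classifiers for one sample: the codeword of
# "class2" with the sixth bit flipped by one misfiring classifier.
noisy_outputs = (0, 0, 1, 1, 0, 1, 1)
predicted = ecoc_decode(CODE_MATRIX, noisy_outputs)
```

In the one-vs-all (OVA) special case, the code matrix is simply the identity matrix, with one binary classifier per class.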

Classification of Korean Characters and Frequency of Continual Characters (한국자소의 분류와 연속 상관빈도)

  • Kim, Guk;Jeong, Byeong-Yong
    • Journal of the Ergonomics Society of Korea / v.21 no.2 / pp.1-11 / 2002
  • This study examines the classification of Korean characters (alphabets) and their frequency data, which are essential to Korean information processing. We defined a classification of characters using the concepts of 'set of 2 parts' and 'set of 3 parts', and we measured the frequencies of all combinations of two consecutive characters. These data would be important basic data for designing computer input devices, for example.
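
The frequency counting of consecutive characters described above can be sketched as follows. This is only an illustration under stated assumptions: it decomposes Hangul syllables into conjoining jamo with Unicode NFD normalization and counts adjacent jamo pairs; the function names are hypothetical, and the paper's own 'set of 2 parts' and 'set of 3 parts' classification is not reproduced here.

```python
import unicodedata
from collections import Counter


def jamo_sequence(text):
    """Decompose Hangul syllables into conjoining jamo (Unicode NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters in the Hangul Jamo block (U+1100 to U+11FF).
    return [ch for ch in decomposed if "\u1100" <= ch <= "\u11ff"]


def bigram_counts(text):
    """Count every pair of consecutive jamo in the text.

    Simplification: pairs are counted across syllable and word boundaries,
    because non-jamo characters such as spaces are dropped first.
    """
    jamo = jamo_sequence(text)
    return Counter(zip(jamo, jamo[1:]))


counts = bigram_counts("한국")
most_common_pairs = counts.most_common(3)
```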

New Splitting Criteria for Classification Trees

  • Lee, Yung-Seop
    • Communications for Statistical Applications and Methods / v.8 no.3 / pp.885-894 / 2001
  • Decision tree methods are one of the data mining techniques. Classification trees are used to predict a class label. When a tree grows, the conventional splitting criteria measure node impurity using the weighted average over the left and right child nodes. In this paper, new splitting criteria for classification trees are proposed that improve the interpretability of trees compared to the conventional methods. The criteria search only for interesting subsets of the data, as opposed to modeling all of the data equally well. As a result, the tree is very unbalanced but extremely interpretable.
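
For reference, the conventional criterion that this abstract contrasts against can be sketched as below: the impurity of a split is the size-weighted average of the impurities of the two child nodes. Gini impurity is used here as one common choice; the function names are illustrative, and the paper's new criteria are not shown.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_split_impurity(left, right):
    """Conventional split score: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


# A split that isolates one pure child still pays for the mixed sibling,
# because both children contribute to the weighted average.
score = weighted_split_impurity(["a", "a", "a"], ["a", "b", "b"])
```

A lower score means a better split under this criterion; a perfect split of two classes scores 0.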

A Comparison on Independent Component Analysis and Principal Component Analysis -for Classification Analysis-

  • Kim, Dae-Hak;Lee, Ki-Lak
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.717-724 / 2005
  • We often extract new features from the original features in order to reduce the dimension of the feature space and to improve classification. In this paper, we show that a feature extraction method based on independent component analysis can be used for classification. Entropy and mutual information are used to select and order the features. The classification performance of independent component analysis is compared with that of principal component analysis on three real data sets.
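
A minimal sketch of ordering features by mutual information with the class label, assuming discrete (already binned) feature values. The extracted ICA or PCA components in the paper would be continuous, so this is a simplification, and all names and data below are made up for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


def rank_features(features, labels):
    """Feature names sorted from most to least informative about the labels."""
    return sorted(
        features,
        key=lambda name: mutual_information(features[name], labels),
        reverse=True,
    )


labels = [0, 0, 1, 1]
features = {
    "informative": [5, 5, 9, 9],  # determines the label exactly
    "noise": [0, 1, 0, 1],        # independent of the label
}
ordering = rank_features(features, labels)
```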

TEMPORAL CLASSIFICATION METHOD FOR FORECASTING LOAD PATTERNS FROM AMR DATA

  • Lee, Heon-Gyu;Shin, Jin-Ho;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.594-597 / 2007
  • In this paper we present a novel mid- and long-term power load prediction method using temporal pattern mining on AMR (Automatic Meter Reading) data. Because power load patterns vary over time and differ greatly by hour, time, day, week, and so on, traditional data mining alone gives uninformative results. Research on data mining for analyzing electric load patterns has also focused on cluster analysis and classification methods. However, despite the usefulness of rules that include a temporal dimension and the fact that AMR data has temporal attributes, these methods were limited to static pattern extraction and did not consider temporal attributes. Therefore, we propose a new classification method for predicting power load patterns. The main tasks are a clustering method and a temporal classification method. Cluster analysis is used to create load pattern classes and a representative load profile for each class. Next, the classification method uses the representative load profiles to build a classifier able to assign different load patterns to the existing classes. The proposed classification method is calendar-based temporal mining, which discovers electric load patterns at multiple time granularities. Lastly, we show that the proposed method, applied to AMR data, discovered more interesting patterns.
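
The two-stage idea in this abstract (representative load profiles per class, then assignment of new load patterns to the existing classes) might be sketched as below. This is an assumption-laden simplification: the representative profile is taken as the per-class mean, assignment uses Euclidean distance, and the calendar-based temporal mining itself is not modeled. The class names and load values are invented.

```python
import math


def mean_profile(profiles):
    """Element-wise mean of equal-length load profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def nearest_class(representatives, profile):
    """Assign a load profile to the class with the closest representative."""
    return min(
        representatives,
        key=lambda name: math.dist(representatives[name], profile),
    )


# Hypothetical clusters of 3-point load profiles (morning, midday, evening).
clusters = {
    "daytime_peak": [[1.0, 5.0, 1.0], [1.0, 7.0, 1.0]],
    "evening_peak": [[1.0, 1.0, 6.0], [1.0, 1.0, 8.0]],
}
representatives = {name: mean_profile(p) for name, p in clusters.items()}
assigned = nearest_class(representatives, [1.0, 5.0, 2.0])
```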

Classification of Imbalanced Data Based on MTS-CBPSO Method: A Case Study of Financial Distress Prediction

  • Gu, Yuping;Cheng, Longsheng;Chang, Zhipeng
    • Journal of Information Processing Systems / v.15 no.3 / pp.682-693 / 2019
  • Traditional classification methods mostly assume that the class distribution of the data is balanced, while imbalanced data is widely found in the real world. It is therefore important to solve the problem of classification with imbalanced data. In the Mahalanobis-Taguchi system (MTS) algorithm, the data classification model is constructed from a reference space and a measurement reference scale that come from a single normal group, so it is suitable for handling the imbalanced data problem. In this paper, an improved method, MTS-CBPSO, is constructed by introducing chaotic mapping and the binary particle swarm optimization algorithm in place of the orthogonal array and signal-to-noise ratio (SNR) to select the valid variables, with G-means, F-measure, and dimensionality reduction as the classification optimization targets. The proposed method is also applied to the financial distress prediction of Chinese listed companies. Compared with traditional MTS and common classification methods such as SVM, C4.5, and k-NN, the MTS-CBPSO method is shown to achieve better prediction accuracy and dimensionality reduction.
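
The two imbalance-aware metrics named in this abstract can be computed from confusion-matrix counts, as in the sketch below. The standard definitions are assumed (G-mean as the geometric mean of sensitivity and specificity, F-measure as the weighted harmonic mean of precision and recall); the counts are invented and are not results from the paper.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    if tp + fn == 0 or tn + fp == 0:
        return 0.0
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fn, fp, beta=1.0):
    """F-beta score of the positive (minority) class; beta=1 gives F1."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented counts: 10 positive (minority) samples and 100 negative samples.
tp, fn, tn, fp = 8, 2, 90, 10
accuracy = (tp + tn) / (tp + fn + tn + fp)
gm = g_mean(tp, fn, tn, fp)
f1 = f_measure(tp, fn, fp)
```

On these counts the accuracy is high while F1 is much lower, which is why accuracy alone can be misleading on imbalanced data.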

Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data (빅데이터를 위한 H-RTGL 기반 단일 분류기 분산 처리 프레임워크 설계)

  • Kim, Do Gyun;Choi, Jin Young
    • Journal of Korean Society for Quality Management / v.48 no.4 / pp.553-566 / 2020
  • Purpose: The purpose of this study was to design a framework for generating a one-class classification algorithm based on the Hyper-Rectangle (H-RTGL) in a distributed environment connected by a network. Methods: First, we devised a one-class classifier based on H-RTGL that can be executed by distributed computing nodes, considering both model and data parallelism. Then, we designed supporting components for the execution of distributed processing. Finally, we validated both the effectiveness and the efficiency of the classifier obtained from the proposed framework in a numerical experiment using a data set from the UCI machine learning repository. Results: We designed a distributed processing framework capable of one-class classification based on H-RTGL in a distributed environment consisting of physically separated computing nodes. It includes components that implement model and data parallelism, which enables distributed generation of the classifier. In the numerical experiment, a statistical test showed no significant change in classification performance, and elapsed time was reduced by distributed processing on a data set of considerable size. Conclusion: Based on these results, we conclude that applying distributed processing to classifier generation preserves classification performance and can improve the efficiency of classification algorithms. In addition, we discuss the limitations of our work and suggest directions for future research.
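
A toy sketch of the data-parallel idea, assuming the simplest possible hyper-rectangle classifier: a single axis-aligned bounding box fitted to the normal class. The actual H-RTGL algorithm and framework components are not described in this abstract, so every function name and data point here is a hypothetical stand-in. Each node fits a box on its own partition, and the boxes are merged by taking per-dimension minima and maxima.

```python
def fit_box(points):
    """Axis-aligned bounding box (lows, highs) of a list of points."""
    dims = range(len(points[0]))
    lows = tuple(min(p[d] for p in points) for d in dims)
    highs = tuple(max(p[d] for p in points) for d in dims)
    return lows, highs


def merge_boxes(boxes):
    """Combine boxes fitted on separate partitions into one enclosing box."""
    dims = range(len(boxes[0][0]))
    lows = tuple(min(box[0][d] for box in boxes) for d in dims)
    highs = tuple(max(box[1][d] for box in boxes) for d in dims)
    return lows, highs


def is_normal(box, point):
    """One-class decision: a point is normal if it lies inside the box."""
    lows, highs = box
    return all(lo <= x <= hi for lo, x, hi in zip(lows, point, highs))


# Normal-class training data split across two hypothetical computing nodes.
partition_a = [(1.0, 2.0), (2.0, 3.0)]
partition_b = [(0.5, 4.0), (3.0, 1.0)]

distributed_box = merge_boxes([fit_box(partition_a), fit_box(partition_b)])
central_box = fit_box(partition_a + partition_b)
```

For this single-box case, the merged box equals the box fitted on all data at once, so splitting the data across nodes does not change the resulting classifier.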