• Title/Summary/Keyword: Data Classification Systems


A Study on the Improvement Directions of Data Classification Format for Efficient Information Management System (효율적인 정보화경영을 위한 데이터분류체계의 개선방안에 관한 연구)

  • Park, Jae-Yong
    • International Commerce and Information Review / v.6 no.3 / pp.41-61 / 2004
  • Today, most companies need to take an interest in e-business and information management systems. In particular, a data classification format system is very important for supporting effective and efficient management decisions. Such a system should include a main entry consisting of department, employee's name, title, and publication date. At present, companies use eleven different data classification methods. This study found that nine data classification methods are used for general management documents and two for special report management documents. The aim of this study is to investigate the problems of the present data classification format system and the concerns that should be taken into account when modifying the system or changing to a new one. The study is based on a survey administered to managers of 35 companies throughout the nation. The survey indicates that the crucial concerns of the participating managers are ineffective management information sources and the duplication of data classification systems. This paper is an early study on the introduction of data classification format systems to business companies in Korea. It provides fundamental data for effective business process reengineering of management information activities.


Comparison of Hyperspectral and Multispectral Sensor Data for Land Use Classification

  • Kim, Dae-Sung;Han, Dong-Yeob;Yun, Ki;Kim, Yong-Il
    • Proceedings of the KSRS Conference / 2002.10a / pp.388-393 / 2002
  • Remote sensing data is collected and analyzed to enhance understanding of the terrestrial surface. Since the first Landsat satellite was launched in 1972, much research using multispectral data has been carried out. Recently, with the availability of airborne and satellite hyperspectral data, studies on hyperspectral data have been increasing. It is expected that as the number of spectral bands of high-spectral-resolution data increases, the ability to distinguish more detailed classes should also increase, and the classification accuracy should increase as well. In this paper, we classified hyperspectral and multispectral data and tested the classification accuracy. MASTER (MODIS/ASTER Airborne Simulator, 50 channels, 0.4-13 $\mu$m) and Landsat TM (7 channels) imagery covering the Yeong-Gwang area were used. We adjusted the classification items in several cases and tested their classification accuracy through statistical comparison. The results show that hyperspectral data offer more information than multispectral data.


A Preliminary Study on Clinical Decision Support System based on Classification Learning of Electronic Medical Records

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.817-824 / 2003
  • We employed a hierarchical document classification method to classify a massive collection of electronic medical records (EMR) written in both Korean and English. Our experimental system was trained on 5,000 EMR text records and classified a newly given set of EMR text data with over 68% accuracy. We expect that the accuracy can be improved greatly given a dictionary of medical terms or a suitable medical thesaurus. The classification system might play a key role in clinical decision support systems and various interpretation systems for clinical data.


Classification of Multi Spectral Image Data using Rough Sets (러프 집합을 이용한 다중 분광 이미지 데이터의 분류)

  • 원성현;이병성;정환묵
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1997.11a / pp.205-208 / 1997
  • Traditionally, classification of remotely sensed image data has been one of the important tasks in image data analysis. Many researchers have therefore devoted their efforts to increasing the accuracy of analysis, and many classification algorithms have been proposed. In this paper, we propose a new classification method for remotely sensed image data that uses rough set theory. Using the indiscernibility relation of rough sets, we show that image data can be classified very easily.

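The indiscernibility relation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: the pixel identifiers, band names, quantized values, and the `water` target set are all hypothetical, and the code only shows the textbook rough-set idea of grouping objects that agree on every chosen attribute and then taking lower and upper approximations of a target class.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attributes):
    """Group object ids whose values agree on every listed attribute."""
    groups = defaultdict(list)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        groups[key].append(obj_id)
    return [sorted(ids) for ids in groups.values()]


def lower_upper(classes, target):
    """Lower and upper approximation of a target set of object ids."""
    target = set(target)
    lower, upper = set(), set()
    for ids in classes:
        block = set(ids)
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical pixels with quantized values in two spectral bands.
pixels = {
    "p1": {"band1": 0, "band2": 1},
    "p2": {"band1": 0, "band2": 1},
    "p3": {"band1": 1, "band2": 0},
    "p4": {"band1": 1, "band2": 1},
    "p5": {"band1": 1, "band2": 1},
}
water = {"p1", "p2", "p4"}  # hypothetical ground-truth class

classes = indiscernibility_classes(pixels, ["band1", "band2"])
lower, upper = lower_upper(classes, water)
```

In this toy data, pixels p4 and p5 are indiscernible but only p4 is labeled as water, so they fall in the upper approximation but not the lower one. That boundary region is what rough set theory uses to express uncertainty in a classification.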

A Comparison Study of Classification Algorithms in Data Mining

  • Lee, Seung-Joo;Jun, Sung-Rae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.1 / pp.1-5 / 2008
  • Generally, the analytical tools of data mining fall into two learning types: supervised and unsupervised learning algorithms. Classification and prediction are the main analysis tools of supervised learning. In this paper, we perform a comparison study of classification algorithms in data mining. We compare popular classification algorithms: LDA, QDA, the kernel method, K-nearest neighbor, naive Bayes, SVM, and CART. We use almost all of the classification data sets in the UCI Machine Learning Repository for our experiments. Based on our results, we are able to select proper algorithms for given classification data sets.

Genetic Algorithm Application to Machine Learning

  • Han, Myung-mook;Lee, Yill-byung
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.7 / pp.633-640 / 2001
  • In this paper we examine the machine learning issues raised by the domain of intrusion detection systems (IDS), which have difficulty successfully classifying intruders. These systems also require a significant amount of computational overhead, making it difficult to create a robust real-time IDS. Machine learning techniques can reduce the human effort required to build these systems and can improve their performance. Genetic algorithms are used to improve performance on search problems, while data mining has been used for data analysis. Data mining is the exploration and analysis of large quantities of data to discover meaningful patterns and rules. Among the tasks of data mining, we concentrate on classification. Since classification is a basic element of human thinking, it is a well-studied problem in a wide variety of applications. In this paper, we propose a classifier system based on a genetic algorithm and evaluate it by applying it to the IDS problem as a classification task in data mining. We report our experiments using this method on KDD audit data.


Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

  • William Xiu Shun Wong;Donghoon Lee;Namgyu Kim
    • Asia Pacific Journal of Information Systems / v.29 no.4 / pp.789-816 / 2019
  • Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on what types of words are contained in the document corpus and what types of features are generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new algorithm, we attempt to modify the use of data. Classifier performance is usually affected by the quality of the training data on which the classifier is built. We assume that data from different domains might have different noise characteristics, which can be utilized in the process of learning the classifier. Therefore, we attempt to enhance the robustness of the classifier by artificially injecting heterogeneous data into the learning process in order to improve classification accuracy. A semi-supervised approach was applied to utilize the heterogeneous data in learning the document classifier. However, the performance of the document classifier might be degraded by the unlabeled data. Therefore, we further propose an algorithm to extract only the documents that contribute to improving the accuracy of the classifier.
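The semi-supervised idea in the abstract above can be sketched as a generic self-training loop. This is a minimal illustration under stated assumptions, not the paper's algorithm: the nearest-centroid base classifier, the margin-based confidence rule, the `min_margin` threshold, and the 2-D toy vectors are all hypothetical choices. The paper's own criterion for keeping only documents that improve accuracy is not reproduced here.

```python
import math


def centroids(labeled):
    """Mean vector per label, computed from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(cents, vec):
    """Return (label, margin): nearest centroid and its gap to the runner-up."""
    dists = sorted((math.dist(vec, c), lab) for lab, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident points and refit the centroids."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for vec in pool:
            label, margin = predict(cents, vec)
            (confident if margin >= min_margin else rest).append((vec, label))
        if not confident:  # nothing new to learn from, so stop early
            break
        labeled.extend(confident)
        pool = [vec for vec, _ in rest]
    return centroids(labeled), pool


# Hypothetical toy data: two labeled seeds and three unlabeled points.
seeds = [((0.0, 0.0), "a"), ((4.0, 4.0), "b")]
unlabeled = [(0.5, 0.5), (3.5, 3.5), (2.0, 2.0)]
model, leftover = self_train(seeds, unlabeled)
```

The ambiguous point `(2.0, 2.0)` is equidistant from both centroids, so it never passes the confidence threshold and is returned in `leftover` rather than being pseudo-labeled. Holding back low-confidence points is one simple way to limit the degradation from unlabeled data that the abstract mentions.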

Development of e-Mail Classifiers for e-Mail Response Management Systems (전자메일 자동관리 시스템을 위한 전자메일 분류기의 개발)

  • Kim, Kuk-Pyo;Kwon, Young-S.
    • Journal of Information Technology Services / v.2 no.2 / pp.87-95 / 2003
  • With the increasing proliferation of the World Wide Web, electronic mail systems have become very widely used communication tools. Research on e-mail classification is very important because an e-mail classification system is a major engine of e-mail response management systems, which mine unstructured e-mail messages and automatically categorize them. In this research, we develop e-mail classifiers for e-mail response management systems (ERMS) using naive Bayesian learning and centroid-based classification. We analyze which method performs better under which conditions, comparing classification accuracies that may depend on the structure and size of the training data set and the number of classes, using different data sets from an online shopping mall and a credit card company. The developed e-mail classifiers have been successfully implemented in practice. The experimental results show that naive Bayesian learning performs better, while centroid-based classification is more robust in terms of classification accuracy.
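The two techniques named in the abstract above can be sketched side by side. This is a minimal illustration, not the authors' ERMS implementation: the whitespace tokenizer, add-one smoothing, raw term-count vectors, cosine similarity, and the four sample messages are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(doc):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class CentroidClassifier:
    """Pick the class whose mean term vector is most cosine-similar."""

    def fit(self, docs, labels):
        sums, n = defaultdict(Counter), Counter(labels)
        for doc, label in zip(docs, labels):
            sums[label].update(tokenize(doc))
        self.centroids = {
            lab: {w: c / n[lab] for w, c in cnt.items()} for lab, cnt in sums.items()
        }
        return self

    @staticmethod
    def _cosine(a, b):
        dot = sum(v * b.get(w, 0.0) for w, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def predict(self, doc):
        vec = Counter(tokenize(doc))
        return max(self.centroids, key=lambda lab: self._cosine(vec, self.centroids[lab]))


# Hypothetical training messages and categories.
docs = [
    "refund my order please",
    "order arrived damaged refund",
    "card payment declined",
    "increase my card limit",
]
labels = ["refund", "refund", "card", "card"]
nb = NaiveBayes().fit(docs, labels)
cc = CentroidClassifier().fit(docs, labels)
```

Both classifiers expose the same `fit`/`predict` shape, so they can be compared on the same held-out messages. The abstract's comparison across training set size and number of classes would need real data and is not attempted here.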

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems / v.4 no.1 / pp.102-108 / 2005
  • This paper studies medical data classification methods, comparing decision tree and system reconstruction analysis as applied to heart disease medical data mining. The data we study were collected from patients with coronary heart disease and comprise 1,723 records of 71 attributes each. We use the system-reconstruction method to weight the data. We use decision tree algorithms such as induction of decision trees (ID3), C4.5, classification and regression tree (CART), Chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, leaf number, and tree depth of the different decision-tree algorithms. According to the experiments, weighted data can improve the correct classification rate on coronary heart disease data but has little effect on tree depth and leaf number.
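As a minimal sketch of how one of the listed algorithms chooses a split, the following computes the information gain criterion used by ID3. The attribute names and the four patient records are hypothetical, not the paper's coronary heart disease data. The system-reconstruction weighting is not modeled, and the other algorithms use different criteria (for example, gain ratio in C4.5 and the Gini index in CART).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain (ID3's greedy choice)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical patient records.
rows = [
    {"chest_pain": "yes", "smoker": "yes", "disease": "yes"},
    {"chest_pain": "yes", "smoker": "no", "disease": "yes"},
    {"chest_pain": "no", "smoker": "yes", "disease": "no"},
    {"chest_pain": "no", "smoker": "no", "disease": "no"},
]
gain_pain = information_gain(rows, "chest_pain", "disease")
gain_smoker = information_gain(rows, "smoker", "disease")
root = best_split(rows, ["chest_pain", "smoker"], "disease")
```

In this toy data `chest_pain` separates the classes perfectly while `smoker` carries no information, so ID3 would place `chest_pain` at the root.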

Bands Classification of Multispectral Image Data using Indiscernibility Relations in Rough Sets (러프 집합에서의 식별 불능 관계를 이용한 다중 분광 이미지 데이터의 밴드 분류)

  • Won Sung-Hyun
    • Management & Information Systems Review / v.1 / pp.401-412 / 1997
  • Traditionally, classification of remotely sensed image data has been one of the important tasks in image data analysis. Many researchers have therefore devoted their efforts to increasing the accuracy of analysis, and many classification algorithms have been proposed. In this paper, we propose a new band selection method for the multispectral bands of remotely sensed image data that uses rough set theory. Using indiscernibility relations in rough sets, we show that the efficient bands of multispectral image data can be selected automatically.
