• Title/Abstract/Keywords: Data Classification Systems


Learning and Classification in the Extensional Object Model

  • 김용재;안준모;이석준
    • Asia Pacific Journal of Information Systems / Vol. 17, No. 1 / pp. 33-58 / 2007
  • Quite often, an organization must grapple with inconsistent and partial information to generate relevant information that supports decision making and action. To do so, an organization scans the environment, interprets the scanned data, executes actions, and learns from the feedback of those actions; computationally, this amounts to interpretation and learning in terms of machine learning, statistics, and databases. The Extensional Object Model (ExOM) proposed in this paper is designed to support such knowledge discovery in large databases as flexibly as possible. It supports a broad range of learning and classification styles and integrates them with traditional database functions. The learning and classification components of the ExOM are tightly integrated, so learning and classifying objects is less burdensome for ordinary users. A brief sketch of a strategy concerning the expressiveness of the terminological language is followed by a description of a prototype implementation of the ExOM's learning and classification components.

Online Selective-Sample Learning of Hidden Markov Models for Sequence Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 15, No. 3 / pp. 145-152 / 2015
  • We consider an online selective-sample learning problem for sequence classification, where the goal is to learn a predictive model from a stream of data samples whose class labels can be selectively queried by the algorithm. Because the total number of queries is limited, the key issue is choosing the most informative and salient samples whose labels should be queried. Several aggressive selective-sample algorithms have recently been proposed under a linear model for static (non-sequential) binary classification. We extend the idea to hidden Markov models for multi-class sequence classification by introducing measures of the novelty and prediction confidence of each incoming sample with respect to the current model, on which the query decision is based. We demonstrate the effectiveness of the proposed approach on several sequence classification datasets and tasks in online learning setups.
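
The query decision can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes one discrete HMM per class, scores an incoming sequence with the scaled forward algorithm, and uses the maximum class posterior (under a uniform class prior) as a stand-in confidence measure. The paper's novelty measure is omitted, and the 0.9 threshold and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DiscreteHMM:
    start: list[float]        # pi[i]: initial state probabilities
    trans: list[list[float]]  # A[i][j]: P(next state j | state i)
    emit: list[list[float]]   # B[i][o]: P(symbol o | state i)

    def log_likelihood(self, seq: list[int]) -> float:
        """Forward algorithm with per-step scaling; returns log P(seq | model)."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][seq[0]] for i in range(n)]
        scale = sum(alpha)
        log_lik = math.log(scale)
        alpha = [a / scale for a in alpha]
        for obs in seq[1:]:
            alpha = [
                sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][obs]
                for j in range(n)
            ]
            scale = sum(alpha)
            log_lik += math.log(scale)  # sum of log scales = log-likelihood
            alpha = [a / scale for a in alpha]
        return log_lik


def class_posteriors(models: list[DiscreteHMM], seq: list[int]) -> list[float]:
    """Softmax of per-class log-likelihoods (uniform class prior assumed)."""
    logs = [m.log_likelihood(seq) for m in models]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


def should_query(models, seq, budget_left: int, threshold: float = 0.9) -> bool:
    """Hypothetical rule: request the label only if confidence is low and budget remains."""
    if budget_left <= 0:
        return False
    return max(class_posteriors(models, seq)) < threshold
```

Under these assumptions, a sequence that the class models explain about equally well (posterior near 0.5) triggers a query, while a confidently classified sequence does not.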

Directed Association Rules Mining and Classification

  • 한경록;김재련
    • Journal of Society of Korea Industrial and Systems Engineering / Vol. 24, No. 63 / pp. 23-31 / 2001
  • Data mining can be either directed or undirected. One way of thinking about it is that undirected data mining is used to recognize relationships in the data, and directed data mining is used to explain those relationships once they have been found. Several data mining techniques have received considerable research attention. In this paper, we propose an algorithm for discovering association rules as directed data mining and applying them to classification. In the first phase, we find frequent closed itemsets and association rules. In the second phase, we construct decision trees using the discovered association rules. The algorithm can be applied to customer relationship management.

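A toy Python sketch of the first phase, under stated assumptions: frequent itemsets are enumerated by brute force level by level (feasible only for tiny data), an itemset is closed when no proper superset has the same support, and a rule counts as "directed" when its consequent is a value of a designated target attribute. The `attribute=value` item encoding, the thresholds, and the function names are hypothetical, and the decision-tree construction phase is not shown.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise brute-force enumeration of all frequent itemsets."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    support = {}
    for size in range(1, len(items) + 1):
        level = {}
        for cand in combinations(items, size):
            count = sum(1 for t in transactions if set(cand) <= t)
            if count / n >= min_support:
                level[frozenset(cand)] = count / n
        if not level:  # Apriori property: no larger itemset can be frequent
            break
        support.update(level)
    return support


def closed_only(support):
    """Keep itemsets with no proper superset of equal support."""
    return {
        x: s for x, s in support.items()
        if not any(x < y and support[y] == s for y in support)
    }


def directed_rules(support, target_items, min_conf):
    """Rules body -> t from closed itemsets, where t is a target-attribute value."""
    rules = []
    for itemset, s in closed_only(support).items():
        for t in target_items & itemset:
            body = itemset - {t}
            # body is a subset of a frequent itemset, so its support is known
            if body and s / support[body] >= min_conf:
                rules.append((body, t, s, s / support[body]))
    return rules
```

Restricting rules to closed itemsets keeps the rule set compact, because every frequent itemset shares its support with some closed superset.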

Classification of Imbalanced Data Based on MTS-CBPSO Method: A Case Study of Financial Distress Prediction

  • Gu, Yuping;Cheng, Longsheng;Chang, Zhipeng
    • Journal of Information Processing Systems / Vol. 15, No. 3 / pp. 682-693 / 2019
  • Traditional classification methods mostly assume that the class distribution of the data is balanced, whereas imbalanced data are common in the real world, so solving the classification problem for imbalanced data is important. In the Mahalanobis-Taguchi system (MTS) algorithm, the classification model is built from a reference space and a measurement scale derived from a single normal group, which makes it suitable for imbalanced data. In this paper, an improved MTS-CBPSO method is constructed by using chaotic mapping and binary particle swarm optimization, instead of orthogonal arrays and the signal-to-noise ratio (SNR), to select the valid variables, with G-mean, F-measure, and dimensionality reduction as the optimization targets. The method is applied to financial distress prediction for Chinese listed companies. Compared with the traditional MTS and common classifiers such as SVM, C4.5, and k-NN, MTS-CBPSO achieves better prediction accuracy and dimensionality reduction.
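
The core MTS computation can be sketched in Python with NumPy: a reference space (means, standard deviations, and the inverse correlation matrix R^-1) is built from the normal group alone, and each sample's scaled Mahalanobis distance MD = z^T R^-1 z / k is measured against it, where z is the standardized sample and k the number of variables. This is a generic sketch, not the paper's MTS-CBPSO: the chaotic BPSO variable selection is omitted, the function names are hypothetical, and the G-mean helper covers only one of the stated optimization targets.

```python
import numpy as np


def fit_reference_space(normal: np.ndarray):
    """Build the MTS reference space from the normal (majority) group only."""
    mean = normal.mean(axis=0)
    std = normal.std(axis=0, ddof=1)
    corr_inv = np.linalg.inv(np.corrcoef(normal, rowvar=False))
    return mean, std, corr_inv


def scaled_md(x: np.ndarray, mean, std, corr_inv) -> np.ndarray:
    """Scaled Mahalanobis distance MD_i = z_i^T R^-1 z_i / k for each row of x."""
    z = (x - mean) / std
    k = x.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k


def g_mean(y_true, y_pred) -> float:
    """Geometric mean of sensitivity and specificity; class 1 is the minority class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return float(np.sqrt(tp / (tp + fn) * tn / (tn + fp)))
```

Samples whose MD exceeds a chosen threshold would be flagged as abnormal (for example, distressed); the abstract does not give the threshold. With the sample standard deviation used here, the mean MD over the n reference samples is exactly (n - 1)/n, which makes a quick sanity check.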

Hybrid Case-based Reasoning and Genetic Algorithms Approach for Customer Classification

  • Kim Kyoung-jae;Ahn Hyunchul
    • Journal of Information and Communication Convergence Engineering / Vol. 3, No. 4 / pp. 209-212 / 2005
  • This study proposes a hybrid case-based reasoning (CBR) and genetic algorithm model for customer classification. The vertical and horizontal dimensions of the research data are reduced through an integrated feature and instance selection process using genetic algorithms. We applied the proposed model to a customer classification task that uses customers' demographic characteristics as inputs to predict their buying behavior for a specific product. Experimental results show that the proposed model may improve classification accuracy and outperform various optimization models of the typical CBR system.
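
A minimal Python sketch of GA-based feature selection for a CBR-style classifier, assuming 1-nearest-neighbor retrieval with squared Euclidean distance as the case-retrieval step and leave-one-out accuracy as the fitness. It covers only feature selection; the paper's integrated instance selection would append one bit per stored case to the chromosome. Population size, mutation rate, tournament size, and function names are hypothetical.

```python
import random


def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of 1-NN retrieval using features where mask[j] == 1."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query case out of the case base
            d = sum((xi[j] - xk[j]) ** 2 for j in feats)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)


def ga_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Binary-chromosome GA: elitism, tournament selection, one-point crossover, bit flips."""
    rng = random.Random(seed)
    n_feat = len(X[0])

    def fitness(mask):
        return loo_1nn_accuracy(X, y, mask)

    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = [m[:] for m in ranked[:2]]  # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            a = max(rng.sample(ranked, 3), key=fitness)  # tournament of size 3
            b = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n_feat) if n_feat > 1 else 0
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)
```

Because the two best masks survive every generation, the returned mask is at least as fit as any mask the run has already seen.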

A Study on Data Clustering Method Using Local Probability

  • 손창호;최원호;이재국
    • Journal of Institute of Control, Robotics and Systems / Vol. 13, No. 1 / pp. 46-51 / 2007
  • In this paper, we propose a new data clustering method using local probability and statistical hypothesis testing. To cluster the test data set, we analyze its local regions using a local probability distribution and determine candidate classes using statistics such as the mean, standard deviation, and variance. To decide the final class of each test sample, a statistical hypothesis test is applied to its candidate classes. For evaluation, the proposed method is compared with the conventional fuzzy c-means method, the k-means algorithm, and a discriminant analysis algorithm. In simulations, the proposed method was more accurate than all three.

Convolutional Neural Network Model Using Data Augmentation for Emotion AI-based Recommendation Systems

  • Ho-yeon Park;Kyoung-jae Kim
    • Journal of the Korea Society of Computer and Information / Vol. 28, No. 12 / pp. 57-66 / 2023
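  • This study proposes a new research framework for a recommendation system that applies deep learning techniques and emotional AI to estimate a user's emotional state and reflect it in the recommendation process. To this end, we build an emotion classification model that distinguishes seven emotions (anger, disgust, fear, happiness, sadness, surprise, and neutral) and propose a model that incorporates its output into the recommendation process. However, because the proportion of samples per label varies widely in typical emotion classification data, generalized classification results can be hard to obtain. Since emotion image data often contain too few samples of emotions such as disgust, this study uses data augmentation. Finally, we propose a method for incorporating the data-driven emotion prediction model, trained with image augmentation, into the recommendation system.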
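
The class-balancing idea can be sketched in Python with NumPy: images of under-represented emotions (for example, disgust) are augmented until every class matches the largest one. The transforms (a random horizontal flip plus a small circular shift) and the function names are assumptions for illustration; the abstract does not state which augmentations the authors used.

```python
import numpy as np


def augment_once(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random horizontal flip plus a circular shift of up to 2 pixels (illustrative)."""
    out = img[:, ::-1] if rng.random() < 0.5 else img
    dx = int(rng.integers(-2, 3))
    return np.roll(out, dx, axis=1)


def balance_by_augmentation(images, labels, seed=0):
    """Oversample each class with augmented copies until all classes have equal counts."""
    rng = np.random.default_rng(seed)
    labels = list(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    out_imgs, out_labels = list(images), list(labels)
    for c, n in zip(classes, counts):
        pool = [img for img, lab in zip(images, labels) if lab == c]
        for _ in range(target - n):
            src = pool[int(rng.integers(len(pool)))]
            out_imgs.append(augment_once(src, rng))
            out_labels.append(str(c))
    return out_imgs, out_labels
```

A circular shift wraps pixels around the image edge; a real pipeline would more likely pad or crop, but the wrap keeps the sketch short and shape-preserving.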

A Discussion on the Approaches for Interfacing Remote Sensing and Geographic Information Systems

  • 정성학;김갑덕
    • Korean Journal of Remote Sensing / Vol. 8, No. 2 / pp. 125-130 / 1992
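  • Remote sensing and geographic information systems are combined and used together in many fields. We examined two techniques for transferring data between these two spatial data processing systems. Classifying natural resources accurately from remote sensing data alone is difficult. To improve accuracy, ancillary data, such as digitized maps and terrain (elevation) data, are combined with the remote sensing data. Two techniques are commonly used to exploit such data: (1) pre-classification stratification and (2) post-classification sorting. Both techniques are useful, but because they depend on decision rules, they lack a certain degree of expertise.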
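
Post-classification sorting can be illustrated with a short Python sketch: each pixel label produced by a spectral classifier is checked against a co-registered elevation (DEM) layer and relabeled when a decision rule fires. The class names, elevation thresholds, and rule table are hypothetical; they only show that the result depends on how well the rules are written.

```python
from typing import Callable

# Hypothetical rules: (spectral class, elevation test in meters, corrected class).
Rule = tuple[str, Callable[[float], bool], str]
RULES: list[Rule] = [
    ("water", lambda elev: elev > 800.0, "terrain_shadow"),
    ("paddy", lambda elev: elev > 400.0, "upland_field"),
]


def post_classification_sort(class_map, dem, rules=RULES):
    """Relabel each pixel with the first rule whose class and elevation test match."""
    sorted_map = []
    for class_row, elev_row in zip(class_map, dem):
        new_row = []
        for label, elev in zip(class_row, elev_row):
            for src, test, dst in rules:
                if label == src and test(elev):
                    label = dst
                    break  # apply at most one rule per pixel
            new_row.append(label)
        sorted_map.append(new_row)
    return sorted_map
```

Pre-classification stratification would instead split the image into elevation zones before classification and classify each zone separately.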

Web-based synthetic-aperture radar data management system and land cover classification

  • Dalwon Jang;Jaewon Lee;Jong-Seol Lee
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 7 / pp. 1858-1872 / 2023
  • With advances in radar technology, the availability of synthetic aperture radar (SAR) images has increased. To make SAR images easier to use, this paper proposes a management system for SAR images. The system provides a trainable land cover classification module and displays SAR images on a map. Users can create their own classifier from their own data and obtain classified results for newly captured SAR images by applying that classifier. The classifier is based on a convolutional neural network. Because SAR images differ depending on the acquisition method and device, a single fixed classifier cannot cover all types of SAR land cover classification problems, so the system lets each user create their own classifier. In our experiments, the module performed well on two different SAR datasets. With this system, SAR data and land cover classification results are managed and easily displayed.

Cancer-Subtype Classification Based on Gene Expression Data

  • 조지훈;이동권;이민영;이인범
    • Journal of Institute of Control, Robotics and Systems / Vol. 10, No. 12 / pp. 1172-1180 / 2004
  • Gene expression data produced by high-throughput technologies have recently become widely available, and related studies (so-called bioinformatics) now occupy an important position in biological and medical research. The microarray is a revolutionary technology that lets us monitor thousands of genes simultaneously and thus gain insight into phenomena in the human body (e.g., the mechanism of cancer progression) at the molecular level. To obtain useful information from such gene expression measurements, the data must be analyzed with appropriate techniques. However, the high dimensionality of the data causes problems such as the curse of dimensionality and singular matrices in computation, which makes conventional data analysis methods difficult to apply. Developing methods that can handle such data effectively is therefore a challenging issue in computational biology. This research focuses on gene selection and classification for cancer subtype discrimination based on gene expression (microarray) data.
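
As an illustration only, a common filter-style approach to gene selection (not necessarily the one used in this paper) ranks genes by the absolute Welch t-statistic between two subtypes and classifies a new sample with a nearest-centroid rule on the selected genes. The function names and the small variance guard are assumptions.

```python
import math


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch t-statistic; a tiny constant guards against zero variance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)


def top_k_genes(X, y, k):
    """X: samples x genes; y: 0/1 subtype labels. Return indices of the k largest |t|."""
    g0 = [row for row, lab in zip(X, y) if lab == 0]
    g1 = [row for row, lab in zip(X, y) if lab == 1]
    scores = []
    for j in range(len(X[0])):
        t = welch_t([r[j] for r in g0], [r[j] for r in g1])
        scores.append((abs(t), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def nearest_centroid(X, y, genes, sample):
    """Assign the subtype whose centroid (on the selected genes) is closest."""
    centroids = {}
    for lab in set(y):
        rows = [r for r, l in zip(X, y) if l == lab]
        centroids[lab] = [sum(r[j] for r in rows) / len(rows) for j in genes]

    def dist(c):
        return sum((sample[j] - cj) ** 2 for j, cj in zip(genes, c))

    return min(centroids, key=lambda lab: dist(centroids[lab]))
```

Selecting a handful of genes before classification is one way to sidestep the singular-covariance problem the abstract mentions, since the nearest-centroid rule never inverts a matrix.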