• Title/Summary/Keyword: Database for Classification


Temporal Classification Method for Forecasting Load Patterns from AMR Data

  • Lee, Heon-Gyu;Shin, Jin-Ho;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2007.10a / pp.594-597 / 2007
  • We present a novel mid- and long-term power load prediction method that uses temporal pattern mining on AMR (Automatic Meter Reading) data. Because power load patterns vary over time and differ widely by hour, day, week, and so on, traditional data mining alone yields uninformative results. Prior research on data mining for electric load patterns has focused on cluster analysis and classification methods. However, despite the usefulness of rules that include a temporal dimension and the fact that AMR data has temporal attributes, these methods were limited to static pattern extraction and did not consider temporal attributes. We therefore propose a new classification method for predicting power load patterns. The main tasks are clustering and temporal classification. Cluster analysis is used to create load pattern classes and a representative load profile for each class. The classification method then uses the representative load profiles to build a classifier that assigns different load patterns to the existing classes. The proposed classification method is calendar-based temporal mining, which discovers electric load patterns at multiple time granularities. Finally, we show that the proposed method, applied to AMR data, discovered more interesting patterns.

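The two-stage idea in the abstract above (cluster load curves into classes, then assign new curves to the closest representative profile) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: plain k-means stands in for the clustering step, a nearest-profile rule stands in for the classifier, and the function names and the toy 4-point daily curves are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length load curves."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_profile(curves):
    """Point-wise average of a non-empty list of load curves."""
    n = len(curves)
    return [sum(values) / n for values in zip(*curves)]

def nearest(profiles, curve):
    """Index of the representative profile closest to the curve."""
    return min(range(len(profiles)), key=lambda i: distance(profiles[i], curve))

def kmeans_profiles(curves, k, iterations=20):
    """Assumed stand-in for the clustering step: plain k-means.
    Starts from the first k curves so the result is deterministic."""
    profiles = [list(c) for c in curves[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for curve in curves:
            groups[nearest(profiles, curve)].append(curve)
        # Keep the old profile if a cluster ends up empty.
        profiles = [mean_profile(g) if g else p for g, p in zip(groups, profiles)]
    return profiles

# Hypothetical toy data: 4 readings per day (night, morning, afternoon, evening).
daily_curves = [
    [1.0, 5.0, 6.0, 2.0],  # daytime-heavy
    [5.0, 1.0, 1.5, 6.0],  # evening/night-heavy
    [1.2, 5.5, 5.8, 1.8],
    [4.8, 1.2, 1.0, 6.2],
]
profiles = kmeans_profiles(daily_curves, k=2)

# Classify a new day by its closest representative profile.
label = nearest(profiles, [1.1, 5.2, 6.1, 2.1])
```

Here `profiles` plays the role of the representative load profiles, and `label` is the class assigned to an unseen daily curve.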

Privacy Disclosure and Preservation in Learning with Multi-Relational Databases

  • Guo, Hongyu;Viktor, Herna L.;Paquet, Eric
    • Journal of Computing Science and Engineering / v.5 no.3 / pp.183-196 / 2011
  • There has recently been a surge of interest in relational database mining that aims to discover useful patterns across multiple interlinked database relations. It is crucial for a learning algorithm to explore the multiple inter-connected relations so that important attributes are not excluded when mining such relational repositories. However, from a data privacy perspective, it becomes difficult to identify all possible relationships between attributes from the different relations, considering a complex database schema. That is, seemingly harmless attributes may be linked to confidential information, leading to data leaks when building a model. Thus, we are at risk of disclosing unwanted knowledge when publishing the results of a data mining exercise. For instance, consider a financial database classification task to determine whether a loan is considered high risk. Suppose that we are aware that the database contains another confidential attribute, such as income level, that should not be divulged. One may thus choose to eliminate, or distort, the income level in the database to prevent potential privacy leakage. However, even after distortion, a model learned from the modified database may accurately determine the income level values. It follows that the database is still unsafe and may be compromised. This paper demonstrates this potential for privacy leakage in multi-relational classification and illustrates how such potential leaks may be detected. We propose a method to generate a ranked list of subschemas that maintains the predictive performance on the class attribute while limiting the disclosure risk of, and the predictive accuracy on, confidential attributes. We demonstrate the effectiveness of our method on a financial database and an insurance database.
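The leakage problem described above can be illustrated with a small check: how much better can a confidential attribute be guessed once a seemingly harmless attribute is visible? The sketch below is an assumption-laden toy, not the paper's subschema-ranking method. The `leakage` function, the attribute names, and the four records are hypothetical, and accuracy is measured on the same records used to build the guesser, so it is optimistic.

```python
from collections import Counter, defaultdict

def leakage(records, visible, confidential):
    """Compare two ways of guessing the confidential attribute:
    always guessing its overall majority value (baseline), versus
    guessing the majority value among records that share the same
    visible attribute value (informed)."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for record in records:
        by_value[record[visible]][record[confidential]] += 1
        overall[record[confidential]] += 1
    total = len(records)
    baseline = overall.most_common(1)[0][1] / total
    informed = sum(c.most_common(1)[0][1] for c in by_value.values()) / total
    return baseline, informed

# Hypothetical records: "district" looks harmless but tracks income exactly.
records = [
    {"district": "north", "income": "high"},
    {"district": "north", "income": "high"},
    {"district": "south", "income": "low"},
    {"district": "south", "income": "low"},
]
baseline, informed = leakage(records, "district", "income")
```

A large gap between `informed` and `baseline` suggests that the visible attribute discloses the confidential one, even if the confidential column itself is removed before publication.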

Temporal Classification Method for Forecasting Power Load Patterns From AMR Data

  • Lee, Heon-Gyu;Shin, Jin-Ho;Park, Hong-Kyu;Kim, Young-Il;Lee, Bong-Jae;Ryu, Keun-Ho
    • Korean Journal of Remote Sensing / v.23 no.5 / pp.393-400 / 2007
  • We present a novel power load prediction method that uses temporal pattern mining on AMR (Automatic Meter Reading) data. Because power load patterns vary over time and differ widely by hour, day, week, and so on, traditional data mining alone yields uninformative results. Prior research on data mining for electric load patterns has focused on cluster analysis and classification methods. However, despite the usefulness of rules that include a temporal dimension and the fact that AMR data has temporal attributes, these methods were limited to static pattern extraction and did not consider temporal attributes. We therefore propose a new classification method for predicting power load patterns. The main tasks are clustering and temporal classification. Cluster analysis is used to create load pattern classes and a representative load profile for each class. The classification method then uses the representative load profiles to build a classifier that assigns different load patterns to the existing classes. The proposed classification method is calendar-based temporal mining, which discovers electric load patterns at multiple time granularities. Finally, we show that the proposed method, applied to AMR data, discovered more interesting patterns.
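To make "multiple time granularities" concrete, the sketch below averages timestamped meter readings by different calendar units. It is only an illustration: the choice of units, the `average_load_by` helper, and the four readings are assumptions, and the paper's calendar schema may differ.

```python
from collections import defaultdict
from datetime import datetime

# Calendar granularities assumed for illustration only.
GRANULARITIES = {
    "hour": lambda t: t.hour,
    "weekday": lambda t: t.weekday(),  # 0 = Monday ... 6 = Sunday
    "month": lambda t: t.month,
}

def average_load_by(readings, granularity):
    """Average load per calendar unit for (timestamp, load) readings."""
    key_of = GRANULARITIES[granularity]
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, load in readings:
        key = key_of(timestamp)
        totals[key] += load
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Hypothetical readings (load in kWh).
readings = [
    (datetime(2007, 10, 1, 9), 4.0),   # Monday morning
    (datetime(2007, 10, 1, 21), 2.0),  # Monday evening
    (datetime(2007, 10, 6, 9), 1.0),   # Saturday morning
    (datetime(2007, 10, 6, 21), 3.0),  # Saturday evening
]
by_hour = average_load_by(readings, "hour")
by_weekday = average_load_by(readings, "weekday")
```

In this toy data the hourly view hides the difference between the two days, while the weekday view exposes it, which is the kind of effect that motivates mining at more than one granularity.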

A Study on the Face Recognition Using PCA

  • Lee Joon-Tark;Kueh Lee Hui
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.11a / pp.305-309 / 2006
  • In this paper, a face recognition system using Principal Component Analysis (PCA) is proposed. The algorithm recognizes a person by comparing features of the face to those of known individuals in the face database of the Intelligence Control Laboratory (ICONL). Simulated experiments demonstrate the performance of the algorithm on two tasks: classifying face versus non-face, and classifying known versus unknown individuals.

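A PCA-based recognizer of the kind described above can be sketched in a few lines. This is a simplified illustration, not the authors' system: it keeps only the first principal component (found by power iteration), uses 4-pixel toy "images", and the `train` and `recognize` names and the distance threshold are hypothetical. A practical eigenface system would keep several components.

```python
import math

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_component(centered, iterations=50):
    """First principal component via power iteration on X^T X."""
    # Start from the first centered sample (assumed to be non-zero).
    v = list(centered[0])
    for _ in range(iterations):
        scores = [dot(row, v) for row in centered]
        # w = X^T (X v), without forming the covariance matrix.
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(v))]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v

def train(faces):
    """faces: list of (label, pixel_vector) for known individuals."""
    vectors = [pixels for _, pixels in faces]
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    centered = [subtract(pixels, mean) for pixels in vectors]
    component = top_component(centered)
    gallery = [(label, dot(c, component))
               for (label, _), c in zip(faces, centered)]
    return mean, component, gallery

def recognize(model, probe, threshold):
    """Nearest known face in PCA space, or 'unknown' if too far away."""
    mean, component, gallery = model
    score = dot(subtract(probe, mean), component)
    label, best = min(gallery, key=lambda item: abs(item[1] - score))
    return label if abs(best - score) <= threshold else "unknown"

# Hypothetical 4-pixel "images" of two known people.
faces = [
    ("A", [9.0, 8.0, 1.0, 0.0]),
    ("A", [8.0, 9.0, 0.0, 1.0]),
    ("B", [1.0, 0.0, 9.0, 8.0]),
    ("B", [0.0, 1.0, 8.0, 9.0]),
]
model = train(faces)
who = recognize(model, [8.5, 8.5, 0.5, 0.5], threshold=3.0)
```

The threshold implements the known-versus-unknown decision: a probe whose projection is far from every stored projection is rejected rather than forced onto the nearest identity.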

GDAS and UNSPSC for the Distribution Industry (유통산업에 적용되는 GDAS와 UNSPSC 분류체계)

  • 이창수
    • Proceedings of the Korean Operations and Management Science Society Conference / 2001.10a / pp.265-268 / 2001
  • As electronic commerce grows, product and service catalogs are changing significantly as they move to the online environment. The advent of e-catalogs gives businesses the opportunity to enlarge the market for their own products and services, and there are diverse methods for presenting those products and services. One such method uses an identification and classification system. This study constructs a classification system and database layout for product/service classification as part of an e-catalog system. We consider a specific method based on the GDAS dataset and the UNSPSC classification system in the distribution industry.


The Classification of Electrocardiograph Arrhythmia Patterns using Fuzzy Support Vector Machines

  • Lee, Soo-Yong;Ahn, Deok-Yong;Song, Mi-Hae;Lee, Kyoung-Joung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.3 / pp.204-210 / 2011
  • This paper proposes a fuzzy support vector machine ($FSVM_n$) pattern classifier to classify the arrhythmia patterns of an electrocardiograph (ECG). The $FSVM_n$ is a pattern classifier that combines n-dimensional fuzzy membership functions with the slack variable of an SVM. To evaluate the performance of the proposed classifier, the MIT/BIH ECG database, a standard database for evaluating arrhythmia detection, was used. The pattern classification experiment showed that, when ECG signals were classified into four patterns, the classification rates for NSR, VT, and VF were 99.42%, 99.00%, and 99.79%, respectively. As a result, the $FSVM_n$ shows better pattern classification performance than the existing SVM and FSVM algorithms.
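For orientation, a commonly used fuzzy SVM formulation weights each slack variable by a fuzzy membership value. The block below shows that general form only. The symbols $s_i$, $\xi_i$, $C$, and $\phi$ are notation assumed here, and the abstract does not state how the n-dimensional membership functions of $FSVM_n$ enter the objective, so this should not be read as the paper's exact model.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} s_i\,\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \qquad 0 < s_i \le 1, \qquad i = 1,\dots,N .
\end{aligned}
```

A small membership $s_i$ reduces the penalty for misclassifying sample $x_i$, so noisy or ambiguous samples have less influence on the decision boundary than in a standard soft-margin SVM, where every $s_i = 1$.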

Music Genre Classification System Using Decorrelated Filter Bank (Decorrelated Filter Bank를 이용한 음악 장르 분류 시스템)

  • Lim, Shin-Cheol;Jang, Sei-Jin;Lee, Seok-Pil;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea / v.30 no.2 / pp.100-106 / 2011
  • Music recordings have been digitized, so huge music databases are now available to the public. An automatic music genre classification system is therefore required to manage the growing music database effectively. The Mel-Frequency Cepstral Coefficient (MFCC) is a popular feature vector for genre classification. In this paper, a combined super-vector built from Decorrelated Filter Bank (DFB) and Octave-based Spectral Contrast (OSC) features using texture windows is processed by a Support Vector Machine (SVM) for genre classification. Even with a lower-order feature vector, the proposed super-vector improves classification accuracy by 4.2% compared with the conventional Marsyas system.
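The super-vector construction mentioned above can be sketched as follows, assuming that a texture window summarizes frame-level features by their per-dimension mean and variance and that the DFB and OSC summaries are simply concatenated. The function names and the tiny frame matrices are hypothetical, and the DFB and OSC extraction itself is not shown.

```python
from statistics import fmean, pvariance

def texture_window_stats(frames):
    """Per-dimension mean and variance over the frames of one texture window.
    frames: list of equal-length frame-level feature vectors."""
    columns = list(zip(*frames))
    return [fmean(c) for c in columns] + [pvariance(c) for c in columns]

def super_vector(dfb_frames, osc_frames):
    """Concatenate texture-window statistics of the two feature streams."""
    return texture_window_stats(dfb_frames) + texture_window_stats(osc_frames)

# Hypothetical frame-level features for one texture window (two frames).
dfb = [[1.0, 2.0], [3.0, 4.0]]  # 2-dimensional DFB features
osc = [[0.0], [2.0]]            # 1-dimensional OSC feature
vec = super_vector(dfb, osc)
```

Each resulting vector would then be passed to an SVM as one training or test example.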

Multi-granular Angle Description for Plant Leaf Classification and Retrieval Based on Quotient Space

  • Xu, Guoqing;Wu, Ran;Wang, Qi
    • Journal of Information Processing Systems / v.16 no.3 / pp.663-676 / 2020
  • Plant leaf classification is a significant application of image processing techniques in modern agriculture. In this paper, a multi-granular angle description method is proposed for plant leaf classification and retrieval. The proposed method describes leaf information from coarse to fine using multi-granular angle features. Each leaf contour is first partitioned into arcs of equal length under different granularities. Three kinds of angle features are then derived under each granular partition of the leaf contour: angle value, angle histogram, and angular ternary pattern. These multi-granular angle features capture both local and global information of the leaf contour and provide a comprehensive description. In the leaf matching stage, the simple city block metric is used to compute the dissimilarity of each pair of leaves under different granularities. The matching scores at different granularities are then fused based on quotient space theory to obtain the final leaf similarity measurement. Plant leaf classification and retrieval experiments are conducted on two challenging leaf image databases: the Swedish leaf database and the Flavia leaf database. The experimental results and the comparison with state-of-the-art methods indicate that the proposed method has promising classification and retrieval performance.
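The matching stage described above can be sketched with the city block (L1) metric computed per granularity and then combined into one score. The combination rule here is a plain weighted sum, which is only an assumed stand-in for the quotient-space fusion used in the paper. The feature values, weights, and leaf names are hypothetical.

```python
def city_block(a, b):
    """City block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def fused_dissimilarity(leaf_a, leaf_b, weights):
    """Weighted sum of per-granularity city block distances.
    leaf_a, leaf_b: dict mapping granularity name -> feature vector.
    The weighted sum is an assumption, not the paper's fusion rule."""
    return sum(w * city_block(leaf_a[g], leaf_b[g]) for g, w in weights.items())

def retrieve(query, database, weights):
    """Database leaf names ordered from most to least similar."""
    return sorted(database,
                  key=lambda name: fused_dissimilarity(query, database[name], weights))

# Hypothetical angle features at a coarse and a fine granularity.
query = {"coarse": [0.5, 0.5], "fine": [0.2, 0.3, 0.3, 0.2]}
database = {
    "oak": {"coarse": [0.9, 0.1], "fine": [0.1, 0.4, 0.4, 0.1]},
    "maple": {"coarse": [0.5, 0.5], "fine": [0.25, 0.25, 0.25, 0.25]},
}
weights = {"coarse": 0.5, "fine": 0.5}
ranking = retrieve(query, database, weights)
```

Classification can reuse the same score by assigning the query the label of the top-ranked database leaf.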

Designing a Classification System for Minhwa DB (민화 DB를 위한 분류체계 설계)

  • Choi, Eunjin;Lee, Young-Suk
    • Journal of Korea Multimedia Society / v.25 no.1 / pp.135-143 / 2022
  • To convert Minhwa, Korean folk paintings that are part of traditional Korean heritage, into a database, it is necessary to design a classification system suited to the characteristics of folk paintings. Classifying and storing the paintings requires a classification system and unique codes. To this end, a basic classification system was created by listing the objects depicted in folk paintings, and keywords were extracted by reclassifying them by object. To assign a unique code to each piece, we compiled the English name of each Minhwa, since the English names contain the names of the depicted objects. The code name is derived by applying noun-order and consonant-priority rules to the English name and appending five Arabic numerals. These codes are then assigned to each image file stored in the database and entered together with the keywords. The resulting Minhwa DB supports storage and search centered on objects and keywords, and the type of object can be inferred intuitively from the code name.

Cancer Classification and Prediction Using Multivariate Analysis

  • Shon, Ho-Sun;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.706-709 / 2006
  • Cancer is one of the major causes of death; however, the survival rate can be increased if it is discovered at an early stage and treated in time. According to 2002 statistics from the World Health Organization, breast cancer was the most prevalent of all cancers occurring in women worldwide, and it accounts for 16.8% of all cancers affecting Korean women today. To classify breast cancer as benign or malignant, this study applied discriminant analysis and decision tree data mining to breast cancer data published on the web. Discriminant analysis is a statistical method that derives discriminant criteria and a discriminant function to separate population groups on the basis of observations obtained from two or more groups, and then uses them to assign a given observation to one of those groups. The decision tree analyzes previously collected data records to reveal the patterns among them, namely the combinations of attributes that characterize each class, and builds a tree-shaped classification model. This type of analysis may provide systematic information in advance on the factors that cause breast cancer and help prevent recurrence after surgery.

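The decision tree idea in the abstract above can be illustrated with its smallest building block: choosing one threshold on one attribute so that the two resulting groups are as pure as possible. The sketch below is a single-split tree using Gini impurity, under stated assumptions. The attribute, its values, and the function names are hypothetical, and the study's actual data and tree-building procedure are not reproduced.

```python
def gini(labels):
    """Gini impurity of a list of 'benign' / 'malignant' labels."""
    if not labels:
        return 0.0
    p = labels.count("malignant") / len(labels)
    return 2 * p * (1 - p)

def best_split(rows):
    """Threshold on one numeric attribute that minimizes weighted Gini impurity.
    rows: list of (attribute_value, label) pairs."""
    values = sorted({value for value, _ in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in rows if value <= threshold]
        right = [label for value, label in rows if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold)
    return best[1]

def classify(value, threshold):
    """One-level tree: larger attribute values are labeled malignant."""
    return "malignant" if value > threshold else "benign"

# Hypothetical single attribute (for example, a cell-size score) with labels.
rows = [
    (1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
    (6.0, "malignant"), (7.0, "malignant"), (8.0, "malignant"),
]
threshold = best_split(rows)
prediction = classify(5.0, threshold)
```

A full decision tree repeats this split selection recursively on each resulting group and across all available attributes.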