• Title/Abstract/Keywords: Classification Attributes

Classifier Selection using Feature Space Attributes in Local Region

  • 신동국;송혜정;김백섭
    • Journal of KIISE: Software and Applications / Vol. 31, No. 12 / pp. 1684-1690 / 2004
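  • This paper proposes a multiple-classifier selection method that uses attributes of the region around a test sample. The existing dynamic classifier selection method DCS-LA computes the local accuracy of each classifier from the training samples around the test sample and selects a classifier dynamically, so recognition takes a long time. In this paper, local attributes are computed in the feature space, and the classifier best suited to each range of attribute values is selected and stored in advance; when a test sample arrives, a classifier is chosen from the stored ones according to the attribute values around it, which reduces recognition time. As local attributes, the entropy and density of a small region around a sample are computed; these are called Feature Space Attributes. The attribute space formed by these two attributes is divided into regular rectangular cells, and during training the cell to which each training sample's attribute values belong is determined. The local accuracy of each classifier over the training samples in each cell is then computed and stored in that cell. During testing, the attribute values of the test sample are computed to find the cell it belongs to, and the sample is recognized with the classifier that has the highest local accuracy in that cell. The existing and proposed methods were compared using the Elena database. Experiments confirmed that the proposed method achieves almost the same recognition rate as DCS-LA while recognizing nearly four times faster.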
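The training and testing procedure in this abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the "small region" is taken to be the k nearest training samples, entropy is the Shannon entropy of their class labels, density is the inverse of their mean distance, and the function names, bin edges, and toy classifiers are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def knn(x, data, k, skip=None):
    """k nearest training samples to x as (distance, label) pairs; `skip` excludes one index."""
    pairs = [(math.dist(x, xi), yi) for i, (xi, yi) in enumerate(data) if i != skip]
    pairs.sort(key=lambda p: p[0])
    return pairs[:k]

def feature_space_attributes(x, data, k, skip=None):
    """Assumed definitions: label entropy and inverse mean distance of the k neighbours."""
    neigh = knn(x, data, k, skip)
    n = len(neigh)
    counts = Counter(label for _, label in neigh)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    density = 1.0 / (sum(d for d, _ in neigh) / n + 1e-9)
    return entropy, density

def cell_of(attrs, edges):
    """Rectangular cell index: per axis, the number of bin edges the attribute reaches."""
    return tuple(sum(a >= e for e in axis) for a, axis in zip(attrs, edges))

def build_cell_table(data, classifiers, k, edges):
    """Training phase: store, per cell, the classifier with the highest local accuracy."""
    hits = defaultdict(lambda: [0] * len(classifiers))
    for i, (x, y) in enumerate(data):
        cell = cell_of(feature_space_attributes(x, data, k, skip=i), edges)
        for j, clf in enumerate(classifiers):
            hits[cell][j] += int(clf(x) == y)
    # Every classifier sees the same samples in a cell, so most hits == highest accuracy.
    return {cell: max(range(len(classifiers)), key=lambda j: h[j]) for cell, h in hits.items()}

def classify(x, data, classifiers, table, k, edges, fallback=0):
    """Test phase: locate x's cell and use its stored classifier (fallback for unseen cells)."""
    cell = cell_of(feature_space_attributes(x, data, k), edges)
    return classifiers[table.get(cell, fallback)](x)

# Hypothetical toy data: two well-separated 2-D classes.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
classifiers = [lambda x: int(x[0] >= 2.5), lambda x: 0]  # stand-ins for trained classifiers
edges = ([0.5], [0.5])  # bin edges for the entropy axis and the density axis
table = build_cell_table(data, classifiers, k=2, edges=edges)
print(table)  # {(0, 1): 0}
print(classify((5.5, 5.5), data, classifiers, table, k=2, edges=edges))  # 1
```

Because the per-cell choice is precomputed, testing needs one neighbourhood query and a table lookup instead of re-estimating every classifier's local accuracy around each test sample, which is where the reported speed-up would come from.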

Privacy Disclosure and Preservation in Learning with Multi-Relational Databases

  • Guo, Hongyu;Viktor, Herna L.;Paquet, Eric
    • Journal of Computing Science and Engineering / Vol. 5, No. 3 / pp. 183-196 / 2011
  • There has recently been a surge of interest in relational database mining that aims to discover useful patterns across multiple interlinked database relations. It is crucial for a learning algorithm to explore the multiple inter-connected relations so that important attributes are not excluded when mining such relational repositories. However, from a data privacy perspective, it becomes difficult to identify all possible relationships between attributes from the different relations, given a complex database schema. That is, seemingly harmless attributes may be linked to confidential information, leading to data leaks when building a model. Thus, we are at risk of disclosing unwanted knowledge when publishing the results of a data mining exercise. For instance, consider a financial database classification task to determine whether a loan is considered high risk. Suppose that we are aware that the database contains another confidential attribute, such as income level, that should not be divulged. One may thus choose to eliminate, or distort, the income level in the database to prevent potential privacy leakage. However, even after distortion, a model learned from the modified database may still accurately predict the income level values. It follows that the database is still unsafe and may be compromised. This paper demonstrates this potential for privacy leakage in multi-relational classification and illustrates how such potential leaks may be detected. We propose a method to generate a ranked list of subschemas that maintains the predictive performance on the class attribute while limiting the disclosure risk, and predictive accuracy, of confidential attributes. We illustrate and demonstrate the effectiveness of our method on a financial database and an insurance database.

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems / Vol. 4, No. 1 / pp. 102-108 / 2005
  • This paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to mining heart disease medical data. The data were collected from patients with coronary heart disease and comprise 1,723 records of 71 attributes each. We use the system-reconstruction method to weight the data. We apply decision tree algorithms including induction of decision trees (ID3), C4.5, classification and regression trees (CART), chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, number of leaves, and tree depth of the different decision-tree algorithms. The experiments show that weighting the data can improve the correct classification rate on the coronary heart disease data but has little effect on tree depth and leaf count.
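As a reference point for the first algorithm listed, ID3 chooses each split by information gain: the reduction in class entropy after partitioning on an attribute (C4.5 refines this with the gain ratio, and CART usually uses the Gini index instead). The Python sketch below computes that criterion for categorical attributes only; the toy records and attribute names (`pain`, `smoker`) are hypothetical, and the paper's system-reconstruction weighting is not modelled.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Label entropy minus the size-weighted entropy of each branch after splitting on attr."""
    branches = defaultdict(list)
    for row, label in zip(rows, labels):
        branches[row[attr]].append(label)
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

# Hypothetical records: 'pain' separates the classes perfectly, 'smoker' not at all.
rows = [
    {"pain": "yes", "smoker": "yes"},
    {"pain": "yes", "smoker": "no"},
    {"pain": "no", "smoker": "yes"},
    {"pain": "no", "smoker": "no"},
]
labels = [1, 1, 0, 0]
for attr in ("pain", "smoker"):
    print(attr, information_gain(rows, labels, attr))  # pain 1.0, then smoker 0.0
best = max(("pain", "smoker"), key=lambda a: information_gain(rows, labels, a))
print("ID3 splits on:", best)  # ID3 splits on: pain
```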

Classification System of Fashion Emotion for the Standardization of Data

  • 박낭희;최윤미
    • Journal of the Korean Society of Clothing and Textiles / Vol. 45, No. 6 / pp. 949-964 / 2021
  • Accumulation of high-quality data is crucial for AI learning. The goal of using AI in fashion services is to propose creative, personalized solutions that are close to the know-how of a human operator. These customized solutions require an understanding of fashion products and emotions. Therefore, it is necessary to accumulate data on the attributes of fashion products and on fashion emotions. The first step in accumulating fashion data is to standardize the attributes within a coherent system. The purpose of this study is to propose a fashion emotion classification system. For this, images of fashion products were collected, and metadata were obtained by allowing consumers to describe their emotions about the fashion images freely. An emotion classification system with a hierarchical structure was then constructed by performing frequency and CONCOR analyses on the metadata. A final classification system was proposed by supplementing the attribute values with reference to findings from previous studies and SNS data.

Advancements in Unmanned Aerial Vehicle Classification, Tracking, and Detection Algorithms

  • Ahmed Abdulhakim Al-Absi
    • International journal of advanced smart convergence / Vol. 12, No. 3 / pp. 32-39 / 2023
  • This paper provides a comprehensive overview of UAV classification, tracking, and detection, offering researchers a clear understanding of these fundamental concepts. It elucidates how classification categorizes UAVs based on attributes, how tracking monitors real-time positions, and how detection identifies UAV presence. The interconnectedness of these aspects is highlighted, with detection enhancing tracking and classification aiding in anomaly identification. Moreover, the paper emphasizes the relevance of simulations in the context of drones and UAVs, underscoring their pivotal role in training, testing, and research. By succinctly presenting these core concepts and their practical implications, the paper equips researchers with a solid foundation to comprehend and explore the complexities of UAV operations and the role of simulations in advancing this dynamic field.

Interpolation on data with multiple attributes by a neural network

  • Azumi, Hiroshi;Hiraoka, Kazuyuki;Mishima, Taketoshi
    • Proceedings of the Institute of Electronics Engineers of Korea (IEEK) Conference / IEEK 2002 ITC-CSCC -2 / pp. 814-817 / 2002
  • High-dimensional data with two or more attributes are considered. A typical example of such data is face images of various individuals and expressions. In these cases, collecting a complete data set is often difficult because the number of combinations can be large. In the present study, we propose a method to interpolate the data of missing combinations from other data. If this becomes possible, robust recognition of multiple attributes can be expected. The key to this problem is appropriately extracting the similarity shared by face images of the same individual or the same expression. The bilinear model [1] has been proposed as a solution to this problem. However, experiments applying the bilinear model to the classification of face images resulted in low performance [2]. To overcome the limits of the bilinear model, this research adopts a nonlinear model on a neural network, and the usefulness of this model is experimentally confirmed.
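For orientation, a bilinear model of two-attribute data is commonly written in the symmetric form below, where $a^{s}$ is a vector for individual $s$, $b^{c}$ a vector for expression $c$, and $w_{ijk}$ are interaction weights for output dimension $k$. Whether reference [1] uses exactly this parameterization is an assumption. A missing combination $(s, c)$ is interpolated by evaluating the model with the learned $a^{s}$ and $b^{c}$; the nonlinear approach replaces the bilinear form with a learned function $f$ (notation introduced here, not the paper's).

```latex
y^{sc}_{k} = \sum_{i=1}^{I} \sum_{j=1}^{J} w_{ijk}\, a^{s}_{i}\, b^{c}_{j}
\qquad \text{versus} \qquad
\hat{y}^{sc} = f\!\left(a^{s}, b^{c}\right)
```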

Hybrid Feature Selection Method Based on Genetic Algorithm for the Diagnosis of Coronary Heart Disease

  • Wiharto, Wiharto;Suryani, Esti;Setyawan, Sigit;Putra, Bintang PE
    • Journal of information and communication convergence engineering / Vol. 20, No. 1 / pp. 31-40 / 2022
  • Coronary heart disease (CHD) is a comorbidity of COVID-19; therefore, routine early diagnosis is crucial. The large number of examination attributes needed to diagnose CHD is a distinct obstacle during the pandemic, when the number of health service users is large. Developing a precise machine learning model for diagnosis with a minimum number of examination attributes can allow examinations and healthcare actions to be undertaken quickly. This study proposes a CHD diagnosis model based on feature selection, data balancing, and ensemble-based classification methods. In the feature selection stage, a hybrid SVM-GA combined with a fast correlation-based filter (FCBF) is used. The proposed system achieved an accuracy of 94.60% and an area under the curve (AUC) of 97.5% when tested on the z-Alizadeh Sani dataset, using only 8 of 54 inspection attributes. In terms of performance, the proposed model can be placed in the very good category.
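The FCBF stage named above ranks features by symmetric uncertainty (SU) with the class, then discards any feature that is at least as strongly correlated with an already-kept, higher-ranked feature as it is with the class. The Python sketch below implements only that filter, for discrete features; the SVM-GA wrapper and data balancing are not shown, and the threshold `delta` and toy feature names (`ecg`, `ecg_copy`, `noise`) are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)), with IG = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables constant: treat as sharing no information
    gain = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * gain / (hx + hy)

def fcbf(features, labels, delta):
    """Names of features kept by the FCBF relevance and redundancy filter."""
    su_class = {name: symmetric_uncertainty(vals, labels) for name, vals in features.items()}
    # Relevance: keep features whose SU with the class reaches delta, strongest first.
    ranked = sorted((n for n in features if su_class[n] >= delta), key=lambda n: -su_class[n])
    selected = []
    for name in ranked:
        # Redundancy: drop `name` if a stronger kept feature explains it at least as well
        # as `name` explains the class.
        if all(symmetric_uncertainty(features[p], features[name]) < su_class[name] for p in selected):
            selected.append(name)
    return selected

labels = [0, 0, 1, 1]
features = {"ecg": [0, 0, 1, 1], "ecg_copy": [0, 0, 1, 1], "noise": [0, 1, 0, 1]}
print(fcbf(features, labels, delta=0.1))  # ['ecg']
```

Here `noise` fails the relevance test (SU 0) and `ecg_copy` is removed as redundant with `ecg`.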

A Study on the Work Classification System of Landscape Construction for Introducing a Cost Estimation Method Based on Historical Construction Data: Focusing on Housing Complex Landscape Construction

  • 박원규;김두하;안동만
    • Journal of the Korean Institute of Landscape Architecture / Vol. 25, No. 1 / pp. 82-99 / 1997
  • The purpose of this study is to establish a work classification system for landscape construction in order to provide the basis for a new estimation system for public landscape construction. The new estimation system is based on historical construction data. Applying it requires a standard work classification system, because extensive cost data must be accumulated under a unified construction work classification system. In the study of the new estimation system carried out by KICT (Korea Institute of Construction Technology), landscaping works belong to the earth work of civil engineering. This classification seems very unreasonable, because landscape architecture has its own specialties and professional domain. In this study, information classification systems in the construction industry and the various landscaping works of housing developments are analysed. As a result, a standard work classification system for housing landscape construction is proposed in section VI-3. This standard work classification structure consists of three levels of division (i.e., large, middle, and small work divisions). In this study, housing landscape construction works are divided into four large works and twenty-six middle works. According to work attributes, the middle and small work divisions can be further subdivided into details.

A New Decision Tree Algorithm Based on Rough Set and Entity Relationship

  • 한상욱;김재련
    • Journal of the Korean Institute of Industrial Engineers / Vol. 33, No. 2 / pp. 183-190 / 2007
  • We present a new decision tree classification algorithm using rough set theory that can induce classification rules; its construction is based on core attributes and the relationships between objects. Although decision trees have been widely used in machine learning and artificial intelligence, little research has focused on improving their classification quality. We propose a new decision tree construction algorithm that is simpler and provides improved classification quality. We also compare the new algorithm with the ID3 algorithm in terms of the number of rules.
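The core attributes mentioned here have a standard rough-set definition: a condition attribute belongs to the core if removing it shrinks the positive region, i.e. the set of objects whose indiscernibility class determines the decision uniquely. The Python sketch below computes that core. It does not reproduce the paper's tree construction or its comparison of object relationships, and the toy table (conditions `a`, `b`, `c`, decision `d`) is hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Indiscernibility classes: row indices grouped by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region_size(rows, attrs, decision):
    """Number of objects whose indiscernibility class has a single decision value."""
    return sum(len(b) for b in partition(rows, attrs)
               if len({rows[i][decision] for i in b}) == 1)

def core(rows, conditions, decision):
    """Indispensable condition attributes: dropping one shrinks the positive region."""
    full = positive_region_size(rows, conditions, decision)
    return [a for a in conditions
            if positive_region_size(rows, [c for c in conditions if c != a], decision) < full]

rows = [
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
]
print(core(rows, ["a", "b", "c"], "d"))  # ['b']
```

In this toy table `a` and `c` carry the same information, so neither is indispensable on its own, while removing `b` makes the first two objects indistinguishable despite different decisions.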

An Application of the Rough Set Approach to Credit Rating

  • Kim, Jae-Kyeong;Cho, Sung-Sik
    • Proceedings of the Korea Intelligent Information Systems Society Conference / Korea Intelligent Information Systems Society 1999 Fall Conference: Intelligent Information Technology and Future Organization / pp. 347-354 / 1999
  • The credit rating represents an assessment of the relative level of risk associated with the timely payments required by the debt obligation. In this paper, we present a new approach to credit rating of customers based on rough set theory. The rough set concept appears to be an effective tool for analysing customer information systems that represent knowledge gained by experience. The customer information system describes a set of customers by a set of multi-valued attributes, called condition attributes. The customers are classified into risk groups according to an expert's opinion, called the decision attribute. The natural knowledge-analysis problem is then to discover relationships, in the form of decision rules, between the description of customers by condition attributes and particular decisions. The rough set approach makes it possible to discover minimal subsets of condition attributes that ensure an acceptable quality of classification of the analysed customers, and to derive decision rules from the customer information system that can support decisions about rating new customers. The rough set approach analyses only the facts hidden in the data; it needs no additional information about the data and does not correct inconsistencies manifested in them. Instead, the rules produced are categorized as certain or possible. A real credit-rating evaluation problem at a department store is studied using the rough set approach.
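The split into certain and possible rules follows from the rough-set lower and upper approximations of each decision class: a customer description that falls entirely inside one risk group yields a certain rule, while a description shared by customers in different groups (the boundary) yields only possible rules. The Python sketch below shows that step on a hypothetical customer table (`income`, `history`, and `risk` are made-up attributes); it omits the search for minimal attribute subsets and is not the paper's implementation.

```python
from collections import defaultdict

def approximations(rows, attrs, decision, target):
    """Lower/upper approximations (as attribute-value keys) of {x : x[decision] == target}."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision] == target)
    lower = [key for key, hits in blocks.items() if all(hits)]
    upper = [key for key, hits in blocks.items() if any(hits)]
    return lower, upper

def decision_rules(rows, attrs, decision):
    """Certain rules from lower approximations, possible rules from the boundary region."""
    rules = []
    for target in sorted({row[decision] for row in rows}):
        lower, upper = approximations(rows, attrs, decision, target)
        for key in upper:
            kind = "certain" if key in lower else "possible"
            rules.append((dict(zip(attrs, key)), target, kind))
    return rules

customers = [
    {"income": "high", "history": "good", "risk": "low"},
    {"income": "high", "history": "good", "risk": "low"},
    {"income": "low", "history": "bad", "risk": "high"},
    {"income": "low", "history": "good", "risk": "low"},
    {"income": "low", "history": "good", "risk": "high"},  # conflicts with the previous customer
]
rules = decision_rules(customers, ["income", "history"], "risk")
for rule in rules:
    print(rule)
```

The conflicting pair is left as is rather than corrected, so the description `income=low, history=good` produces only possible rules for both risk groups.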
