• Title/Summary/Keyword: Bayesian classification (베이지안 분류)

Classification and Analysis of Data Mining Algorithms (데이터마이닝 알고리즘의 분류 및 분석)

  • Lee, Jung-Won;Kim, Ho-Sook;Choi, Ji-Young;Kim, Hyon-Hee;Yong, Hwan-Seung;Lee, Sang-Ho;Park, Seung-Soo
    • Journal of KIISE:Databases / v.28 no.3 / pp.279-300 / 2001
  • Data mining plays an important role in the knowledge discovery process, and existing algorithms are usually selected according to the specific purpose of the mining task. Data mining techniques are currently being actively applied to statistics, business, electronic commerce, biology, and medicine, and numerous algorithms are being researched and developed for these applications. In the long run, however, only a few algorithms that are well suited to specific applications and perform well on large databases will survive, so it is reasonable to focus future effort on those algorithms. This paper classifies about 30 existing algorithms into 7 categories: association rule, clustering, neural network, decision tree, genetic algorithm, memory-based reasoning, and Bayesian network. It first analyzes the systematic hierarchy and characteristics of the algorithms, then presents 14 criteria for classifying them along with the results based on these criteria. Finally, we propose the best algorithms among comparable algorithms with different features and performance. The results of this paper can serve as a guideline for data mining research as well as field applications of data mining.

Rule Generation and Approximate Inference Algorithms for Efficient Information Retrieval within a Fuzzy Knowledge Base (퍼지지식베이스에서의 효율적인 정보검색을 위한 규칙생성 및 근사추론 알고리듬 설계)

  • Kim Hyung-Soo
    • Journal of Digital Contents Society / v.2 no.2 / pp.103-115 / 2001
  • This paper proposes two algorithms, one that generates minimal decision rules and one that performs approximate inference, by applying rough set theory and factor space theory to a fuzzy knowledge base. Minimal decision rules are generated by a data classification technique and by reducts obtained through correlation analysis and Bayes' theorem applied to the related attribute factors. To retrieve a specific object, the paper proposes an approximate inference method that defines a membership function and a t-norm combination operation over the minimal knowledge base composed of decision rules. We compare the suggested algorithms with other retrieval approaches, such as possibility theory, factor space theory, and the Max-Min, Max-product, and Max-average composition operations, through a simulation that randomly generates the number of objects and the attribute values as the memory size grows. The comparison shows that the suggested algorithms retrieve objects with a shorter access time than the previous ones.
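
The Max-Min and Max-product composition operations named in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; the function names and the example membership matrices `R` and `S` are hypothetical, chosen only to show how a t-norm is combined with a maximum over the shared dimension.

```python
def compose(r, s, tnorm):
    """Sup-t composition of fuzzy relations r (m x k) and s (k x n).

    Each output entry is the maximum, over the shared index, of the
    t-norm applied to the paired membership degrees.
    """
    shared = len(s)
    return [
        [max(tnorm(r[i][k], s[k][j]) for k in range(shared))
         for j in range(len(s[0]))]
        for i in range(len(r))
    ]


def t_min(a, b):
    """Minimum t-norm (gives Max-Min composition)."""
    return min(a, b)


def t_product(a, b):
    """Product t-norm (gives Max-product composition)."""
    return a * b


# Hypothetical membership degrees, for illustration only.
R = [[0.5, 1.0],
     [0.25, 0.0]]
S = [[1.0, 0.5],
     [0.5, 0.25]]

max_min = compose(R, S, t_min)
max_product = compose(R, S, t_product)
print(max_min)      # [[0.5, 0.5], [0.25, 0.25]]
print(max_product)  # [[0.5, 0.25], [0.25, 0.125]]
```

Passing the t-norm as a parameter keeps the two compositions in one function, so other t-norms could be compared in the same way.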

A CORBA-Based Collaborative Work Supported Medical Image Analysis and Visualization System (코바기반 협업지원 의료영상 분석 및 가시화 시스템)

  • Chun, Jun-Chul;Son, Jae-Gi
    • The KIPS Transactions:PartD / v.10D no.1 / pp.109-116 / 2003
  • This paper introduces a CORBA-based collaborative medical image analysis and visualization system that offers high accessibility and usability to users in a distributed environment. The system manages datasets and performs medical image operations, such as segmentation and volume visualization of geometry computed from biomedical images, in distributed environments. Using a Bayesian classification technique and an active contour model, the system provides classification results for medical images or boundary information for specific tissue. Based on this information, the system can create a real-time 3D volume model from medical imagery. Moreover, the system supports collaborative work among multiple users through broadcasting and synchronization mechanisms. Since the system is developed using Java and CORBA, which support distributed programming, remote clients can access server objects via method invocation without knowing where the distributed objects reside or what operating system they run on.

Optimal Facial Emotion Feature Analysis Method based on ASM-LK Optical Flow (ASM-LK Optical Flow 기반 최적 얼굴정서 특징분석 기법)

  • Ko, Kwang-Eun;Park, Seung-Min;Park, Jun-Heong;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.4 / pp.512-517 / 2011
  • In this paper, we propose a feature extraction and analysis method based on the Active Shape Model (ASM) and Lucas-Kanade (LK) optical flow for analyzing emotional features in facial images. Because the facial emotion feature regions are described by the Facial Action Coding System, we construct feature-related shape models from combinations of landmarks and extract the LK optical flow vector at each landmark, based on the centre pixel of the motion vector window. The facial emotion features are modelled by combining the optical flow vectors, and the emotional state of a facial image can be estimated by a probabilistic estimation technique such as a Bayesian classifier. We also use common spatial pattern (CSP) analysis to extract the optimal emotional features, those with a high correlation between feature points and emotional states, in order to improve the operational efficiency and accuracy of the emotional feature extraction process.
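
As a rough illustration of estimating a state with a Bayesian classifier, the sketch below applies Bayes' rule to a single scalar feature with Gaussian class-conditional densities. It is not the paper's model: the one-dimensional feature, the class names, and all priors, means, and standard deviations are hypothetical values chosen for the example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, classes):
    """Return P(class | x) for each class via Bayes' rule.

    classes maps a class name to (prior, mean, std) of a Gaussian
    class-conditional density over the scalar feature x.
    """
    joint = {
        name: prior * gaussian_pdf(x, mean, std)
        for name, (prior, mean, std) in classes.items()
    }
    evidence = sum(joint.values())
    return {name: value / evidence for name, value in joint.items()}


# Hypothetical classes and parameters, for illustration only.
classes = {
    "neutral": (0.5, 0.0, 1.0),
    "happy": (0.5, 2.0, 1.0),
}

post = posterior(1.5, classes)
predicted = max(post, key=post.get)  # class with the highest posterior
print(predicted)
```

With equal priors and equal variances, the decision reduces to picking the class whose mean is closest to the observed feature, which is why an observation of 1.5 is assigned to the class centred at 2.0.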

Active Vision from Image-Text Multimodal System Learning (능동 시각을 이용한 이미지-텍스트 다중 모달 체계 학습)

  • Kim, Jin-Hwa;Zhang, Byoung-Tak
    • Journal of KIISE / v.43 no.7 / pp.795-800 / 2016
  • In image classification, recent CNNs compete with human performance. However, there are limitations in more general recognition. Herein we deal with indoor images that contain too much information to be processed directly and that require information reduction before recognition. To reduce the amount of data processing, variational inference or variational Bayesian methods are typically suggested for object detection. However, these methods suffer from the difficulty of marginalizing over the given space. In this study, we propose an image-text integrated recognition system using active vision based on Spatial Transformer Networks. The system attempts to efficiently sample a partial region of a given image based on the given language information. Our experimental results demonstrate a significant improvement over traditional approaches. We also discuss the results of a qualitative analysis of the sampled images, the model's characteristics, and its limitations.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data. These data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, a problem that has attracted the interest of many researchers seeking to manage it. The problem also calls for professionals capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis in which a text document must be assigned to one or more predefined categories or classes. Various techniques are available for text classification, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary with the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. Machine learning algorithms build a classifier on the assumption that the characteristics of the training data and the target data are the same or very similar. However, for unstructured data such as text, the features are determined by the vocabulary included in the documents. If the viewpoints of the training data and the target data differ, the features of the two may also differ. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, which were not developed to recognize different types of data representation at one time and to combine them in the same generalization. Therefore, to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA), which selects only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making.
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.

A Study of Statistical Learning as a CRM's Classifier Functions (CRM의 기능 분류를 위한 통계적 학습에 관한 연구)

  • Jang, Geun;Lee, Jung-Bae;Lee, Byung-Soo
    • The KIPS Transactions:PartB / v.11B no.1 / pp.71-76 / 2004
  • Recent ERP and CRM systems have mostly focused on conventional functional performance. However, the rapid progress of the internet and e-commerce has changed the market in the recent business environment. Business is largely becoming e-business, spreading through the development of relationships with cooperating companies, the rapid development of relationships with customers, and the strengthening of competitiveness through improved business processes within the organization. CRM (customer relationship management) is a marketing process that forms, manages, and strengthens the relationship between customers and a company in order to manage acquired customers and increase their value to the company. It requires a system base that analyzes customer information, since it operates on various information about customers and is linked to business areas such as production, marketing, and decision making. As ERP extends its functions to SCM, CRM, and SEM (strategic enterprise management), the 21st century's ERP is developing into a strategic tool for e-business and, as a means to this end, will effectively subdivide the functions of CRM through the analogic study of data. In addition, one feature of the system is an agent that applies machine learning so that file classification work previously done manually by the user can be carried out automatically and more efficiently.

Exploring Feature Selection Methods for Effective Emotion Mining (효과적 이모션마이닝을 위한 속성선택 방법에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.3 / pp.107-117 / 2019
  • In the era of SNS, many people rely on it to express their emotions about various kinds of products and services. Therefore, for companies eager to investigate how their products and services are perceived in the market, emotion mining tasks using datasets from SNSs have become more important than ever. Emotion mining is a branch of sentiment analysis based on BOW (bag-of-words) and TF-IDF. However, few emotion mining studies adopt feature selection (FS) methods to look for an optimal set of features that ensures better results. This study therefore proposes FS methods for conducting emotion mining tasks more effectively and with better outcomes. It uses Twitter and SemEval2007 datasets for the emotion mining experiments. We applied three FS methods: CFS (Correlation-based FS), IG (Information Gain), and ReliefF. Emotion mining results were obtained by applying the selected features to nine classifiers. When DT (decision tree) is applied to the Twitter dataset, accuracy increases with the CFS, IG, and ReliefF methods. When LR (logistic regression) is applied to the SemEval2007 dataset, accuracy increases with the ReliefF method.

Study on the Maintenance Interval Decisions for Life expectancy in Railway Turnout clearance Detector (철도 분기기 밀착검지기 Life expectancy의 유지보수 주기 결정에 관한 연구)

  • Jang, ByeongMok;Lee, Jongwoo
    • Journal of the Korean Society for Railway / v.20 no.4 / pp.491-499 / 2017
  • Railway turnout systems are among the most important systems in a railway, and abnormal turnouts can cause serious accidents. Turnout clearance detectors are widely used to detect an abnormal turnout state. A failure of a turnout clearance detector is treated as a failure of the turnout system, which can hinder train operations, so analysis of turnout clearance detector failures is very important to ensure normal train operation. To identify the failure characteristics of 140 detectors, we categorized them into four groups: main line detectors (A), side track detectors (B), detectors that operate more than 80 times a day (C), and detectors that operate fewer than 10 times a day (D). Detector failures have mainly occurred in the control part, the cables, and the sensors, and are classified by these four groups (A, B, C and D). We sought failure density distributions for each type of failure, inferring the parameter distributions a priori. Finally, using Bayesian inference, we propose a maintenance time for control parts based on the mean time, life, and life expectancy of the detector.

A Data-driven Classifier for Motion Detection of Soldiers on the Battlefield using Recurrent Architectures and Hyperparameter Optimization (순환 아키텍쳐 및 하이퍼파라미터 최적화를 이용한 데이터 기반 군사 동작 판별 알고리즘)

  • Joonho Kim;Geonju Chae;Jaemin Park;Kyeong-Won Park
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.107-119 / 2023
  • Technology that recognizes a soldier's motion and movement status has recently attracted considerable attention as a combination of wearable technology and artificial intelligence, and it is expected to upend the paradigm of troop management. The accuracy of state determination must be kept high to secure the expected vital functions both in training situations (evaluation and solution provision for each individual's motion) and in combat situations (overall enhancement in managing troops). However, when input data is given as a time series or sequence, existing feedforward networks show clear limitations in maximizing classification performance. Since the human behavior data (3-axis accelerations and 3-axis angular velocities) handled in military motion recognition must be analyzed for its time-dependent characteristics, this study proposes a high-performance data-driven classifier that uses long short-term memory to identify the order dependence of the acquired data, learning to classify eight representative military operations (Sitting, Standing, Walking, Running, Ascending, Descending, Low Crawl, and High Crawl). Since accuracy depends heavily on a network's learning conditions and variables, manual adjustment may be neither cost-effective nor guaranteed to produce optimal results during learning. Therefore, we optimized the hyperparameters using Bayesian optimization to maximize generalization performance. As a result, the final architecture reduced the error rate by 62.56% compared to the existing network with a similar number of learnable parameters, reaching a final accuracy of 98.39% across the military operations.
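
The two figures at the end of this abstract can be related with a short calculation. The sketch below rests on two assumptions that the abstract does not state: that the error rate is one minus the accuracy, and that the 62.56% figure is a relative reduction. Under those assumptions it back-computes the implied baseline, which is an inferred value and not a reported result.

```python
# Figures quoted in the abstract.
final_accuracy = 0.9839
relative_reduction = 0.6256

# Assumption: error rate = 1 - accuracy.
final_error = 1.0 - final_accuracy

# Assumption: the reduction is relative, i.e.
# final_error = baseline_error * (1 - relative_reduction).
baseline_error = final_error / (1.0 - relative_reduction)
baseline_accuracy = 1.0 - baseline_error

print(f"implied baseline error:    {baseline_error:.4f}")
print(f"implied baseline accuracy: {baseline_accuracy:.4f}")
```

Under these assumptions the existing network would have an error rate of roughly 4.3%, that is, an accuracy of roughly 95.7%.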