• Title/Summary/Keyword: Bayesian classifier (베이지안 분류기)

Impact of Diverse Document-evaluation Measure-based Searching Methods in Big Data Search Accuracy (빅데이터 검색 정확도에 미치는 다양한 측정 방법 기반 검색 기법의 효과)

  • Kim, Ji young;Han, DaHyeon;Kim, Jongkwon
    • Journal of KIISE / v.44 no.5 / pp.553-558 / 2017
  • With the rapid growth of Big Data, both academia and industry are pursuing research on extracting meaningful information. In particular, the data characteristics derived from analysis and the researcher's intention are key factors that allow search algorithms to produce accurate output. Reflecting both of them properly is therefore the ultimate goal of data analysis research. Properly analyzed data can increase user loyalty to a company's service and help users utilize information more effectively and efficiently. In this paper, we explore various document-evaluation methods to improve the accuracy of article search, one of the most frequently used kinds of search in everyday life. We also analyze the experimental results and suggest appropriate ways to use the various methods.
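The abstract does not name the document-evaluation measures it compares. As one illustration of what such a measure looks like, the sketch below ranks documents against a query with Okapi BM25, a standard ranking function. The choice of BM25, the parameter values `k1=1.5` and `b=0.75`, the whitespace tokenizer, and the toy documents are all assumptions, not the paper's method or data.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Tokenization is plain lowercase whitespace splitting (an assumption).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

docs = [
    "bayesian classifier for text",
    "big data search accuracy",
    "search engines rank documents for a search query",
]
scores = bm25_scores("search accuracy", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best)  # 1: "big data search accuracy" ranks first
```

Swapping in a different measure (for example raw TF-IDF cosine similarity) only changes the scoring function, which is what makes side-by-side comparisons of measures like the paper's straightforward.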

Committee Learning Classifier based on Attribute Value Frequency (속성 값 빈도 기반의 전문가 다수결 분류기)

  • Lee, Chang-Hwan;Jung, In-Chul;Kwon, Young-S.
    • Journal of KIISE:Databases / v.37 no.4 / pp.177-184 / 2010
  • Nowadays, large volumes of data, including sensor, delivery, credit, and stock data, are generated continuously. Learning from such data is difficult because they are large in volume and their concepts change quickly. To handle these problems, learning methods based on sliding windows over time have been used. However, these approaches must rebuild the model every time new data arrive, which costs considerable time and resources. We therefore need very simple incremental learning methods. The Bayesian method is one such method, but it has the disadvantage of requiring prior knowledge (probabilities) of the data. In this study, we propose a learning method based on attribute values that can be applied even when the prior knowledge (probabilities) of the data is unknown. The main idea is that each attribute value is regarded as an expert learner, and combining these expert learners leads to better results. Experimental results show that our method learns from data very fast and performs well compared with existing learning methods (decision tree and Bayesian).
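The abstract describes each attribute value as an "expert" whose votes are combined, with learning that is incremental rather than rebuilt on every arrival. The exact voting rule is not given, so the hypothetical sketch below assumes each expert casts its normalized class-frequency distribution as a vote; the class name `AttributeValueCommittee` and the toy stream are illustrative only.

```python
from collections import Counter, defaultdict

class AttributeValueCommittee:
    """Hypothetical sketch: every (attribute index, value) pair acts as an
    expert that remembers how often each class co-occurred with it.
    Prediction sums the experts' normalized class distributions."""

    def __init__(self):
        # (attribute index, value) -> Counter of class labels
        self.experts = defaultdict(Counter)

    def update(self, x, y):
        # Incremental learning: one instance only touches its own experts,
        # so nothing is rebuilt when new data arrive.
        for i, v in enumerate(x):
            self.experts[(i, v)][y] += 1

    def predict(self, x):
        votes = Counter()
        for i, v in enumerate(x):
            counts = self.experts.get((i, v))  # .get avoids creating entries
            if not counts:
                continue  # unseen value: this expert abstains
            total = sum(counts.values())
            for label, c in counts.items():
                votes[label] += c / total
        return votes.most_common(1)[0][0] if votes else None

model = AttributeValueCommittee()
stream = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
]
for x, y in stream:
    model.update(x, y)
print(model.predict(("rainy", "hot")))  # yes
```

Here the "rainy" expert votes fully for "yes" while the "hot" expert splits its vote evenly, so "yes" wins 1.5 to 0.5.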

Bayesian Texture Segmentation Using Multi-layer Perceptron and Markov Random Field Model (다층 퍼셉트론과 마코프 랜덤 필드 모델을 이용한 베이지안 결 분할)

  • Kim, Tae-Hyung;Eom, Il-Kyu;Kim, Yoo-Shin
    • Journal of the Institute of Electronics Engineers of Korea SP / v.44 no.1 / pp.40-48 / 2007
  • This paper presents a novel texture segmentation method using multilayer perceptron (MLP) networks and Markov random fields in a multiscale Bayesian framework. Multiscale wavelet coefficients are used as input to the neural networks, and the output of each network is modeled as a posterior probability. Texture classification at each scale is performed by MAP (maximum a posteriori) classification using the posterior probabilities from the MLP networks. Then, to obtain an improved segmentation result at the finest scale, the proposed method fuses the multiscale MAP classifications sequentially from coarse to fine scales. This is done by computing the MAP classification given the classification at one scale and a priori knowledge of contextual information extracted from the classification at the adjacent coarser scale. In this fusion process, an MRF (Markov random field) prior distribution and a Gibbs sampler are used: the MRF model serves as the smoothness constraint and the Gibbs sampler acts as the MAP classifier. The proposed segmentation method performs better than texture segmentation using the HMT (hidden Markov tree) model and HMTseg.
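The core step above, refining a MAP labelling from per-pixel posteriors with an MRF smoothness prior and a Gibbs sampler, can be sketched at a single scale. This is a simplified illustration under stated assumptions, not the paper's multiscale fusion: the prior is a Potts model over 4-neighbours at the same scale (rather than context from the coarser scale), and the smoothness weight `beta` and the geometric cooling schedule (simulated annealing, so the sampler approaches a MAP estimate) are assumed values.

```python
import numpy as np

def map_labels(post):
    """Pixel-wise MAP decision: pick the class with the largest posterior."""
    return np.argmax(post, axis=-1)

def gibbs_mrf_smooth(post, beta=1.5, sweeps=10, t0=1.0, seed=0):
    """Refine a pixel-wise MAP labelling with a Potts MRF prior.

    post: (H, W, K) array of per-pixel class posteriors (e.g. MLP outputs).
    beta: Potts smoothness weight (assumed value).
    The temperature halves every sweep (assumed schedule), so late sweeps
    behave almost like a deterministic MAP update.
    """
    rng = np.random.default_rng(seed)
    h, w, k = post.shape
    log_post = np.log(np.clip(post, 1e-12, 1.0))
    labels = map_labels(post)
    for s in range(sweeps):
        temp = t0 * (0.5 ** s)
        for i in range(h):
            for j in range(w):
                # Count how many 4-neighbours currently carry each label.
                agree = np.zeros(k)
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        agree[labels[ni, nj]] += 1
                # Local log-posterior: data term plus smoothness prior.
                energy = log_post[i, j] + beta * agree
                p = np.exp((energy - energy.max()) / temp)
                labels[i, j] = rng.choice(k, p=p / p.sum())
    return labels

# Two textures: class 0 on the left half, class 1 on the right half,
# plus one noisy pixel whose posterior wrongly favours class 1.
post = np.zeros((6, 6, 2))
post[:, :3] = [0.8, 0.2]
post[:, 3:] = [0.2, 0.8]
post[2, 1] = [0.4, 0.6]
labels = gibbs_mrf_smooth(post, beta=1.5, t0=0.5)
print(labels)
```

The pixel-wise MAP decision mislabels the noisy pixel, while the MRF prior pulls it back to the label shared by all of its neighbours.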

A Hyperlink-based Feature Weighting Technique for Web Document Classification (웹문서 자동 분류를 위한 하이퍼링크 기반 특징 가중치 부여 기법)

  • Lee, A-Ram;Kim, Han-Joon
    • Annual Conference of KIPS / 2012.11a / pp.417-420 / 2012
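  • Automatic document classification systems based on machine learning use words as features to build the classification model. Various studies aim to improve the performance of such systems by selecting more meaningful features for the model. In particular, web documents on the Internet carry tag information and link information in addition to words. In this paper, we propose a method that uses these two kinds of information to improve the performance of an automatic web document classification system. Appropriate features are selected using tag and link information, and the importance of each feature is computed to obtain its weight. The computed weights are assigned to the features to build the classification model, and performance was evaluated with a naive Bayesian classifier.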
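The abstract does not say how the tag- and link-based weights are computed or exactly how they enter the classifier. A common way to use per-feature weights in a multinomial naive Bayes classifier is to let each occurrence of a term count with its weight, both when estimating term probabilities and when scoring; the sketch below assumes that scheme. The `weight` function and its boosted terms are hypothetical stand-ins for the paper's weights.

```python
import math
from collections import Counter, defaultdict

def train_weighted_nb(docs, labels, weight):
    """Multinomial naive Bayes where each term occurrence counts
    `weight(term)` times instead of once."""
    class_docs = Counter(labels)
    term_mass = defaultdict(Counter)  # class -> weighted term counts
    vocab = set()
    for doc, y in zip(docs, labels):
        for term in doc:
            term_mass[y][term] += weight(term)
            vocab.add(term)
    return class_docs, term_mass, vocab

def predict_weighted_nb(model, doc, weight, alpha=1.0):
    class_docs, term_mass, vocab = model
    n = sum(class_docs.values())
    best, best_score = None, -math.inf
    for y, n_y in class_docs.items():
        total = sum(term_mass[y].values())
        score = math.log(n_y / n)  # log prior
        for term in doc:
            if term not in vocab:
                continue  # ignore terms never seen in training
            # Laplace-smoothed likelihood; the log is scaled by the weight.
            p = (term_mass[y][term] + alpha) / (total + alpha * len(vocab))
            score += weight(term) * math.log(p)
        if score > best_score:
            best, best_score = y, score
    return best

# Hypothetical weights: terms found in a <title> tag or in anchor text of
# incoming links count double.
boosted = {"python", "soccer"}

def weight(term):
    return 2.0 if term in boosted else 1.0

docs = [["python", "code", "tutorial"], ["soccer", "match", "score"],
        ["python", "library"], ["soccer", "league"]]
labels = ["tech", "sports", "tech", "sports"]
model = train_weighted_nb(docs, labels, weight)
print(predict_weighted_nb(model, ["python", "score"], weight))  # tech
```

In the query `["python", "score"]`, the boosted term "python" outweighs the ordinary term "score", so the document is assigned to "tech".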

Bayesian Optimization Framework for Improved Cross-Version Defect Prediction (향상된 교차 버전 결함 예측을 위한 베이지안 최적화 프레임워크)

  • Choi, Jeongwhan;Ryu, Duksan
    • KIPS Transactions on Software and Data Engineering / v.10 no.9 / pp.339-348 / 2021
  • Recent software defect prediction research has actively studied cross-project and cross-version defect prediction. Cross-version defect prediction studies have so far assumed a within-project (WP) setting. However, in the cross-version (CV) environment, previous work does not consider the distribution difference between project versions, which is important. In this study, we propose an automated Bayesian optimization framework that considers distribution differences between versions and, based on them, automatically selects whether to perform transfer learning. The framework optimizes the distribution difference between versions, the transfer learning, and the hyperparameters of the classifier. Experiments confirm that automatically selecting whether to perform transfer learning based on the distribution difference is effective. Moreover, the results show that our optimization framework improves performance and, as a result, can reduce software inspection effort. It is expected to support practical quality assurance activities for new versions in a cross-version project environment.

A K-Nearest Neighbor Algorithm for Categorical Sequence Data (범주형 시퀀스 데이터의 K-Nearest Neighbor알고리즘)

  • Oh Seung-Joon
    • Journal of the Korea Society of Computer and Information / v.10 no.2 s.34 / pp.215-221 / 2005
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data with an inherently sequential nature. In this paper, we study how to classify these sequence datasets. There are several kinds of data classification techniques, such as decision tree induction, Bayesian classification, and K-NN. In our approach, we use a K-NN algorithm to classify sequences. In addition, we propose a new similarity measure for computing the similarity between two sequences, together with an efficient method for computing it.
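The abstract does not define the proposed similarity measure, so the sketch below swaps in a standard one to show how K-NN classification of categorical sequences works: the length of the longest common subsequence (LCS) divided by the length of the longer sequence. The web-log sessions and their labels are made up for illustration.

```python
from collections import Counter

def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(a, b):
    # Normalized to [0, 1]; 1 means the two sequences are identical.
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))

def knn_classify(query, train, k=3):
    """train: list of (sequence, label) pairs. Majority vote among the
    k training sequences most similar to `query`."""
    ranked = sorted(train, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical web-log sessions (page sequences) with user-type labels.
train = [
    ("home search product cart".split(), "buyer"),
    ("home product cart checkout".split(), "buyer"),
    ("home blog blog about".split(), "reader"),
    ("blog about home".split(), "reader"),
    ("search product cart checkout".split(), "buyer"),
]
print(knn_classify("home search cart checkout".split(), train))  # buyer
```

Because LCS respects order, two sessions that visit the same pages in a different order score lower than sessions that follow the same path, which a plain set-overlap measure would miss.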

Identification and classification of fresh lubricants and used engine oils by GC/MS and Bayesian model (GC/MS 분석과 베이지안 분류 모형을 이용한 새 윤활유와 사용 엔진 오일의 동일성 추적과 분류)

  • Kim, Nam Yee;Nam, Geum Mun;Kim, Yuna;Lee, Dong-Kye;Park, Seh Youn;Lee, Kyoungjae;Lee, Jaeyong
    • Analytical Science and Technology / v.27 no.1 / pp.41-59 / 2014
  • The aims of this work were to identify and classify fresh lubricants and used engine oils from vehicles for application in forensic science. Eighty kinds of fresh lubricants were purchased, and 86 kinds of used engine oils were sampled from 24 kinds of diesel and gasoline vehicles with different driving conditions. The lubricant and used engine oil samples were analyzed by GC/MS, and a Bayesian model technique was developed for classification and identification. Wavelet fitting and principal component analysis (PCA) were each applied for data dimension reduction. In the classification of fresh lubricants, the matching rates of the Bayesian model technique with wavelet fitting and with PCA were 97.5% and 96.7%, respectively. Because the Bayesian model technique with wavelet fitting classified lubricants better than the one using PCA for dimension reduction, we selected it for lubricant classification. The second experiment analyzed used engine oils collected from vehicles at several mileages, up to 5,000 km after an engine oil change; eighty-six used engine oil samples at these mileages were collected. In vehicle classification (24 classes in total), the matching rate of the Bayesian model with wavelet fitting was 86.4%. By contrast, in fuel type classification (gasoline or diesel vehicle, only 2 classes), the matching rate was 99.6%. In the classification of used engine oil brands (6 classes in total), the matching rate was 97.3%.
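The abstract compares dimension reduction by PCA and by wavelet fitting before Bayesian classification, and reports results as matching rates. The sketch below shows only the PCA branch with a deliberately simple Bayesian classifier: it assumes Gaussian class-conditional densities with a shared isotropic covariance and equal priors, under which the MAP class is simply the one with the nearest class mean. The paper's actual Bayesian model is not specified here, and the data are synthetic stand-ins for GC/MS chromatograms; for brevity the matching rate is computed on the training samples themselves.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the data mean and the top principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, axes):
    return (X - mean) @ axes.T

def fit_class_means(Z, y):
    classes = np.unique(y)
    return classes, np.stack([Z[y == c].mean(axis=0) for c in classes])

def predict_nearest_mean(Z, classes, means):
    # With Gaussian class-conditionals, a shared isotropic covariance and
    # equal priors, the MAP class is the one whose mean is nearest.
    d2 = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d2, axis=1)]

def matching_rate(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

rng = np.random.default_rng(0)
# Synthetic stand-in for chromatogram vectors: 3 classes, 200 features.
centers = rng.normal(size=(3, 200)) * 3
y = np.repeat(np.arange(3), 20)
X = centers[y] + rng.normal(size=(60, 200))
mean, axes = pca_fit(X, n_components=5)
Z = pca_transform(X, mean, axes)
classes, means = fit_class_means(Z, y)
print(matching_rate(y, predict_nearest_mean(Z, classes, means)))
```

A wavelet-based reduction would replace `pca_fit`/`pca_transform` with a fixed wavelet basis, leaving the classifier and the matching-rate computation unchanged, which is what allows the two reductions to be compared directly.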

Personalized Activity Recognizer and Logger in Smart Phone Environment (스마트폰 환경에서 개인화된 행위 인식기 및 로거)

  • Cho, Geumhwan;Han, Manhyung;Lee, Ho Sung;Lee, Sungyoung
    • Proceedings of the Korean Society of Computer Information Conference / 2012.07a / pp.65-68 / 2012
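  • In this paper, we propose a personalized activity recognizer and logger for the smartphone environment, within activity recognition, a field that has recently been actively studied. With the recent spread of smartphones, research that uses smartphones for activity recognition has become active. However, the existing approach, in which the smartphone collects activity information with its sensors and a server classifies and processes it, has the drawback that, because recognition runs in real time and training is performed by the developer, personalized training is not possible. To overcome this drawback, we aim to implement a lightweight, personalized activity recognizer and logger that uses a Naive Bayes Classifier to collect user activity in real time on the smartphone and to classify and process the activity information there. The proposed method enables activity recognition through the recognizer, and its logger can also be used in research areas such as users' lifelogs and life patterns.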
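On-device, personalized training is easiest when the classifier can be updated one sample at a time. The sketch below assumes a Gaussian naive Bayes model (the abstract says only "Naive Bayes Classifier") whose per-class means and variances are maintained with Welford's online algorithm, so no server round-trip or batch retraining is needed. The class name and the two features (mean and standard deviation of acceleration magnitude over a window) are hypothetical.

```python
import math

class OnlineGaussianNB:
    """Gaussian naive Bayes with statistics updated one sample at a time
    (Welford's algorithm), so the phone can keep training on its own
    user's data."""

    def __init__(self, var_floor=1e-6):
        self.var_floor = var_floor  # avoids zero variance after one sample
        self.stats = {}  # label -> [count, per-feature means, per-feature M2]

    def update(self, x, label):
        if label not in self.stats:
            self.stats[label] = [0, [0.0] * len(x), [0.0] * len(x)]
        s = self.stats[label]
        s[0] += 1
        for i, v in enumerate(x):
            delta = v - s[1][i]
            s[1][i] += delta / s[0]          # running mean
            s[2][i] += delta * (v - s[1][i])  # running sum of squared deviations

    def predict(self, x):
        total = sum(s[0] for s in self.stats.values())
        best, best_lp = None, -math.inf
        for label, (n, means, m2) in self.stats.items():
            lp = math.log(n / total)  # log prior from class counts
            for v, mu, ss in zip(x, means, m2):
                var = max(ss / n, self.var_floor)  # population variance
                lp += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Hypothetical features: (mean, std) of acceleration magnitude in m/s^2.
model = OnlineGaussianNB()
for x, y in [((9.8, 0.1), "still"), ((9.9, 0.2), "still"),
             ((11.5, 2.0), "walk"), ((12.0, 2.4), "walk")]:
    model.update(x, y)
print(model.predict((11.8, 2.1)))  # walk
```

Each `update` call costs time proportional to the number of features only, which suits the lightweight, real-time goal stated in the abstract.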

Identification of User Behaviors Consuming Internet Services by Traffic Observation (트래픽 관찰을 통한 인터넷 서비스 소비성향의 식별)

  • Lee, Taek;In, Hoh Peter
    • Annual Conference of KIPS / 2009.11a / pp.449-450 / 2009
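  • Identifying users' Internet consumption tendencies and providing infrastructure resources adapted to them is a major concern for network designers and administrators and for Internet service providers (ISPs). Such analysis helps invest limited network resources efficiently at more appropriate points. This paper proposes a tendency classification metric that identifies how users of various Internet services consume those services (various Internet applications) from network traffic observation alone. We also present a method that classifies user tendencies with the proposed metric using a Bayesian classifier.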

Human Detection and Face Classification in Real-Time Video (실시간 영상에서의 휴먼 검출 및 얼굴 분류)

  • Kim, Geon-Woo;Nam, Mi-Young;Han, Jong-Wook
    • Review of KIISC / v.20 no.3 / pp.48-57 / 2010
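  • This article addresses human object detection and classification. It describes a method that detects object regions in the input video from the difference image against a background image, and then detects the face, that is, the head region, within each detected object region. Because the positions and sizes of moving people in video recorded in real time vary widely, and because multiple people rather than a single person must be detected, we propose a cascaded human-object extraction method that uses multiple human-object detectors. Taking face size and similar factors into account, a first detection pass is performed based on the shape of the head region, and for regions not detected in that pass, face regions are detected using histograms. In addition, performance can be improved by verifying overlapping images with a Bayesian face detector.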
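The first stage described above, finding object regions from the difference between a frame and a background image, can be sketched with NumPy. This covers only that step, not the cascaded detectors, head-shape or histogram face detection, or the Bayesian verifier. The threshold of 30 grey levels and the single bounding box (instead of per-person connected components) are simplifying assumptions.

```python
import numpy as np

def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds
    `threshold` (grayscale uint8 images; the threshold is an assumption)."""
    # Cast to a signed type first: uint8 subtraction would wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

def bounding_box(mask):
    """Smallest (top, left, bottom, right) box around all foreground pixels,
    or None when nothing moved. Separating several people would require
    splitting the mask into connected components first."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

background = np.full((8, 8), 100, dtype=np.uint8)
frame = background.copy()
frame[2:6, 3:5] = 200  # a bright "person" enters the scene
print(bounding_box(foreground_mask(frame, background)))  # (2, 3, 5, 4)
```

The resulting box is the object region in which the head/face search would then run.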