• Title/Summary/Keyword: 베이지안 분류 (Bayesian classification)

A K-Nearest Neighbor Algorithm for Categorical Sequence Data (범주형 시퀀스 데이터의 K-Nearest Neighbor알고리즘)

  • Oh Seung-Joon
    • Journal of the Korea Society of Computer and Information / v.10 no.2 s.34 / pp.215-221 / 2005
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data that have an inherent sequential nature. In this paper, we study how to classify these sequence datasets. There are several kinds of techniques for data classification, such as decision tree induction, Bayesian classification, and K-NN. In our approach, we use a K-NN algorithm for classifying sequences. In addition, we propose a new similarity measure to compute the similarity between two sequences and an efficient method for measuring similarity.

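The K-NN approach to sequence classification described in the abstract above can be sketched as follows. The abstract does not specify the proposed similarity measure, so this sketch substitutes a longest-common-subsequence (LCS) ratio as a stand-in; the function names `lcs_length`, `similarity`, and `knn_classify` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Stand-in similarity in [0, 1] (an assumption, not the paper's measure):
    LCS length divided by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def knn_classify(query, training, k=3):
    """Majority vote among the k training sequences most similar to query."""
    ranked = sorted(training, key=lambda item: similarity(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labeled sequences of categorical symbols (hypothetical data).
training = [
    ("ABCD", "x"),
    ("ABCE", "x"),
    ("ABDD", "x"),
    ("WXYZ", "y"),
    ("WXYY", "y"),
]
print(knn_classify("ABCF", training, k=3))
```

Any similarity function with the same signature can be swapped in; the dynamic-programming LCS here costs time proportional to the product of the two sequence lengths per comparison, which is the kind of cost an efficient similarity computation would aim to reduce.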

Impact of Diverse Document-evaluation Measure-based Searching Methods in Big Data Search Accuracy (빅데이터 검색 정확도에 미치는 다양한 측정 방법 기반 검색 기법의 효과)

  • Kim, Ji young;Han, DaHyeon;Kim, Jongkwon
    • Journal of KIISE / v.44 no.5 / pp.553-558 / 2017
  • With the rapid growth of Big Data, research on extracting meaningful information is being pursued by both academia and industry. In particular, the data characteristics derived from analysis and the researcher's intention are key factors for search algorithms to produce accurate output. Therefore, properly reflecting both data characteristics and researcher intention is the final goal of data analysis research. Properly analyzed data can increase users' loyalty to the service a company provides and help them use information more effectively and efficiently. In this paper, we explore various document-evaluation methods to improve the accuracy of article search, one of the most frequently used kinds of search in real life. We also analyze the experimental results and suggest appropriate ways to use the various methods.

Extraction of Hazardous Freeway Sections Using GPS-Based Probe Vehicle Speed Data (GPS 프로브 차량 속도자료를 이용한 고속도로 사고 위험구간 추출기법)

  • Park, Jae-Hong;Oh, Cheol;Kim, Tae-Hyung;Joo, Shin-Hye
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.9 no.3 / pp.73-84 / 2010
  • This study presents a novel method to identify hazardous freeway segments using global positioning system (GPS)-based probe vehicle data. A variety of candidate contributing factors leading to a higher potential for accident occurrence were extracted from the probe vehicle dataset. The research problem was defined as a classification problem, and a well-known classifier, the Bayesian neural network, was adopted to solve it. A binary logistic regression technique was also used to select salient input variables. Test results showed that the proposed method is promising for extracting hazardous freeway sections. The outcome of this study can be used effectively to evaluate the safety of freeway sections and to derive countermeasures to prevent accidents.

A Bayesian Validation Method for Classification of Microarray Expression Data (마이크로어레이 발현 데이터 분류를 위한 베이지안 검증 기법)

  • Park, Su-Young;Jung, Jong-Pil;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.11 / pp.2039-2044 / 2006
  • Since the volume of biological information now exceeds the capacity of the human brain, data mining and artificial intelligence techniques are needed to handle information in this field. Much research uses the DNA microarray technique, which can obtain information from thousands of genes at once, to develop new methods for analyzing and predicting diseases. Discovering the mechanisms of unknown genes with these new methods is expected to lead to new drugs and new treatments. In this paper, we tested the classification accuracy of a Bayesian method on microarray data in order to compare the performance of normalization methods, after dividing the data into two classes with a feature extraction method applied following a normalization process that reduces or removes the noise generated by various factors in microarray experiments. We showed that classification performance improved to 95.89% after Lowess normalization.

Preference Prediction System using Similarity Weight granted Bayesian estimated value and Associative User Clustering (베이지안 추정치가 부여된 유사도 가중치와 연관 사용자 군집을 이용한 선호도 예측 시스템)

  • 정경용;최성용;임기욱;이정현
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.316-325 / 2003
  • Existing collaborative filtering methods for predicting user preference use the nearest-neighbor method based on users' preferences for items and compute user similarity from the Pearson correlation coefficient. They therefore do not reflect the content of items, nor do they solve the sparsity problem. To complement these shortcomings, this study proposes a preference prediction system that uses similarity weights assigned Bayesian estimated values together with associative user clustering. The proposed method groups users by genre using the Association Rule Hypergraph Partitioning algorithm, and a new user is classified into one of these genres by a Naive Bayes classifier to address the sparsity problem in collaborative filtering. In addition, to obtain the similarity between the new user and the users belonging to the classified genre, this study assigns different estimated values, learned through Naive Bayes, to the items that users have rated. Applying the preferences with estimated values to the existing Pearson correlation coefficient can raise prediction precision by reducing the prediction error caused by missing values. To evaluate its performance, the proposed method is compared with existing collaborative filtering techniques. The results show that the proposed method is effective in improving prediction accuracy by solving the problems of existing collaborative filtering techniques.
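
As a point of reference for the baseline mentioned in the abstract above, the Pearson-correlation user similarity used in conventional collaborative filtering can be sketched as follows. This is only the textbook baseline, not the paper's Bayesian-weighted variant; the function name `pearson_similarity`, the choice to take means over co-rated items only, and the toy ratings are assumptions for illustration.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated.

    Returns 0.0 when the correlation is undefined (fewer than two
    co-rated items, or zero variance), which is where sparse rating
    data causes trouble for this baseline.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings: item id -> rating on a 1-5 scale.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
carol = {"m1": 1, "m2": 5, "m3": 3}
dave = {"m4": 2}

print(pearson_similarity(alice, bob))    # same rating pattern as alice
print(pearson_similarity(alice, carol))  # opposite rating pattern
print(pearson_similarity(alice, dave))   # no co-rated items
```

The last call shows the sparsity issue the abstract refers to: with no co-rated items the similarity cannot be computed, so a fallback value has to be used.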

Evaluation of Future Hydrologic Risk of Drought in Nakdong River Basin Using Bayesian Classification-Based Composite Drought Index (베이지안 분류 기반 통합가뭄지수를 활용한 낙동강 유역의 미래 가뭄에 대한 수문학적 위험도 분석)

  • Kim, Hyeok;Kim, Ji Eun;Kim, Jiyoung;Yoo, Jiyoung;Kim, Tae-Woong
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.3 / pp.309-319 / 2023
  • Recently, the frequency and intensity of meteorological disasters have increased due to climate change. In South Korea, vulnerability and the capability to respond to climate change differ by region because of regional climate characteristics. In particular, drought results from various factors and is linked to extensive meteorological, hydrological, and agricultural impacts. Therefore, to cope with drought effectively, it is necessary to use a composite drought index that can take various factors into account and to evaluate future droughts comprehensively in light of climate change. This study evaluated the hydrologic risk (${\bar{R}}$) of future drought in the Nakdong River basin based on the Dynamic Naive Bayesian Classification (DNBC)-based composite drought index, which was calculated by applying the Standardized Precipitation Index (SPI), Streamflow Drought Index (SDI), Evaporative Stress Index (ESI), and Water Supply Capacity Index (WSCI) to the DNBC. The indices used in the DNBC were calculated using observation data and climate scenario data. A bivariate frequency analysis was performed for the severity and duration of the composite drought. Then, using the estimated bivariate return periods, hydrologic risks of drought were calculated for the observation and future periods. The overall results indicated that the risk was highest during the future period (2021-2040) (${\bar{R}}$=0.572), and that Miryang River (#2021) had the highest risk (${\bar{R}}$=0.940) on average. The hydrologic risk of the Nakdong River basin is expected to increase considerably in the near future (2021-2040). During the far future (2041-2099), the hydrologic risk decreased in the northern basins and increased in the southern basins.

Social Commerce Food Coupon Recommending System Based On Context Information Using Bayesian Network (베이지안 네트워크를 이용한 상황정보에 기반을 둔 소셜커머스 음식 쿠폰 추천시스템)

  • Jeong, Hyeon-Ju;Lee, Sang-Yong
    • Journal of Digital Convergence / v.11 no.3 / pp.389-395 / 2013
  • Recently, more food and beverage coupons have been sold on social commerce using SNS. Buying coupons on social commerce lets one enjoy products at a lower price; however, there are constraints one must consider, such as location, service hours, and discount rate. Thus, this paper proposes a system that recommends food and beverage coupons on social commerce by considering a user's personal context of location, time, and purchase history. To reflect a user's context awareness and continuing preferences, the paper proposes a method based on the Bayesian network. To apply personalized weights to the coupon selection criteria so that they match a user's preferences, preference weights are measured and classified on the basis of AHP. Twenty experiments involving 12 students were carried out over one month to verify the effectiveness of the system, resulting in an 80% satisfaction level.

Bayesian Optimization Framework for Improved Cross-Version Defect Prediction (향상된 교차 버전 결함 예측을 위한 베이지안 최적화 프레임워크)

  • Choi, Jeongwhan;Ryu, Duksan
    • KIPS Transactions on Software and Data Engineering / v.10 no.9 / pp.339-348 / 2021
  • In recent software defect prediction research, defect prediction across projects and across versions of a project has been actively studied. So far, cross-version defect prediction studies have assumed a WP (Within-Project) setting. However, in the CV (Cross-Version) environment, previous work has not treated the distribution difference between project versions as important. In this study, we propose an automated Bayesian optimization framework that considers distribution differences between versions. The framework automatically selects whether to perform transfer learning according to the difference in distribution. It is a technique that optimizes over the distribution difference between versions, transfer learning, and the hyper-parameters of the classifier. Through experiments, we confirmed that automatically selecting whether to perform transfer learning based on the distribution difference is effective. Moreover, using our optimization framework is effective in improving performance and, as a result, can reduce software inspection effort. This is expected to support practical quality assurance activities for new-version projects in a cross-version project environment.

Feature Selection for Document Classification (문서 분류를 위한 특징 선택)

  • Jin, Hoon;Kim, In-Cheol
    • Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.262-264 / 2001
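  • This paper experimentally measures how the choice of features representing a document, and the number of features selected, affect the task of determining whether a text document belongs to a particular category. Through experiments, we sought to examine the effect of the feature selection method on classification performance, the correlation between the number of features and classification performance, and the relationship between the number of categories and the number of features. The results show that, for newsgroup documents, the information gain method performs best owing to the peculiarity of their distribution, and that performance differs greatly depending on the number of document features. We also found that, when the information gain method is used with the naive Bayesian classifier, the number of features that yields the best performance is proportional to the number of categories.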

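The information gain criterion mentioned in the abstract above can be sketched as follows: a term scores highly when knowing whether it occurs in a document sharply reduces the uncertainty about the document's category. This is a minimal sketch of the standard definition, assuming documents are represented as sets of tokens; the function names `entropy`, `information_gain`, and `select_features` and the toy corpus are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of category labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Reduction in category entropy from knowing whether `term` occurs."""
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
    total = len(labels)
    conditional = (len(present) / total) * entropy(present) \
        + (len(absent) / total) * entropy(absent)
    return entropy(labels) - conditional


def select_features(docs, labels, n_features):
    """Keep the n_features terms with the highest information gain."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:n_features]


# Toy corpus (hypothetical): each document is a set of tokens.
docs = [
    {"goal", "match", "team"},
    {"team", "coach", "goal"},
    {"stock", "market", "team"},
    {"market", "price", "stock"},
]
labels = ["sports", "sports", "finance", "finance"]

print(select_features(docs, labels, n_features=3))
```

In this toy corpus a term such as "team", which appears in both categories, receives a lower score than terms confined to one category. The `n_features` argument corresponds to the number of features whose effect on classification performance the abstract reports measuring.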

Comments Classification System using Topic Signature and n-gram (Topic signature와 n-gram을 이용한 댓글 분류 시스템)

  • Bae, Min-Young;Cha, Jeong-Won
    • Annual Conference on Human and Language Technology / 2008.10a / pp.189-194 / 2008
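  • In this paper, we develop a comment classification system using topic signatures and n-grams. Topic signatures are widely used for feature selection in document summarization and document classification, and n-grams have the advantage of being applicable to any language. Malicious comments tend to be short, contain slang and altered words at high frequency, and are unstructured. We therefore split comments into n-grams and select them as features. A Bayesian model is used for classification. Through discrimination experiments on Korean and English comments, we confirmed that the implemented system outperforms previously proposed methods that require complex preprocessing and that it can be applied regardless of language.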

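The idea of splitting comments into n-grams and classifying them with a Bayesian model, as described in the abstract above, can be sketched as follows. This is a minimal multinomial Naive Bayes over character n-grams with add-alpha smoothing; it omits the topic-signature feature selection step, and the class name `NgramNaiveBayes`, its parameters, and the toy comments are assumptions for illustration, not the authors' system.

```python
from collections import Counter, defaultdict
from math import log


def char_ngrams(text, n=2):
    """All overlapping character n-grams of text (no tokenizer needed)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (illustrative sketch)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                              # n-gram length
        self.alpha = alpha                      # add-alpha smoothing constant
        self.class_docs = Counter()             # label -> number of comments
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.class_docs[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocabulary.update(grams)
        return self

    def predict(self, text):
        """Return the label with the highest log posterior score."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_docs.items():
            counts = self.ngram_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = log(doc_count / total_docs)  # log prior
            for gram in char_ngrams(text, self.n):
                score += log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training comments (hypothetical data).
texts = ["you are a stupid idiot", "stupid loser go away",
         "thank you for the great post", "great article thanks a lot"]
labels = ["malicious", "malicious", "normal", "normal"]

model = NgramNaiveBayes(n=2).fit(texts, labels)
print(model.predict("what a stupid idiot"))
```

Because the features are raw character n-grams, the same code runs unchanged on Korean or English text, which reflects the language-independence the abstract emphasizes.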