• Title/Summary/Keyword: 베이지안 분류

Search Result 200, Processing Time 0.029 seconds

A Comparative Study on the Accuracy of Important Statistical Prediction Techniques for Marketing Data (마케팅 데이터를 대상으로 중요 통계 예측 기법의 정확성에 대한 비교 연구)

  • Cho, Min-Ho
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.14 no.4
    • /
    • pp.775-780
    • /
    • 2019
  • Techniques for predicting the future can be categorized into statistics-based and deep-run-based techniques. Among them, statistic-based techniques are widely used because simple and highly accurate. However, working-level officials have difficulty using many analytical techniques correctly. In this study, we compared the accuracy of prediction by applying multinomial logistic regression, decision tree, random forest, support vector machine, and Bayesian inference to marketing related data. The same marketing data was used, and analysis was conducted by using R. The prediction results of various techniques reflecting the data characteristics of the marketing field will be a good reference for practitioners.

Data processing techniques applying data mining based on enterprise cloud computing (데이터 마이닝을 적용한 기업형 클라우드 컴퓨팅 기반 데이터 처리 기법)

  • Kang, In-Seong;Kim, Tae-Ho;Lee, Hong-Chul
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.8
    • /
    • pp.1-10
    • /
    • 2011
  • Recently, cloud computing which has provided enabling convenience that users can connect from anywhere and user friendly environment that offers on-demand network access to a shared pool of configurable computing resources such as smart-phones, net-books and PDA etc, is to be watched as a service that leads the digital revolution. Now, when business practices between departments being integrated through a cooperating system such as cloud computing, data streaming between departments is getting enormous and then it is inevitably necessary to find the solution that person in charge and find data they need. In previous studies the clustering simplifies the search process, but in this paper, it applies Hash Function to remove the de-duplicates in large amount of data in business firms. Also, it applies Bayesian Network of data mining for classifying the respect data and presents handling cloud computing based data. This system features improved search performance as well as the results Compared with conventional methods and CPU, Network Bandwidth Usage in such an efficient system performance is achieved.

Semantic Topic Selection Method of Document for Classification (문서분류를 위한 의미적 주제선정방법)

  • Ko, kwang-Sup;Kim, Pan-Koo;Lee, Chang-Hoon;Hwang, Myung-Gwon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.1
    • /
    • pp.163-172
    • /
    • 2007
  • The web as global network includes text document, video, sound, etc and connects each distributed information using link Through development of web, it accumulates abundant information and the main is text based documents. Most of user use the web to retrieve information what they want. So, numerous researches have progressed to retrieve the text documents using the many methods, such as probability, statistics, vector similarity, Bayesian, and so on. These researches however, could not consider both the subject and the semantics of documents. As a result user have to find by their hand again. Especially, it is more hard to find the korean document because the researches of korean document classification is insufficient. So, to overcome the previous problems, we propose the korean document classification method for semantic retrieval. This method firstly, extracts TF value and RV value of concepts that is included in document, and maps into U-WIN that is korean vocabulary dictionary to select the topic of document. This method is possible to classify the document semantically and showed the efficiency through experiment.

Game Recommendation System Based on User Ratings (사용자 평점 기반 게임 추천 시스템)

  • Kim, JongHyen;Jo, HyeonJeong;Kim, Byeong Man
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.23 no.6
    • /
    • pp.9-19
    • /
    • 2018
  • As the recent developments in the game industry and people's interest in game streaming become more popular, non-professional gamers are also interested in games and buying them. However, it is difficult to judge which game is the most enjoyable among the games released in dozens every day. Although the game sales platform is equipped with the game recommendation function, it is not accurate because it is used as a means of increasing their sales and recommending users with a focus on their discount products or new products. For this reason, in this paper, we propose a game recommendation system based on the users ratings, which raises the recommendation satisfaction level of users and appropriately reflect their experience. In the system, we implement the rate prediction function using collaborative filtering and the game recommendation function using Naive Bayesian classifier to provide users with quick and accurate recommendations. As the result, the rate prediction algorithm achieved a throughput of 2.4 seconds and an average of 72.1 percent accuracy. For the game recommendation algorithm, we obtained 75.187 percent accuracy and were able to provide users with fast and accurate recommendations.

Collaborative Tag-based Filtering for Recommender Systems (효과적인 추천 시스템을 위한 협업적 태그 기반의 여과 기법)

  • Yeon, Cheol;Ji, Ae-Ttie;Kim, Heung-Nam;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.2
    • /
    • pp.157-177
    • /
    • 2008
  • Even in a single day, an enormous amount of content including digital videos, posts, photographs, and wikis are generated on the web. It's getting more difficult to recommend to a user what he/she prefers among these contents because of the difficulty of automatically grasping of content's meanings. CF (Collaborative Filtering) is one of useful methods to recommend proper content to a user under these situations because the filtering process is only based on historical information about whether or not a target user has preferred an item before. Collaborative Tagging is the process that allows many users to annotate content with descriptive tags. Recommendation using tags can partially improve, such as the limitations of CF, the sparsity and cold-start problem. In this research, a CF method with user-created tags is proposed. Collaborative tagging is employed to grasp and filter users' preferences for items. Empirical demonstrations using real dataset from del.icio.us show that our algorithm obtains improved performance, compared with existing works.

  • PDF

Adaptation of Classification Model for Improving Speech Intelligibility in Noise (음성 명료도 향상을 위한 분류 모델의 잡음 환경 적응)

  • Jung, Junyoung;Kim, Gibak
    • Journal of Broadcast Engineering
    • /
    • v.23 no.4
    • /
    • pp.511-518
    • /
    • 2018
  • This paper deals with improving speech intelligibility by applying binary mask to time-frequency units of speech in noise. The binary mask is set to "0" or "1" according to whether speech is dominant or noise is dominant by comparing signal-to-noise ratio with pre-defined threshold. Bayesian classifier trained with Gaussian mixture model is used to estimate the binary mask of each time-frequency signal. The binary mask based noise suppressor improves speech intelligibility only in noise condition which is included in the training data. In this paper, speaker adaptation techniques for speech recognition are applied to adapt the Gaussian mixture model to a new noise environment. Experiments with noise-corrupted speech are conducted to demonstrate the improvement of speech intelligibility by employing adaption techniques in a new noise environment.

A study on the Filtering of Spam E-mail using n-Gram indexing and Support Vector Machine (n-Gram 색인화와 Support Vector Machine을 사용한 스팸메일 필터링에 대한 연구)

  • 서정우;손태식;서정택;문종섭
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.14 no.2
    • /
    • pp.23-33
    • /
    • 2004
  • Because of a rapid growth of internet environment, it is also fast increasing to exchange message using e-mail. But, despite the convenience of e-mail, it is rising a currently bi9 issue to waste their time and cost due to the spam mail in an individual or enterprise. Many kinds of solutions have been studied to solve harmful effects of spam mail. Such typical methods are as follows; pattern matching using the keyword with representative method and method using the probability like Naive Bayesian. In this paper, we propose a classification method of spam mails from normal mails using Support Vector Machine, which has excellent performance in pattern classification problems, to compensate for the problems of existing research. Especially, the proposed method practices efficiently a teaming procedure with a word dictionary including a generated index by the n-Gram. In the conclusion, we verified the proposed method through the accuracy comparison of spm mail separation between an existing research and proposed scheme.

An Approach to Detect Spam E-mail with Abnormal Character Composition (비정상 문자 조합으로 구성된 스팸 메일의 탐지 방법)

  • Lee, Ho-Sub;Cho, Jae-Ik;Jung, Man-Hyun;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.18 no.6A
    • /
    • pp.129-137
    • /
    • 2008
  • As the use of the internet increases, the distribution of spam mail has also vastly increased. The email's main use was for the exchange of information, however, currently it is being more frequently used for advertisement and malware distribution. This is a serious problem because it consumes a large amount of the limited internet resources. Furthermore, an extensive amount of computer, network and human resources are consumed to prevent it. As a result much research is being done to prevent and filter spam. Currently, research is being done on readable sentences which do not use proper grammar. This type of spam can not be classified by previous vocabulary analysis or document classification methods. This paper proposes a method to filter spam by using the subject of the mail and N-GRAM for indexing and Bayesian, SVM algorithms for classification.

An Effective Feature Generation Method for Distributed Denial of Service Attack Detection using Entropy (엔트로피를 이용한 분산 서비스 거부 공격 탐지에 효과적인 특징 생성 방법 연구)

  • Kim, Tae-Hun;Seo, Ki-Taek;Lee, Young-Hoon;Lim, Jong-In;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.20 no.4
    • /
    • pp.63-73
    • /
    • 2010
  • Malicious bot programs, the source of distributed denial of service attack, are widespread and the number of PCs which were infected by malicious bot program are increasing geometrically thesedays. The continuous distributed denial of service attacks are happened constantly through these bot PCs and some financial incident cases have found lately. Therefore researches to response distributed denial of service attack are necessary so we propose an effective feature generation method for distributed denial of service attack detection using entropy. In this paper, we apply our method to both the DARPA 2000 datasets and also the distributed denial of service attack datasets that we composed and generated ourself in general university. And then we evaluate how the proposed method is useful through classification using bayesian network classifier.

Prototype based Classification by Generating Multidimensional Spheres per Class Area (클래스 영역의 다차원 구 생성에 의한 프로토타입 기반 분류)

  • Shim, Seyong;Hwang, Doosung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.20 no.2
    • /
    • pp.21-28
    • /
    • 2015
  • In this paper, we propose a prototype-based classification learning by using the nearest-neighbor rule. The nearest-neighbor is applied to segment the class area of all the training data into spheres within which the data exist from the same class. Prototypes are the center of spheres and their radii are computed by the mid-point of the two distances to the farthest same class point and the nearest another class point. And we transform the prototype selection problem into a set covering problem in order to determine the smallest set of prototypes that include all the training data. The proposed prototype selection method is based on a greedy algorithm that is applicable to the training data per class. The complexity of the proposed method is not complicated and the possibility of its parallel implementation is high. The prototype-based classification learning takes up the set of prototypes and predicts the class of test data by the nearest neighbor rule. In experiments, the generalization performance of our prototype classifier is superior to those of the nearest neighbor, Bayes classifier, and another prototype classifier.