• Title/Summary/Keyword: Bayesian Classifier

Junk-Mail Filtering by Mail Address Validation and Title-Content Weighting (메일 주소 유효성과 제목-내용 가중치 기법에 의한 스팸 메일 필터링)

  • Kang Seung-Shik
    • Journal of Korea Multimedia Society / v.9 no.2 / pp.255-263 / 2006
  • Junk mail commonly shows an inconsistency between the addresses in the mail headers and those of the actual mail recipients. In addition, users can often tell whether an email is junk or legitimate just by looking at its title. In this paper, we apply a mail address validation check as a filtering classifier, together with a title-content weighting method, to improve the performance of a junk-mail filtering system. To verify the effectiveness of the proposed method, we performed an experiment that applies both techniques to a naive Bayesian classifier. The experiment covers each filtering technique in isolation and the techniques in combination. As a result, our method improved recall by 11.6% and precision by 2.1%, enhancing the junk-mail filtering system.
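The abstract does not give the weighting formula, so the following is only a minimal sketch of one way title-content weighting could be attached to a naive Bayesian classifier: title tokens contribute their log-likelihood multiplied by a weight, body tokens contribute it once. The function names, the `title_weight` parameter, and the use of Laplace smoothing are all assumptions made for illustration, not the author's method.

```python
import math
from collections import Counter

def train(messages):
    """Count tokens per label. messages: list of (title_tokens, body_tokens, label)."""
    word_counts = {}        # label -> Counter of token occurrences
    doc_counts = Counter()  # label -> number of training messages
    vocab = set()
    for title, body, label in messages:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        counts.update(title)
        counts.update(body)
        vocab.update(title, body)
    return word_counts, doc_counts, vocab

def classify(model, title, body, title_weight=2.0):
    """Return the label with the highest weighted log-posterior score.

    title_weight is a hypothetical knob: values above 1.0 make title
    tokens count more than body tokens.
    """
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # Laplace (add-one) smoothing
        score = math.log(doc_counts[label] / total_docs)
        for weight, tokens in ((title_weight, title), (1.0, body)):
            for tok in tokens:
                score += weight * math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With `title_weight=1.0` this reduces to an ordinary multinomial naive Bayes score, which makes it easy to compare weighted and unweighted behavior on the same message.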

A Genre-based Classification of Digital Documents by using Deviation Statistic of Genre-revealing Term and Subject-revealing Term (장르와 주제 범주간 용어 편차정보를 이용한 디지털 문서의 장르기반 분류)

  • 이용배;맹성현
    • Journal of KIISE:Software and Applications / v.30 no.11 / pp.1062-1071 / 2003
  • Genre-based classification means classifying documents by the purpose for which they were written, not by their semantics or subject areas. Most past genre classification methods were based on existing document categorization algorithms and were ineffective at feature selection, resulting in low-quality classification. In this research, we propose a new method for automatic classification of digital documents by genre. The genre classifier we developed uses the deviation statistic between the genre-revealing term frequencies and between the subject-revealing term frequencies within a genre. We collected Web documents to evaluate the proposed genre classification method. The experimental results show that the proposed method outperforms a direct application of chi-square feature selection with a Bayesian classifier, a combination often used for subject classification, by about 30 percent in accuracy.
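For reference, the baseline named in the abstract (chi-square feature selection) scores each term against each class from a 2x2 contingency table of document counts. The sketch below shows the standard statistic only; it is not the deviation statistic proposed in the paper, and the argument names are chosen here for illustration.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for one (term, class) pair.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    # A zero marginal means the term or class never varies; treat as uninformative.
    return numerator / denominator if denominator else 0.0

def top_terms(tables, k):
    """tables: {term: (n11, n10, n01, n00)}. Return the k highest-scoring terms."""
    return sorted(tables, key=lambda t: chi_square(*tables[t]), reverse=True)[:k]
```

A term that occurs equally often inside and outside the class scores 0, while a term that occurs only inside the class scores highest, so ranking by this value keeps the most class-specific terms.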

A study on the Pattern Recognition of the EMG signals using Neural Network and Probabilistic modal for the two dimensional Motions described by External Coordinate (신경회로망과 확률모델을 이용한 2차원운동의 외부좌표에 대한 EMG신호의 패턴인식에 관한 연구)

  • Jang, Young-Gun;Kwon, Jang-Woo;Hong, Seung-Hong
    • Proceedings of the KOSOMBE Conference / v.1991 no.05 / pp.65-70 / 1991
  • This paper proposes a hybrid model that combines a probabilistic model and an MLP (multilayer perceptron) model for pattern recognition of EMG (electromyogram) signals. The MLP model has two problems: its learning method does not guarantee a global minimum of the error, and how closely it approximates the Bayesian probabilities varies with the amount and quality of training data, the number of hidden layers and hidden nodes, and so on. The latter problem produces quite different results, especially for new test data that exclude the design samples. The error probability of the probabilistic model is closely related to the estimation error of the model's parameters and to the fidelity of its assumptions. Generally, a Bayesian classifier cannot be introduced into the probabilistic model of EMG signals because the prior probabilities are unknown, so MLE (maximum likelihood estimation) is used instead. In this paper, we propose a method that obtains the MAP (maximum a posteriori probability) in the probabilistic model by using the MLP to estimate the prior probability distribution that minimizes the error probability. This method minimizes the error probability of the probabilistic model as long as the realization of the MLP is optimal, and it selectively approximates the minimum error probability of each class of both models. Allocating the reference coordinate of the EMG signal outside the body makes the approach suitable for applications whose motions are difficult to define and separate using internal body coordinates. Simulation results show the benefit of the proposed model compared with using the MLP and the probabilistic model separately.
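The difference between the maximum-likelihood and MAP decisions mentioned in the abstract can be illustrated with a small sketch. It assumes one-dimensional Gaussian class likelihoods and takes the priors as plain inputs; in the paper the priors are estimated by an MLP, which is not reproduced here. The class labels and parameter values in the usage note are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def ml_decision(x, classes):
    """Pick the class with the highest likelihood p(x | class).

    classes: {label: (mean, std)}
    """
    return max(classes, key=lambda c: gaussian_pdf(x, *classes[c]))

def map_decision(x, classes, priors):
    """Pick the class with the highest posterior, proportional to P(class) * p(x | class).

    priors: {label: P(label)}
    """
    return max(classes, key=lambda c: priors[c] * gaussian_pdf(x, *classes[c]))
```

For example, with `classes = {"flexion": (0.0, 1.0), "extension": (2.0, 1.0)}` and the observation `x = 0.9`, the likelihood alone favors `"flexion"`, whereas priors of 0.1 and 0.9 shift the MAP decision to `"extension"`.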

Generation and Selection of Nominal Virtual Examples for Improving the Classifier Performance (분류기 성능 향상을 위한 범주 속성 가상예제의 생성과 선별)

  • Lee, Yu-Jung;Kang, Byoung-Ho;Kang, Jae-Ho;Ryu, Kwang-Ryel
    • Journal of KIISE:Software and Applications / v.33 no.12 / pp.1052-1061 / 2006
  • This paper presents a method of using virtual examples to improve classification accuracy for data with nominal attributes. Most previous research on virtual examples focused on data with numeric attributes and used domain-specific knowledge to generate useful virtual examples for a particular target learning algorithm. Instead of using domain-specific knowledge, our method samples virtual examples from a naive Bayesian network constructed from the given training set. A sampled example is considered useful if adding it to the training set increases the network's conditional likelihood. A set of useful virtual examples can be collected by repeating this process of sampling followed by evaluation. Experiments have shown that the virtual examples collected this way can help various learning algorithms derive classifiers of improved accuracy.
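The sample-then-evaluate loop described above can be sketched as follows. This is an illustration under several assumptions that the abstract does not confirm: add-one smoothing, conditional log-likelihood measured on the original training set, and cumulative acceptance (each accepted example stays in the set used to judge later candidates). All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def fit(examples):
    """Count statistics of a naive Bayes model. examples: list of (attrs, label)."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    for attrs, label in examples:
        for i, value in enumerate(attrs):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def joint(model, domains, labels, attrs, label):
    """Smoothed joint probability P(label, attrs) under the naive Bayes model."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    p = (class_counts[label] + 1) / (total + len(labels))
    for i, value in enumerate(attrs):
        p *= (value_counts[(label, i)][value] + 1) / (class_counts[label] + len(domains[i]))
    return p

def conditional_log_likelihood(model, domains, labels, data):
    """Sum of log P(label | attrs) over data."""
    cll = 0.0
    for attrs, label in data:
        evidence = sum(joint(model, domains, labels, attrs, c) for c in labels)
        cll += math.log(joint(model, domains, labels, attrs, label) / evidence)
    return cll

def sample(model, domains, labels, rng):
    """Draw a class from the smoothed prior, then each attribute given the class."""
    class_counts, value_counts = model
    label = rng.choices(labels, weights=[class_counts[c] + 1 for c in labels])[0]
    attrs = tuple(
        rng.choices(dom, weights=[value_counts[(label, i)][v] + 1 for v in dom])[0]
        for i, dom in enumerate(domains)
    )
    return attrs, label

def collect_virtual_examples(train, domains, labels, n_trials=200, seed=0):
    """Keep each sampled example only if it raises the conditional log-likelihood."""
    rng = random.Random(seed)
    base_model = fit(train)  # network built from the given training set
    accepted = []
    best = conditional_log_likelihood(base_model, domains, labels, train)
    for _ in range(n_trials):
        candidate = sample(base_model, domains, labels, rng)
        model = fit(train + accepted + [candidate])
        cll = conditional_log_likelihood(model, domains, labels, train)
        if cll > best:
            accepted.append(candidate)
            best = cll
    return accepted, best
```

Here `domains` lists the possible values of each nominal attribute and `labels` lists the classes; the returned examples could then be appended to the training set of any other learner.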

Real-Time Place Recognition for Augmented Mobile Information Systems (이동형 정보 증강 시스템을 위한 실시간 장소 인식)

  • Oh, Su-Jin;Nam, Yang-Hee
    • Journal of KIISE:Computing Practices and Letters / v.14 no.5 / pp.477-481 / 2008
  • Place recognition is necessary to provide a mobile user with place-dependent information. This paper proposes a real-time, video-based place recognition system that identifies the user's current place while the user moves through a building. For scene feature extraction, existing methods based on global feature analysis have the drawback of being sensitive to partial occlusion and noise. Local-feature-based methods, which usually attempt object recognition, are hard to apply in a real-time system because of their high computational cost. Meanwhile, statistical methods such as HMMs (hidden Markov models) or Bayesian networks have been used to derive place recognition results from the feature data. The former, however, is impractical because gathering the training data requires a huge amount of effort, while the latter usually depends on object recognition alone. This paper proposes a combined approach of global and local feature analysis for feature extraction so that each compensates for the other's drawbacks. The proposed method is applied to a mobile information system and shows real-time performance with competitive recognition results.

A Study on a Smart Digital Signage Using Bayesian Age Estimation Technique for the Next Generation Airport Service (차세대 공항 서비스를 위한 베이지안 연령추정기법을 이용하는 스마트 디지털 사이니지에 대한 연구)

  • Kim, Chun-Ho;Lee, Dong Woo;Baek, Gyeong Min;Moon, Seong Yeop;Heo, Chan;Na, Jong Whoa;Ohn, Seung-Yup;Choi, Woo Young
    • Journal of Advanced Navigation Technology / v.18 no.6 / pp.533-540 / 2014
  • We propose an age-estimation-based smart digital signage for next-generation airport service. The proposed system recognizes the customer's face so that it can selectively display information. Using a webcam, the system captures the customer's face and estimates the customer's age by calculating the wrinkle density of the face and applying a Bayesian classifier. The age estimation method is evaluated on a face database. We expect that the new digital signage may improve the satisfaction of airport customers.

Comparison of nomograms designed to predict hypertension with a complex sample (고혈압 예측을 위한 노모그램 구축 및 비교)

  • Kim, Min Ho;Shin, Min Seok;Lee, Jea Young
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.555-567 / 2020
  • Hypertension has a steadily increasing incidence rate and is a risk factor for secondary diseases such as cardiovascular disease. Therefore, it is important to predict the incidence rate of the disease. In this study, we constructed nomograms that can predict the incidence rate of hypertension. We used data from the Korean National Health and Nutrition Examination Survey (KNHANES) for 2013-2016. Because the data come from complex sampling, a Rao-Scott chi-squared test was used to identify 10 risk factors for hypertension. The smoking and exercise variables were not statistically significant in the logistic regression; therefore, eight effects were selected as risk factors for hypertension. Logistic and Bayesian nomograms constructed from the selected risk factors were proposed and compared. The constructed nomograms were then verified using a receiver operating characteristic (ROC) curve and a calibration plot.
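A receiver operating characteristic curve is often summarized by the area under it (AUC). The abstract does not say whether the authors report AUC, so the sketch below is only an illustration of how such a summary can be computed from predicted risks: it counts, over all case/non-case pairs, how often the case receives the higher score, with ties counted as half. The function name and the 0/1 label coding are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a case (e.g. hypertension), 0 for a non-case
    scores: predicted risk for each subject, higher meaning more likely a case
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to ranking cases no better than chance and 1.0 to ranking every case above every non-case. The pairwise loop is quadratic in the number of subjects, which is acceptable for a small illustration but not for survey-scale data.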

Near Realtime Packet Classification & Handling Mechanism for Visualized Security Management in Cloud Environments (클라우드 환경에서 보안 가시성 확보를 위한 자동화된 패킷 분류 및 처리기법)

  • Ahn, Myong-ho;Ryoo, Mi-hyeon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.331-337 / 2014
  • The paradigm shift to cloud computing has increased the importance of security. Even though public cloud providers such as Amazon already provide security-related services such as firewalls and identity management, these are not sufficient to protect data in cloud environments, because public cloud environments do not allow clients to use their own security solutions or equipment. In such environments, users must enhance security themselves, so the need for visualized security management arises. To implement visualized security management, developing near-real-time data handling and packet classification mechanisms is crucial. The key technical challenge in packet classification is how to classify packets in an unsupervised way, without human interaction. To this end, this paper presents an automated packet classification mechanism based on naive Bayesian and packet chunking techniques, which can identify signatures and perform machine learning by itself without human intervention.

Classifying Indian Medicinal Leaf Species Using LCFN-BRNN Model

  • Kiruba, Raji I;Thyagharajan, K.K;Vignesh, T;Kalaiarasi, G
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3708-3728 / 2021
  • Indian herbal plants are used in agriculture and in the food, cosmetics, and pharmaceutical industries. Laboratory-based tests are routinely used to identify and classify similar herb species by analyzing their internal cell structures. In this paper, we apply computer vision techniques to do the same. The original leaf image was preprocessed using the Chan-Vese active contour segmentation algorithm to efface the background from the image by setting the contraction bias (v) to -1 and the smoothing factor (µ) to 0.5, and by bringing the initial contour close to the image boundary. Thereafter, the segmented grayscale image was fed to a leaky capacitance fired neuron (LCFN) model, which differentiates between similar herbs by combining different groups of pixels in the leaf image. The LCFN's decay constant (f), decay constant (g), and threshold (h) parameters were empirically set to 0.7, 0.6, and 18, respectively, to generate the 1D feature vector. The LCFN time sequence identified the internal leaf structure at different iterations. Our proposed framework was tested on newly collected natural images of herbal species and on geometrically variant images that differ in size, orientation, and position. The 1D sequence and shape features of aloe, betel, Indian borage, bittergourd, grape, insulin herb, guava, mango, nilavembu, nithiyakalyani, sweet basil, and pomegranate were fed into the 5-fold Bayesian regularization neural network (BRNN), K-nearest neighbors (KNN), support vector machine (SVM), and ensemble classifier to obtain the highest classification accuracy of 91.19%.

An Efficient Method for Detecting Denial of Service Attacks Using Kernel Based Data (커널 기반 데이터를 이용한 효율적인 서비스 거부 공격 탐지 방법에 관한 연구)

  • Chung, Man-Hyun;Cho, Jae-Ik;Chae, Soo-Young;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.19 no.1 / pp.71-79 / 2009
  • Much current research addresses host-based intrusion detection using system calls, which are one portion of kernel-based data. Sequence-based and frequency-based preprocessing methods are the ones most commonly used in this research. Because of the large amount of data and the number of system call types, preprocessing takes a significant amount of time, which makes real-time intrusion detection systems difficult to implement. Despite this disadvantage, the frequency-based method, which requires relatively little preprocessing time, is usually used. This paper proposes an effective method for detecting denial-of-service attacks using the frequency-based method. Principal component analysis (PCA) is used to select the principal system calls, a Bayesian network is constructed, and a Bayesian classifier performs the classification.