Integrated Search | Korea Science

Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety

Yeom, Ha-Neul;Hwang, Myunggwon;Hwang, Mi-Nyeong;Jung, Hanmin
- Journal of Information Science Theory and Practice / Vol. 2, No. 3 / pp. 29-39 / 2014
In recent years, several studies have proposed using the Twitter micro-blogging service to track trends in online media and discussion. In this study, we specifically examine the use of Twitter to track discussions of food safety in the Korean language. Given the irregularity of keyword use in most tweets, we focus on selecting the optimal machine-learning classifier and feature set for classifying the collected tweets. We build classifier models using the Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which show good performance. To select an optimal feature set, we construct a basic feature set as a baseline for performance comparison, against which further test feature sets can be evaluated. Experiments show that precision and F-measure are best when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting the Substantive, Predicate, Modifier, and Interjection parts of speech.
https://doi.org/10.1633/JISTaP.2014.2.3.3
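
As a rough illustration of the kind of classifier the abstract refers to, the following is a minimal sketch of multinomial Naive Bayes with Laplace smoothing, written in plain Python. The function names, the smoothing constant `alpha`, the class labels, and the English toy tokens are all assumptions made for illustration; the paper's actual features come from part-of-speech extraction on Korean tweets, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents.

    docs   -- list of token lists (e.g. tokens kept after POS filtering)
    labels -- list of class labels, one per document
    alpha  -- Laplace smoothing constant (assumed value, not from the paper)
    """
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"log_prior": {}, "log_like": {}}
    for label, n_docs in class_docs.items():
        # Smoothed denominator: class token total plus alpha per vocabulary entry.
        total = sum(token_counts[label].values()) + alpha * len(vocab)
        model["log_prior"][label] = math.log(n_docs / len(docs))
        model["log_like"][label] = {
            t: math.log((token_counts[label][t] + alpha) / total) for t in vocab
        }
    return model


def predict(model, tokens):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        log_like = model["log_like"][label]
        # Tokens outside the training vocabulary are ignored.
        scores[label] = log_prior + sum(log_like[t] for t in tokens if t in log_like)
    return max(scores, key=scores.get)


# Hypothetical toy data standing in for tokenized tweets.
toy_docs = [
    ["recall", "contaminated", "milk"],
    ["food", "poisoning", "outbreak"],
    ["delicious", "lunch", "today"],
    ["great", "dinner", "recipe"],
]
toy_labels = ["safety", "safety", "other", "other"]
toy_model = train_multinomial_nb(toy_docs, toy_labels)
print(predict(toy_model, ["contaminated", "food"]))
```

Working in log space avoids underflow when many token probabilities are multiplied, which is why the sketch sums logarithms instead of multiplying raw probabilities.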

EMPIRICAL BAYES TESTING FOR MEAN LIFE TIME OF RAYLEIGH DISTRIBUTION

Liang, TaChen
- Journal of applied mathematics & informatics / Vol. 25, No. 1-2 / pp. 1-15 / 2007
Consider a Rayleigh distribution with pdf $p(x \mid \theta) = 2x\theta^{-1}\exp(-x^{2}/\theta)$ and mean lifetime $\mu = \sqrt{\pi\theta}/2$. We study the two-action problem of testing the hypotheses $H_{0}: \mu \leq \mu_{0}$ against $H_{1}: \mu > \mu_{0}$ using a linear error loss of $|\mu - \mu_{0}|$ via the empirical Bayes approach. We construct a monotone empirical Bayes test $\delta^{*}_{n}$ and study its associated asymptotic optimality. It is shown that the regret of $\delta^{*}_{n}$ converges to zero at a rate $\frac{\ln^{2} n}{n}$, where $n$ is the number of past data available when the present testing problem is considered.
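
The mean-lifetime formula above can be checked numerically. The sketch below samples from the stated density by inverting its CDF, $F(x) = 1 - \exp(-x^{2}/\theta)$, and compares the sample mean with $\sqrt{\pi\theta}/2$. The value of `theta`, the seed, the sample size, and the function names are illustrative assumptions; this does not implement the empirical Bayes test itself.

```python
import math
import random


def sample_rayleigh(theta, rng):
    """Draw one value from p(x | theta) = (2x/theta) exp(-x^2/theta) by inversion."""
    u = rng.random()  # uniform on [0, 1)
    return math.sqrt(-theta * math.log(1.0 - u))


def mean_lifetime(theta):
    """Closed-form mean lifetime mu = sqrt(pi * theta) / 2."""
    return math.sqrt(math.pi * theta) / 2.0


rng = random.Random(0)  # fixed seed so the run is repeatable
theta = 4.0             # illustrative value, not from the paper
draws = [sample_rayleigh(theta, rng) for _ in range(200_000)]
empirical = sum(draws) / len(draws)
print(f"empirical mean {empirical:.3f} vs closed form {mean_lifetime(theta):.3f}")
```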

A Bayes Criterion for Selecting Variables in MDA

김혜중;유희경
- 응용통계연구 / Vol. 11, No. 2 / pp. 435-449 / 1998
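This study proposes a variable selection criterion for multiple discriminant analysis (MDA) based on a Bayesian approach. The criterion is built on a default Bayes factor, derived by an objective Bayesian method, for testing the homogeneity of the mean vectors of several normal populations. The default Bayes factor is obtained using the imaginary training sample method developed by Spiegelhalter and Smith (1982). Using the distributional properties of the proposed criterion, we also discuss a test for the additional discriminating power that an extra discriminant variable (or group of variables) contributes to MDA, as well as a criterion for selecting such additional variables (or groups of variables). The new criterion can be used not only with the optimal subset selection method but also as the selection criterion in each stepwise method, and it also applies to two-group discriminant analysis; in this sense it can unify the various existing discriminant variable selection criteria developed from sampling theory. A simulation study evaluates the usefulness of the proposed criterion under the optimal subset selection and stepwise methods.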

A Bayesian Test for First Order Autocorrelation in Regression Errors: An Application to SPC Approach

김혜중;한성실
- 품질경영학회지 / Vol. 24, No. 4 / pp. 190-206 / 1996
When measurements are made on units of production in time order, it is reasonable to expect that the measurement errors will sometimes be first-order autocorrelated, and a technique for testing such autocorrelation is needed for good control of the production process. Tool-wear processes provide an example in which a regression model can be useful for modeling and controlling the process. For the control of such processes, we present a simple method for testing first-order autocorrelation in regression errors. The method is a Bayesian test based on the Bayes factor, derived by observing that, in general, a Bayes factor can be written as the product of a quantity called the Savage-Dickey density ratio and a correction factor; both terms are easily estimated by Gibbs sampling. The performance of the method is examined by Monte Carlo simulation. The test not only achieves satisfactory power but also avoids the inconvenience that arises in using the well-known Durbin-Watson test.
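
For context on the classical baseline named in the abstract, here is a minimal sketch of the Durbin-Watson statistic, $d = \sum_{t=2}^{n}(e_t - e_{t-1})^{2} / \sum_{t=1}^{n} e_t^{2}$, together with the lag-1 sample autocorrelation it approximates through $d \approx 2(1 - r)$. It is not the Bayesian procedure the paper proposes. The simulated AR(1) errors, the coefficient 0.7, and all function names are assumptions for illustration.

```python
import random


def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum (e_t - e_{t-1})^2 / sum e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """Sample first-order autocorrelation, treating residuals as zero-mean."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def simulate_ar1(n, rho, seed=0):
    """Generate AR(1) errors e_t = rho * e_{t-1} + noise (illustrative data only)."""
    rng = random.Random(seed)
    errors, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        errors.append(prev)
    return errors


correlated = simulate_ar1(5000, rho=0.7)
independent = simulate_ar1(5000, rho=0.0)
print("d, correlated errors: ", durbin_watson(correlated))
print("d, independent errors:", durbin_watson(independent))
```

Values of $d$ near 2 indicate little first-order autocorrelation, while values well below 2 indicate positive autocorrelation.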

BAYESIAN TEST FOR THE EQUALITY OF THE MEANS AND VARIANCES OF THE TWO NORMAL POPULATIONS WITH VARIANCES RELATED TO THE MEANS USING NONINFORMATIVE PRIORS

Kim, Dal-Ho;Kang, Sang-Gil;Lee, Woo-Dong
- Journal of the Korean Statistical Society / Vol. 32, No. 3 / pp. 271-288 / 2003
In this paper, we develop noninformative priors, namely matching priors and reference priors, for the case in which the variance of the normal distribution is related to the mean. We prove that the second order matching prior matches alternative coverage probabilities up to the same order and that it is also an HPD matching prior. It turns out that the one-at-a-time reference prior satisfies a second order matching criterion. Using these noninformative priors, we then develop a Bayesian test procedure for the equality of the means and variances of two independent normal distributions based on the fractional Bayes factor. A simulation study is performed, and a real data example is also provided.

An Improved Homonym Disambiguation Model based on Bayes Theory

김창환;이왕우
- 한국컴퓨터산업학회논문지 / Vol. 2, No. 12 / pp. 1581-1590 / 2001
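To address the shortcomings of, and the future work identified for, the "homonym disambiguation system based on semantic information extracted from dictionary definitions" proposed by 허정 (2000), this study proposes a homonym disambiguation model based on Bayes' theorem. From a sense-tagged corpus of dictionary definitions, the substantives (common nouns), predicates (adjectives, verbs), and adverbials (adverbs) that make up the definitions containing a homonym are extracted as semantic information. For comparison, experiments were run on the nine existing homonym nouns whose per-sense dictionary frequencies are relatively balanced, and seven homonym predicates (adjectives, verbs) were newly added. In internal experiments, the model achieved an average accuracy of 99.37% on the nine homonym nouns and 99.53% on the seven homonym predicates. In external experiments, it achieved an average accuracy of 84.42% on the nine homonym nouns using the 국어 정보베이스 (Korean Information Base) and the ETRI corpus, and 70.81% on the seven homonym predicates using an external corpus of 3.5 million eojeols from the Sejong Project.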

Traffic Classification Using Machine Learning Algorithms in Practical Network Monitoring Environments

정광본;최미정;김명섭;원영준;홍원기
- 한국통신학회논문지 / Vol. 33, No. 8B / pp. 707-718 / 2008
Traffic classification methods are shifting from payload- or port-based approaches to ML-based approaches in order to cope with dynamically changing applications. However, current research on ML-based traffic classification is geared toward offline environments. In particular, existing studies use cross validation as the testing method and report classification results on a per-flow basis. This paper compares the accuracy of traffic classification when cross validation and split validation are used as the testing method. It also compares byte-based classification results with flow-based classification results. We classify traffic using ML algorithms such as J48, REPTree, RBFNetwork, Multilayer Perceptron, BayesNet, and NaiveBayes together with various feature sets, and we present the ML algorithm and feature set best suited to traffic classification under split validation.
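
To make the two testing methods concrete, the sketch below generates index sets for a single train/test split and for k-fold cross validation. It assumes that "split validation" means one fixed partition into training and test data; the abstract does not define the term, so the paper's exact procedure may differ (for example, the test data could be a separately collected trace). The function names, the 30% test fraction, and the seed are illustrative.

```python
import random


def holdout_split(n, test_fraction=0.3, seed=0):
    """Split indices 0..n-1 once into a training part and a test part."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


train_idx, test_idx = holdout_split(100)
print(len(train_idx), len(test_idx))
for train, test in k_fold_splits(100, k=10):
    pass  # train a classifier on `train` and score it on `test` here
```

Under cross validation every sample is eventually used for testing, whereas a single split evaluates the model on one held-out portion only.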

Classical and Bayesian methods of estimation for power Lindley distribution with application to waiting time data

Sharma, Vikas Kumar;Singh, Sanjay Kumar;Singh, Umesh
- Communications for Statistical Applications and Methods / Vol. 24, No. 3 / pp. 193-209 / 2017
This article considers the power Lindley distribution and some of its properties. Maximum likelihood, least squares, maximum product spacings, and Bayes estimators are proposed for estimating all the unknown parameters of the power Lindley distribution. Lindley's approximation and Markov chain Monte Carlo techniques are used for the Bayesian calculations, since the posterior distribution cannot be reduced to a standard distribution. The performances of the proposed estimators are compared based on simulated samples. The waiting times of research articles to be accepted in statistical journals are fitted to the power Lindley distribution and to other competing distributions. The chi-square statistic, Kolmogorov-Smirnov statistic, Akaike information criterion, and Bayesian information criterion are used to assess goodness of fit. The power Lindley distribution was found to fit the data better than the other distributions.
https://doi.org/10.5351/CSAM.2017.24.3.193
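
The two information criteria mentioned in the abstract have standard definitions, $\mathrm{AIC} = 2k - 2\ln\hat{L}$ and $\mathrm{BIC} = k\ln n - 2\ln\hat{L}$, where $k$ is the number of fitted parameters, $n$ the sample size, and $\hat{L}$ the maximized likelihood. The sketch below computes both for an exponential fit, chosen only because its maximum likelihood estimate has a closed form; it is a stand-in, not the power Lindley model, and the data values are made up for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def exponential_max_log_likelihood(data):
    """Maximized log-likelihood of an exponential model (rate = n / sum of data)."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)


# Hypothetical waiting times, for illustration only.
waits = [0.8, 1.5, 2.1, 0.4, 3.3, 1.2, 0.9, 2.7]
ll = exponential_max_log_likelihood(waits)
print("AIC:", aic(ll, n_params=1))
print("BIC:", bic(ll, n_params=1, n_obs=len(waits)))
```

Lower values indicate a better trade-off between fit and model complexity, so competing distributions fitted to the same data can be ranked by either criterion.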

Classification Accuracy Improvement for Decision Tree

메하리 마르타 레제네;박상현
- 한국정보처리학회:학술대회논문집 / Korea Information Processing Society 2017 Spring Conference / pp. 787-790 / 2017
Data quality is a central issue in classification problems: noisy instances in the training dataset generally undermine robust classification performance. Such instances may cause the generated decision tree to overfit, reducing its accuracy. Decision trees are useful, efficient, and commonly used for solving real-world classification problems in data mining. In this paper, we introduce a preprocessing technique to improve the classification accuracy of the C4.5 decision tree algorithm. The proposed preprocessing method applies the naive Bayes classifier to remove noisy instances from the training dataset. We applied the method to a real e-commerce sales dataset and compared its performance against the existing C4.5 decision tree classifier. In the experiments, the proposed method improved classification accuracy by 8.5% on the training dataset and by 14.32% under 10-fold cross-validation.
https://doi.org/10.3745/PKIPS.y2017m04a.787
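
The preprocessing idea can be sketched as follows, under one stated assumption: an instance counts as "noisy" when a Naive Bayes model trained on the full training set predicts a label different from the recorded one. The abstract does not spell out the authors' exact rule, so this is an interpretation. The categorical features, toy rows, and function names are hypothetical, and the subsequent C4.5 training step is not shown.

```python
import math
from collections import Counter, defaultdict


def fit_categorical_nb(rows, labels, alpha=1.0):
    """Fit Naive Bayes on rows of categorical feature values (Laplace smoothing)."""
    n_features = len(rows[0])
    class_counts = Counter(labels)
    # value_counts[(label, feature index)] counts each observed feature value.
    value_counts = defaultdict(Counter)
    values = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(label, j)][v] += 1
            values[j].add(v)

    def predict(row):
        best, best_score = None, -math.inf
        for label, n_c in class_counts.items():
            score = math.log(n_c / len(rows))
            for j, v in enumerate(row):
                score += math.log(
                    (value_counts[(label, j)][v] + alpha) / (n_c + alpha * len(values[j]))
                )
            if score > best_score:
                best, best_score = label, score
        return best

    return predict


def remove_noisy_instances(rows, labels):
    """Keep only instances whose label agrees with the Naive Bayes prediction."""
    predict = fit_categorical_nb(rows, labels)
    kept = [(r, y) for r, y in zip(rows, labels) if predict(r) == y]
    return [r for r, _ in kept], [y for _, y in kept]


# Hypothetical toy data: the last instance contradicts the dominant pattern.
toy_rows = [("sunny", "hot")] * 4 + [("rainy", "cool")] * 4 + [("sunny", "hot")]
toy_labels = ["no"] * 4 + ["yes"] * 4 + ["yes"]
clean_rows, clean_labels = remove_noisy_instances(toy_rows, toy_labels)
print(len(toy_rows), "->", len(clean_rows))
```

The cleaned rows and labels would then be passed to the decision tree learner in place of the original training set.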

An Automatic Document Classification with Bayesian Learning

김진상;신양규
- Journal of the Korean Data and Information Science Society / Vol. 11, No. 1 / pp. 19-30 / 2000
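The rapid advance of information and communication technology is explosively increasing the volume of electronic documents generated online. Consequently, technology for automatic document classification is needed to replace the conventional manual approach. This paper studies a method for automatically classifying documents using Bayesian learning and tests it on documents from 20 Usenet newsgroups. The algorithm used is the Naive Bayes classifier; in experiments on Usenet documents with the implemented system, the classification accuracy was about 77%.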
