Search | Korea Science

Binary classification on compositional data

Joo, Jae Yun;Lee, Seokho
Communications for Statistical Applications and Methods, v.28 no.1, pp.89-97, 2021
Because of their boundedness and sum constraint, compositional data are often transformed by a logratio transformation, and the transformed data are then fed into traditional binary classification or discriminant analysis. However, directly applying traditional multivariate approaches to the transformed data can be problematic, because the class distributions are not Gaussian and the Bayes decision boundary is not polynomial on the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.
https://doi.org/10.29220/CSAM.2021.28.1.089
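
The abstract above does not say which logratio transformation the authors use. As a minimal sketch, assuming the two most common choices (the centered and additive logratio transforms), the mapping from a composition to unconstrained coordinates could look like the following. The function names `clr` and `alr` are illustrative and not taken from the paper.

```python
import math


def clr(composition):
    """Centered logratio: log of each part minus the mean of the logs."""
    if not composition or any(part <= 0 for part in composition):
        raise ValueError("composition must be non-empty and strictly positive")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def alr(composition):
    """Additive logratio, using the last part as the reference (an assumption)."""
    if len(composition) < 2 or any(part <= 0 for part in composition):
        raise ValueError("need at least two strictly positive parts")
    reference = math.log(composition[-1])
    return [math.log(part) - reference for part in composition[:-1]]


if __name__ == "__main__":
    x = [0.2, 0.3, 0.5]
    print("clr:", clr(x))
    print("alr:", alr(x))
```

Both transforms depend only on ratios between parts, so rescaling a composition (for example from proportions to percentages) leaves the output unchanged. The transformed vectors can then be passed to any classifier.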

Empirical Bayesian Prediction Analysis on Accelerated Lifetime Data (가속수명자료를 이용한 경험적 베이즈 예측분석)

Cho, Geon-Ho
Journal of the Korean Data and Information Science Society, v.8 no.1, pp.21-30, 1997
In accelerated life tests, the failure time of an item is observed under a high stress level, and the performance of items at the normal stress level is inferred from that time. In this paper, we consider the exponential lifetime distribution with censored accelerated failure time data when the mean of the prior of the failure rate is known. We use the empirical Bayesian method, with moment estimators for the parameters of the prior distribution, to obtain the empirical Bayesian predictive density and predictive intervals for a future observation under the normal stress level.

Prediction of 305 Days Milk Production from Early Records in Dairy Cattle Using an Empirical Bayes Method

Pereira, J.A.C.;Suzuki, M.;Hagiya, K.
Asian-Australasian Journal of Animal Sciences, v.14 no.11, pp.1511-1515, 2001
Milk production over 305 d was predicted from early records using an empirical Bayes method (EBM). The EBM was compared with the best predicted estimation (BPE), the test interval method (TIM), and the linearized Wood's model (LWM). Daily milk yields were obtained from 606 first-lactation Japanese Holstein cows in three herds. From each file of 305 daily records, 10 random test-day records were taken at intervals of approximately one month. The accuracy of the methods was compared using the absolute difference (AD) and the standard deviation (SD) of the differences between the actual and the estimated 305 d milk production. In the early stage of lactation, EBM gave the most accurate predictions. When all the herds were analyzed jointly, the AD over the first 5 test-day records averaged 373, 590, 917, and 1,042 kg for EBM, BPE, TIM, and LWM, respectively. The corresponding SD averaged 488, 733, 747, and 1,605 kg. When the herds were analyzed separately, the EBM predictions retained high accuracy. As more information on the actual lactation was added to the prediction, TIM and LWM gradually became more accurate. Finally, in the last period of lactation, the accuracy of both of these methods exceeded that of EBM and BPE. The AD for the last 2 samples, analyzing all the herds jointly, averaged 141, 142, 164, and 214 kg for LWM, TIM, EBM, and BPE, respectively. Under the current practice of collecting monthly records, early prediction of future milk production may be more accurate using EBM. Alternatively, once enough information on the actual lactation has accumulated, TIM may give better accuracy in the later stage of lactation.
https://doi.org/10.5713/ajas.2001.1511

Mapping the Geographic Variations of the Low Birth Weight cases in South Korea: Bayesian Approaches (우리나라 저체중아 출생의 공간적 변동성 지도화: 베이지언적 접근)

Roh, Young-hee;Park, Key-ho
Journal of the Korean Geographical Society, v.51 no.3, pp.367-380, 2016
This study reviewed and compared methods for mapping aggregated low birth weight (LBW) cases and geographic variations in LBW in South Korea. Based on this review, we produced LBW maps of South Korea. Standardized mortality/morbidity ratios (SMRs) and crude mortality rates have been widely used for many years in epidemiological research. However, SMR-based maps are likely to be affected by the sample size of each unit area. Therefore, this study adopted a model-based approach using Bayesian estimates to reduce noisy variability in the SMR. A Bayesian model makes it possible to calculate statistically reliable relative risk (RR) values. We used the full Bayes estimator as well as empirical Bayes estimators. The variations in the two Bayes models were similar, while the SMR-based statistics had the largest variation. The resulting maps can be used to identify regions with a high risk of LBW in South Korea.
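
The abstract above does not specify which empirical Bayes estimator was used. As a rough sketch of how such an estimator reduces small-area noise in the SMR, the code below assumes a global moment-based (Marshall-type) estimator: each area's SMR is shrunk toward the overall ratio, with more shrinkage where the expected count is small. The function name and the toy counts in the usage note are hypothetical.

```python
def eb_smooth_smr(observed, expected):
    """Shrink each area's SMR toward the global ratio (moment-based EB sketch)."""
    if not observed or len(observed) != len(expected):
        raise ValueError("observed and expected must be non-empty and equal length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")

    total_expected = sum(expected)
    smr = [o / e for o, e in zip(observed, expected)]
    # Global mean relative risk: total observed over total expected.
    m = sum(observed) / total_expected
    # Expected-count-weighted variance of the raw SMRs around m.
    weighted_var = sum(e * (r - m) ** 2 for e, r in zip(expected, smr)) / total_expected
    mean_expected = total_expected / len(expected)
    # Moment estimate of the prior variance, truncated at zero.
    prior_var = max(weighted_var - m / mean_expected, 0.0)

    smoothed = []
    for e, r in zip(expected, smr):
        denom = prior_var + m / e
        weight = prior_var / denom if denom > 0 else 0.0
        smoothed.append(m + weight * (r - m))
    return smoothed
```

For example, `eb_smooth_smr([10, 30, 0, 100], [2.0, 25.0, 5.0, 50.0])` pulls the first area (expected count 2) much further toward the global ratio than the last area (expected count 50), which is the variance-reducing behavior the abstract describes.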

An Empirical Comparison of Machine Learning Models for Classifying Emotions in Korean Twitter (한국어 트위터의 감정 분류를 위한 기계학습의 실증적 비교)

Lim, Joa-Sang;Kim, Jin-Man
Journal of Korea Multimedia Society, v.17 no.2, pp.232-239, 2014
As the volume of online text grows rapidly, its automatic classification with machine learning methods is attracting more interest. Nevertheless, comparatively little research targets Korean texts, and statistical evaluation of such methods is also rare. This study took a sample of tweets and used machine learning methods to classify emotions with morpheme and n-gram features. About 76% of the emotions contained in the tweets were correctly classified. Of the two methods compared in this study, Support Vector Machines were found to be more accurate than Naïve Bayes. The linear SVM model was not inferior to the non-linear one. Morphological features did not contribute more to accuracy than n-grams did.
https://doi.org/10.9717/kmms.2014.17.2.232

The Comparison Study on Observational Before-After Studies: Case Study on Safety Evaluation on Highways (관찰적 사전·사후 평가연구 방법의 비교 연구: 공용중인 고속도로 안전진단사업 효과평가를 사례로)

Mun, Sung Ra;Lee, Young-Ihn
Journal of Korean Society of Transportation, v.31 no.6, pp.67-89, 2013
This study empirically analyzes three observational before-after study methods, the Naive method, the Comparison Group (CG) method, and the Empirical Bayes (EB) method, compares their results, and proposes ways to apply them in evaluation research. For this purpose, the road safety audits carried out on the Yŏng-dong freeway in 2005 and 2006 were evaluated. All three methods showed improvement due to the safety treatments. The estimated safety effectiveness was largest for the Naive method, followed by the CG method, and smallest for the EB method. The Naive method overestimates the effect because of the downward trend in traffic accidents, and the CG method is affected by external causal effects on the comparison group. Because the EB method controls for the "regression to the mean" phenomenon through the reference group's accident model, its result is relatively more accurate than those of the other methods. Analysts conducting evaluation studies need to understand the pros and cons of each evaluation method. Bias can be minimized by surveying accident trends at all related sites before performing the evaluation analysis.
https://doi.org/10.7470/jkst.2013.31.6.067

A Bayesian test for the first-order autocorrelations in regression analysis (회귀모형 오차항의 1차 자기상관에 대한 베이즈 검정법)

김혜중;한성실
The Korean Journal of Applied Statistics, v.11 no.1, pp.97-111, 1998
This paper suggests a Bayesian method for testing first-order Markov correlation among linear regression disturbances. As a Bayesian test criterion, the Bayes factor is derived in the form of a generalized Savage-Dickey density ratio, which is easily estimated by posterior simulation via a Gibbs sampling scheme. The performance of the Bayesian test is evaluated in a Monte Carlo experiment and an empirical data analysis. The efficiency of the posterior simulation is also examined.

EMPIRICAL BAYES THRESHOLDING: ADAPTING TO SPARSITY WHEN IT ADVANTAGEOUS TO DO SO

Silverman Bernard W.
Journal of the Korean Statistical Society, v.36 no.1, pp.1-29, 2007
Suppose one is trying to estimate a high dimensional vector of parameters from a series of one observation per parameter. Often, it is possible to take advantage of sparsity in the parameters by thresholding the data in an appropriate way. A marginal maximum likelihood approach, within a suitable Bayesian structure, has excellent properties. For very sparse signals, the procedure chooses a large threshold and takes advantage of the sparsity, while for signals where there are many non-zero values, the method does not perform excessive smoothing. The scope of the method is reviewed and demonstrated, and various theoretical, practical and computational issues are discussed, in particular exploring the wide potential and applicability of the general approach, and the way it can be used within more complex thresholding problems such as curve estimation using wavelets.
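
To make the idea of "thresholding the data" in the abstract above concrete, here is a minimal sketch of plain soft thresholding with a fixed, user-supplied threshold. It does not implement the paper's marginal maximum likelihood procedure, which chooses the threshold from the data. The function names are illustrative.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values within the threshold become 0."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def threshold_signal(observations, threshold):
    """Apply soft thresholding to each observation independently."""
    return [soft_threshold(x, threshold) for x in observations]


if __name__ == "__main__":
    noisy = [0.3, -0.2, 4.1, 0.1, -3.5, 0.4]
    print(threshold_signal(noisy, 1.0))
```

With a sparse signal, a large threshold zeroes out most of the noise-only coordinates while keeping the few large ones. The adaptive choice of that threshold is what the empirical Bayes approach contributes.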

Analysis of high school students' views on science-technology-society (HS-VOSTS) questionnaire results (고등학생을 위한 과학-기술-사회에 대한 시각 (HS-VOST) 설문조사 결과 분석)

Kang, Dae-Ki
Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2011.10a, pp.201-203, 2011
We report an experimental result of applying a data mining algorithm to analyze the results of the high school students' views on science-technology-society (HS-VOSTS) questionnaire. The preliminary empirical result of a Naive Bayes classifier on HS-VOSTS questionnaire responses from students at one South Korean university indicates that data mining algorithms can be effectively applied to automated knowledge discovery from students' survey data.

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

Park, Sung Soo;Lee, Kun Chang
Journal of Digital Contents Society, v.19 no.1, pp.133-140, 2018
Accurate sentiment classification is an important research topic in sentiment analysis. This study proposes an efficient method for classifying Korean sentiment using word2vec and ensemble methods, both of which have been widely studied recently. For 200,000 Korean movie review texts, we generate a POS-based BOW feature, a word2vec feature, and an integrated feature combining the two representations. For sentiment classification we used single classifiers (Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine) and ensemble classifiers (Adaptive Boost, Bagging, Gradient Boosting, and Random Forest). The integrated feature representation, composed of a BOW feature including adjectives and adverbs and the word2vec feature, showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, has the highest performance, while the ensemble classifiers show similar or slightly lower performance.
https://doi.org/10.9728/dcs.2018.19.1.133
