• Title/Summary/Keyword: sampling and classification

Accelerating the EM Algorithm through Selective Sampling for Naive Bayes Text Classifier (나이브베이즈 문서분류시스템을 위한 선택적샘플링 기반 EM 가속 알고리즘)

  • Chang Jae-Young;Kim Han-Joon
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.369-376 / 2006
  • This paper presents a new method that significantly improves a conventional Bayesian statistical text classifier by incorporating an accelerated EM (Expectation Maximization) algorithm. The EM algorithm suffers from slow convergence and performance degradation during its iterations, especially when real online text documents do not follow EM's assumptions. In this study, we propose a new accelerated EM algorithm with uncertainty-based selective sampling, which is simple yet converges quickly and allows a more accurate classification model to be estimated for the Naive Bayes text classifier. Experiments on the popular Reuters-21578 document collection showed that the proposed algorithm effectively improves classification accuracy.
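
A minimal sketch of semi-supervised EM for a multinomial Naive Bayes classifier with a selective-sampling step, assuming term-count document vectors and assuming that "uncertainty-based selective sampling" means each iteration re-estimates the model from only the unlabeled documents with the lowest posterior entropy. The abstract does not say which documents the authors select or how acceleration is achieved, so `keep_frac`, the entropy criterion, and all function names here are hypothetical.

```python
import numpy as np


def fit_nb(X, resp, alpha=1.0):
    """M-step: multinomial Naive Bayes parameters from (soft) class weights.

    X: (n_docs, n_words) term counts; resp: (n_docs, n_classes) class weights per doc.
    """
    n_classes = resp.shape[1]
    log_prior = np.log((resp.sum(axis=0) + alpha) / (resp.sum() + alpha * n_classes))
    word_counts = resp.T @ X + alpha              # Laplace-smoothed counts, (n_classes, n_words)
    log_cond = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return log_prior, log_cond


def posterior(X, log_prior, log_cond):
    """E-step: P(class | doc) under the current Naive Bayes model."""
    scores = X @ log_cond.T + log_prior
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability before exp
    p = np.exp(scores)
    return p / p.sum(axis=1, keepdims=True)


def selective_em(X_lab, y_lab, X_unl, n_classes, keep_frac=0.5, n_iter=10):
    """Semi-supervised EM that re-estimates the model from only the least uncertain
    unlabeled documents at each iteration (hypothetical selection rule)."""
    resp_lab = np.eye(n_classes)[y_lab]           # labeled docs keep hard labels
    model = fit_nb(X_lab, resp_lab)
    k = max(1, int(keep_frac * len(X_unl)))
    for _ in range(n_iter):
        p = posterior(X_unl, *model)
        entropy = -(p * np.log(p + 1e-12)).sum(axis=1)   # per-document uncertainty
        keep = np.argsort(entropy)[:k]                   # lowest-entropy documents
        model = fit_nb(np.vstack([X_lab, X_unl[keep]]),
                       np.vstack([resp_lab, p[keep]]))
    return model


# Toy corpus: words 0-1 indicate class 0, words 2-3 indicate class 1.
X_lab = np.array([[5, 3, 0, 0], [0, 0, 4, 6]])
y_lab = np.array([0, 1])
X_unl = np.array([[4, 4, 0, 1], [1, 0, 5, 5], [6, 2, 1, 0], [0, 1, 3, 7]])
log_prior, log_cond = selective_em(X_lab, y_lab, X_unl, n_classes=2)
print(posterior(X_unl, log_prior, log_cond).argmax(axis=1))
```

The sketch runs a fixed number of iterations for brevity; a practical version would stop once the log-likelihood or the selected set stops changing.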

Work Measurement through Application of Work Sampling in Hospital Dietary Departments Classified by the Productivity Level (급식생산성 유형별 병원 영양과의 워크샘플링 (Work Sampling)을 적용한 작업분석)

  • 양일선
    • Journal of Nutrition and Health / v.26 no.4 / pp.443-454 / 1993
  • The purposes of this study were to analyze the work patterns of selected hospital foodservices using the Work Sampling methodology and to investigate the relationships among operational factors affecting productivity. The hospitals were classified into 3 groups by the percentage of patient meals, the percentage of special patient diets, and the menu items of patient meals. The resulting clusters were characterized by productivity. Work Sampling was applied to 3 selected hospitals to analyze their work patterns, productivity, and the labor time used in each work function. The productivity indices obtained by Work Sampling were 10.36, 10.95, and 12.19 min/meal for hospitals X, Y, and Z, respectively. Hospital Z differed significantly in the time used for direct work functions and delay: it had the highest direct work function time and the lowest delay. Under the classification used in this study, the relationship between the Work Sampling results and the productivity of the 3 groups was explained by direct work function rather than by delay.
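
Work sampling estimates how labor time is divided among work functions from the share of random observations falling in each function. A minimal sketch, assuming that the productivity index is total labor minutes divided by meals served (consistent with its min/meal unit); the observation counts, labor minutes, and meal counts below are hypothetical and not taken from the study.

```python
def work_sampling_estimate(obs_counts, total_labor_min, meals_served):
    """Estimate labor minutes per work function and the productivity index (min/meal).

    obs_counts: number of random observations recorded in each work function.
    total_labor_min: total labor minutes during the study period.
    meals_served: meals produced in the same period.
    """
    total_obs = sum(obs_counts.values())
    # Work sampling assumption: the share of observations in a category
    # estimates the share of total labor time spent in that category.
    minutes_by_function = {
        name: total_labor_min * count / total_obs
        for name, count in obs_counts.items()
    }
    minutes_per_meal = total_labor_min / meals_served
    return minutes_by_function, minutes_per_meal


# Hypothetical figures, not taken from the study.
minutes, index = work_sampling_estimate(
    {"direct": 500, "indirect": 300, "delay": 200},
    total_labor_min=12_000,
    meals_served=1_000,
)
print(minutes, index)  # {'direct': 6000.0, 'indirect': 3600.0, 'delay': 2400.0} 12.0
```

A lower min/meal value means fewer labor minutes per meal, so under this definition hospital X would be the most productive of the three.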

Molecular Phylogeny of the Subfamily Tephritinae (Diptera: Tephritidae) Based on Mitochondrial 16S rDNA Sequences

  • Han, Ho-Yeon;Ro, Kyung-Eui;McPheron, Bruce A.
    • Molecules and Cells / v.22 no.1 / pp.78-88 / 2006
  • The phylogeny of the subfamily Tephritinae (Diptera: Tephritidae) was reconstructed from mitochondrial 16S ribosomal RNA gene sequences using 53 species representing 11 currently recognized tribes of the Tephritinae and 10 outgroup species. The minimum evolution and Bayesian trees suggested the following phylogenetic relationships: (1) monophyly of the Tephritinae was strongly supported; (2) a sister group relationship between the Tephritinae and Plioreocepta was supported by the Bayesian tree; (3) the tribes Tephrellini, Myopitini, and Terelliini (excluding Neaspilota) were supported as monophyletic groups; (4) the tribes Dithrycini, Eutretini, Noeetini, Tephritini, Cecidocharini, and Xyphosiini were found to be non-monophyletic; and (5) 10 putative tribal groups were recognized, most of which were strongly supported by the statistical tests of the interior branches. Our results, therefore, convincingly suggest that an extensive rearrangement of the tribal classification of the Tephritinae is necessary. Because our sampling of taxa relied heavily on the currently accepted classification, some lineages identified by the present study were severely under-sampled, and other possible major lineages of the Tephritinae were probably not represented in our dataset at all. We believe that our results provide baseline information for a more rigorous sampling of additional taxa representing all possible major lineages of the subfamily, which is essential for a comprehensive revision of the tephritine tribal classification.

A Study on Improving Ransomware Detection Performance Using Combined Sampling Methods (혼합샘플링 기법을 사용한 랜섬웨어탐지 성능향상에 관한 연구)

  • Kim Soo Chul;Lee Hyung Dong;Byun Kyung Keun;Shin Yong Tae
    • Convergence Security Journal / v.23 no.1 / pp.69-77 / 2023
  • Recently, ransomware damage has been increasing rapidly around the world, with victims including Irish health authorities and U.S. oil pipelines, and it is causing damage across all sectors of society. In particular, research applying machine learning, in addition to existing detection methods, to ransomware detection and response is increasing. However, traditional machine learning has difficulty producing accurate predictions on imbalanced data because models tend to favor the class with more data. Accordingly, for an imbalanced class distribution consisting of a large number of non-ransomware samples (normal code or other malware) and a small number of ransomware samples, we propose a technique that resolves the imbalance and improves ransomware detection performance. In the experiments, we use two scenarios (binary and multi-class classification) to confirm that the sampling technique improves the detection performance for minority classes while maintaining the detection performance for majority classes. In particular, the proposed mixed sampling technique (SMOTE+ENN) improved performance (G-mean and F1-score) by more than 10%.
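
SMOTE+ENN first over-samples the minority class with synthetic points (SMOTE) and then cleans the combined set with Edited Nearest Neighbours (ENN). A minimal NumPy sketch, assuming numeric feature vectors, integer class labels, and the majority-vote form of ENN applied to every class; it is not the authors' implementation, and the function names, neighbour counts, and toy data are hypothetical. The G-mean reported in the abstract is the geometric mean of sensitivity and specificity.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by interpolating towards minority neighbours."""
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                   # exclude each point from its own neighbours
    k = min(k, len(X_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]             # k nearest minority neighbours per sample
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))                  # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[neigh] - X_min[base])


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop samples whose label disagrees with
    the majority label of their k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    majority = np.array([np.bincount(votes).argmax() for votes in y[nn]])
    keep = majority == y
    return X[keep], y[keep]


def smote_enn(X, y, minority=1, k_smote=5, k_enn=3, rng=0):
    """Balance a binary dataset with SMOTE, then clean it with ENN."""
    X_min = X[y == minority]
    n_new = (y != minority).sum() - len(X_min)    # synthesize up to the majority count
    X_bal = np.vstack([X, smote(X_min, n_new, k_smote, rng)])
    y_bal = np.concatenate([y, np.full(n_new, minority)])
    return enn(X_bal, y_bal, k_enn)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    return np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


# Hypothetical toy data: 40 "non-ransomware" points and 8 "ransomware" points.
gen = np.random.default_rng(42)
X = np.vstack([gen.normal(0.0, 0.5, size=(40, 2)), gen.normal(5.0, 0.5, size=(8, 2))])
y = np.array([0] * 40 + [1] * 8)
X_res, y_res = smote_enn(X, y)
print(np.bincount(y_res))
```

For production use, the imbalanced-learn library provides a maintained `SMOTEENN` combiner; its default ENN settings may differ from this sketch.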

A Geostatistical Study Using Qualitative Information for Tunnel Rock Binary Classification: II. Application (이분적 터널 암반 분류를 위한 정성적 자료의 지구통계학적 연구 II. 응용)

  • 유광호
    • Geotechnical Engineering / v.10 no.1 / pp.19-26 / 1994
  • In this paper, an application of a rock classification method based on indicator kriging and the cost of errors, which can incorporate qualitative data, was presented. In particular, the binary classification of rock masses was considered, using a simplified RMR system. Since most of the subjectivity in this analysis arises during the estimation of loss functions, a sensitivity analysis of the loss functions was performed. This research found that the expected cost of errors can be used successfully as an indicator of how well a sampling plan is designed. Under certain conditions, qualitative data can be more economical than quantitative data in terms of the expected cost of errors and sampling costs. Therefore, additional sampling should be planned carefully, taking into account the surrounding geologic conditions and its sampling cost. The application method shown in this paper can be useful for more systematic rock classification.
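
Classification by cost of errors can be illustrated with a generic Bayes decision rule: given the probability that rock at a location is "good" (which indicator kriging would supply, though that step is not implemented here), choose the class with the lower expected loss. A minimal sketch; the two-class loss structure, function names, probabilities, and loss values are hypothetical, since the abstract does not give the paper's loss functions.

```python
def classify_with_loss(p_good, cost_wrong_good, cost_wrong_poor):
    """Pick the rock class with the lower expected cost of error.

    p_good: estimated probability (e.g. from indicator kriging) that the rock is 'good'.
    cost_wrong_good: loss if rock is called 'good' but is actually 'poor'.
    cost_wrong_poor: loss if rock is called 'poor' but is actually 'good'.
    """
    cost_if_called_good = (1.0 - p_good) * cost_wrong_good
    cost_if_called_poor = p_good * cost_wrong_poor
    if cost_if_called_good <= cost_if_called_poor:
        return "good", cost_if_called_good
    return "poor", cost_if_called_poor


def expected_cost_of_errors(p_values, cost_wrong_good, cost_wrong_poor):
    """Total expected cost of errors over a set of locations (lower suggests a better sampling plan)."""
    return sum(classify_with_loss(p, cost_wrong_good, cost_wrong_poor)[1] for p in p_values)


# Hypothetical losses: under-designing support is 5x as costly as over-designing it.
print(classify_with_loss(0.7, 10.0, 2.0))   # ('poor', ~1.4)
print(classify_with_loss(0.95, 10.0, 2.0))  # ('good', ~0.5)
```

Because the decision depends on the ratio of the two losses, changing them shifts the classification threshold, which is why a sensitivity analysis of the loss functions matters.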

Image-Based Skin Cancer Classification System Using Attention Layer (Attention layer를 활용한 이미지 기반 피부암 분류 시스템)

  • GyuWon Lee;SungHee Woo
    • Journal of Practical Engineering Education / v.16 no.1_spc / pp.59-64 / 2024
  • As the population ages, the incidence of cancer is increasing. Skin cancer appears on the outside of the body, but people often do not notice it or simply overlook it. As a result, if the window for early detection is missed, the survival rate for late-stage cancer is only 7.5-11%. However, diagnosing serious skin cancer has the disadvantage of requiring considerable time and money for detailed examinations and cell tests, rather than a simple visual diagnosis. To overcome these challenges, we propose a skin cancer classification system based on an attention-based CNN model. If skin cancer can be detected early, it can be treated quickly, and the proposed system can greatly assist specialists in their work. To mitigate image data imbalance across skin cancer types, the classification model applies an oversampling technique to data with a high distribution ratio and adds an Attention layer to a pre-trained model. This model is then compared with the model without the Attention layer. We also plan to address the data imbalance problem by strengthening data augmentation techniques for specific classes.
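
The abstract does not say which oversampling variant was used. A minimal sketch, assuming plain random oversampling: image indices from under-represented classes are drawn with replacement until every class matches the largest class. The function name, class names, and counts are hypothetical.

```python
import numpy as np


def random_oversample(labels, rng=None):
    """Return indices that balance `labels` by duplicating minority-class samples."""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    indices = [np.arange(len(labels))]           # keep every original sample once
    for cls, n in zip(classes, counts):
        if n < target:
            members = np.flatnonzero(labels == cls)
            # draw extra copies with replacement until the class reaches `target`
            indices.append(rng.choice(members, size=target - n, replace=True))
    return np.concatenate(indices)


# Hypothetical per-image labels for a small, imbalanced skin-lesion dataset.
labels = ["melanoma"] * 2 + ["nevus"] * 6 + ["bcc"] * 3
idx = random_oversample(labels, rng=0)
print(np.unique(np.asarray(labels)[idx], return_counts=True))
```

Returning indices rather than copied images keeps memory use low, and oversampling is normally applied to the training split only so that duplicated images do not leak into the evaluation set.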

Method for Assessing Landslide Susceptibility Using SMOTE and Classification Algorithms (SMOTE와 분류 기법을 활용한 산사태 위험 지역 결정 방법)

  • Yoon, Hyung-Koo
    • Journal of the Korean Geotechnical Society / v.39 no.6 / pp.5-12 / 2023
  • Proactive assessment of landslide susceptibility is necessary to minimize casualties. This study proposes a methodology for classifying the landslide safety factor using machine-learning classification algorithms. The high-risk area model is adopted to perform the classification, with eight geotechnical parameters as inputs. Four classification algorithms (decision tree, k-nearest neighbor, logistic regression, and random forest) are employed to compare classification accuracy for safety factors ranging from 1.2 to 2.0. Notably, high accuracy is achieved for safety factors of 1.2-1.7, but accuracy is relatively low for 1.8-2.0. To overcome this issue, the synthetic minority over-sampling technique (SMOTE) is adopted to generate additional data. Applying SMOTE improves the average accuracy by approximately 250% in the safety factor range of 1.8-2.0. The results demonstrate that SMOTE improves the accuracy of classification algorithms when applied to geotechnical data.
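
As generally defined, SMOTE generates each synthetic minority sample by interpolating between a minority sample $\mathbf{x}_i$ and one of its $k$ nearest minority-class neighbours $\mathbf{x}_{\text{nn}}$, so every new point lies on the segment joining two existing minority samples. The worked numbers below are hypothetical two-feature values, not the study's eight geotechnical inputs.

$$
\mathbf{x}_{\text{new}} = \mathbf{x}_i + \lambda\,(\mathbf{x}_{\text{nn}} - \mathbf{x}_i), \qquad \lambda \sim \mathcal{U}(0, 1)
$$

$$
\mathbf{x}_i = (1.2,\ 3.0),\quad \mathbf{x}_{\text{nn}} = (1.6,\ 2.0),\quad \lambda = 0.25 \;\Rightarrow\; \mathbf{x}_{\text{new}} = \bigl(1.2 + 0.25 \cdot 0.4,\ 3.0 + 0.25 \cdot (-1.0)\bigr) = (1.3,\ 2.75)
$$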

An Analysis on Classification Retrieval Operation in University Libraries (대학도서관의 분류검색 운영 분석)

  • Lee Jong-Moon
    • Journal of Korean Library and Information Science Society / v.36 no.2 / pp.165-178 / 2005
  • This study aims to identify the status of classification retrieval operation by investigating and analyzing classification retrieval of books in university libraries. The investigation concentrated on whether a classification retrieval service is provided, the access method, and the classification retrieval level. Data were collected from the 97 libraries whose URLs were accessible during the survey period, out of 100 libraries selected by systematic sampling. As a result, 92.8% of the 97 libraries provided a classification retrieval service; of these, 52.2% allowed access to the service only by classification number, and 47.8% by both classification number and classification directory. Consequently, the retrieval environment in libraries that allow access only by classification number should be improved urgently to promote the use of classification retrieval.
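
Systematic sampling picks every k-th unit from an ordered list after a random start, where k is the population size divided by the sample size. A minimal sketch; the abstract does not describe the sampling frame, so the population list, function name, and seed are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Select n units: a random start in [0, k), then every k-th unit."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and len(population)")
    k = len(population) // n                     # sampling interval
    start = random.Random(seed).randrange(k)     # random start within the first interval
    return [population[start + i * k] for i in range(n)]


# Hypothetical ordered frame of 1,000 university libraries, sampled down to 100.
frame = [f"library_{i:04d}" for i in range(1000)]
sample = systematic_sample(frame, 100, seed=1)
print(len(sample), sample[:3])
```

As a check on the reported figures, 92.8% of 97 libraries corresponds to 90 libraries, and 52.2% and 47.8% of those 90 correspond to 47 and 43 libraries, respectively.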

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we applied it to data from H Bank's non-external auditing companies in Korea and compared the performance of classifiers trained on data from the proposed under-sampling and from random sampling. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers such as logistic regression, discriminant analysis, decision trees, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which incur higher misclassification costs.
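
A k-reverse-nearest-neighbour count records, for each sample, how many other samples include it among their k nearest neighbours; samples that almost no one points to are isolated and can be treated as outliers. A minimal NumPy sketch of only this outlier-filtering stage, assuming Euclidean distance; the cut-off `min_count` and the function names are hypothetical, and the OCSVM selection stage (available, for example, as scikit-learn's `sklearn.svm.OneClassSVM`) is not shown.

```python
import numpy as np


def reverse_nn_counts(X, k):
    """For each sample, count how many other samples list it among their k nearest neighbours."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    knn = np.argsort(d, axis=1)[:, :k]           # row i: indices of i's k nearest neighbours
    return np.bincount(knn.ravel(), minlength=len(X))


def remove_rnn_outliers(X, k=3, min_count=1):
    """Drop samples with fewer than `min_count` reverse neighbours (treated as outliers)."""
    counts = reverse_nn_counts(X, k)
    keep = counts >= min_count
    return X[keep], keep


# Hypothetical majority-class feature vectors: a tight group plus one distant point.
X_major = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [50, 50]], dtype=float)
X_clean, kept = remove_rnn_outliers(X_major, k=3)
print(kept)  # the distant point at (50, 50) is the one removed
```

In a full hybrid pipeline, the filtered majority samples would then be passed to the one-class model to choose informative training samples before combining them with the minority (bankrupt) class.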