Search | Korea Science

A Hybrid Clustering Technique for Processing Large Data (대용량 데이터 처리를 위한 하이브리드형 클러스터링 기법)

Kim, Man-Sun;Lee, Sang-Yong
- The KIPS Transactions:PartB
- /
- v.10B no.1
- /
- pp.33-40
- /
- 2003
Data mining plays an important role in a knowledge discovery process and various algorithms of data mining can be selected for the specific purpose. Most of traditional hierachical clustering methode are suitable for processing small data sets, so they difficulties in handling large data sets because of limited resources and insufficient efficiency. In this study we propose a hybrid neural networks clustering technique, called PPC for Pre-Post Clustering that can be applied to large data sets and find unknown patterns. PPC combinds an artificial intelligence method, SOM and a statistical method, hierarchical clustering technique, and clusters data through two processes. In pre-clustering process, PPC digests large data sets using SOM. Then in post-clustering, PPC measures Similarity values according to cohesive distances which show inner features, and adjacent distances which show external distances between clusters. At last PPC clusters large data sets using the simularity values. Experiment with UCI repository data showed that PPC had better cohensive values than the other clustering techniques.
https://doi.org/10.3745/KIPSTB.2003.10B.1.033 인용 PDF KSCI

Courses Recommendation Algorithm Based On Performance Prediction In E-Learning

Koffi, Dagou Dangui Augustin Sylvain Legrand;Ouattara, Nouho;Mambe, Digrais Moise;Oumtanaga, Souleymane;ADJE, Assohoun
- International Journal of Computer Science & Network Security
- /
- v.21 no.2
- /
- pp.148-157
- /
- 2021
The effectiveness of recommendation systems depends on the performance of the algorithms with which these systems are designed. The quality of the algorithms themselves depends on the quality of the strategies with which they were designed. These strategies differ from author to author. Thus, designing a good recommendation system means implementing the good strategies. It's in this context that several research works have been proposed on various strategies applied to algorithms to meet the needs of recommendations. Researchers are trying indefinitely to address this objective of seeking the qualities of recommendation algorithms. In this paper, we propose a new algorithm for recommending learning items. Learner performance predictions and collaborative recommendation methods are used as strategies for this algorithm. The proposed performance prediction model is based on convolutional neural networks (CNN). The results of the performance predictions are used by the proposed recommendation algorithm. The results of the predictions obtained show the efficiency of Deep Learning compared to the k-nearest neighbor (k-NN) algorithm. The proposed recommendation algorithm improves the recommendations of the learners' learning items. This algorithm also has the particularity of dissuading learning items in the learner's profile that are deemed inadequate for his or her training.
https://doi.org/10.22937/IJCSNS.2021.21.2.17 인용 PDF KSCI

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

Kim, Myoung-Jong
- Journal of Intelligence and Information Systems
- /
- v.18 no.2
- /
- pp.29-45
- /
- 2012
Bond rating is regarded as an important event for measuring financial risk of companies and for determining the investment returns of investors. As a result, it has been a popular research topic for researchers to predict companies' credit ratings by applying statistical and machine learning techniques. The statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have been traditionally used in bond rating. However, one major drawback is that it should be based on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variablesand the predictor variables. Those strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques also used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). Especially, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that can maximize the margin between two categories. SVM is simple enough to be analyzed mathematical, and leads to high performance in practical applications. SVM implements the structuralrisk minimization principle and searches to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum and thus, overfitting is unlikely to occur with SVM. In addition, SVM does not require too many data sample for training since it builds prediction models by only using some representative sample near the boundaries called support vectors. A number of experimental researches have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. First, SVM is originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification such as One-Against-One, One-Against-All have been proposed, but they do not improve the performance in multi-class classification problem as much as SVM for binary-class classification. Second, approximation algorithms (e.g. decomposition methods, sequential minimal optimization algorithm) could be used for effective multi-class computation to reduce computation time, but it could deteriorate classification performance. Third, the difficulty in multi-class prediction problems is in data imbalance problem that can occur when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed boundary and thus the reduction in the classification accuracy of such a classifier. SVM ensemble learning is one of machine learning methods to cope with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing weight on the misclassified observations through iterations. The observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted. Thus Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve multiclass prediction problem. Since MGM-Boost introduces the notion of geometric mean into AdaBoost, it can perform learning process considering the geometric mean-based accuracy and errors of multiclass. This study applies MGM-Boost to the real-world bond rating case for Korean companies to examine the feasibility of MGM-Boost. 10-fold cross validations for threetimes with different random seeds are performed in order to ensure that the comparison among three different classifiers does not happen by chance. For each of 10-fold cross validation, the entire data set is first partitioned into tenequal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained the results for classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows the higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%)in terms of geometric mean-based prediction accuracy. T-test is used to examine whether the performance of each classifiers for 30 folds is significantly different. The results indicate that performance of MGM-Boost is significantly different from AdaBoost and SVM classifiers at 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-classproblems such as bond rating.
https://doi.org/10.13088/jiis.2012.18.2.029 인용 PDF KSCI

Search Result 153, Processing Time 0.019 seconds

A Hybrid Clustering Technique for Processing Large Data (대용량 데이터 처리를 위한 하이브리드형 클러스터링 기법)

Courses Recommendation Algorithm Based On Performance Prediction In E-Learning

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)