Search | Korea Science

Hybrid Learning Architectures for Advanced Data Mining: An Application to Binary Classification for Fraud Management (개선된 데이터마이닝을 위한 혼합 학습구조의 제시)

Kim, Steven H.;Shin, Sung-Woo
- Journal of Information Technology Application
- v.1
- pp.173-211
- 1999
The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm effects feature-weight optimization in a preprocessing module. Moreover, an inductive tree, artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second-order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple but efficient postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management, which exemplifies a binary classification problem.
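The two postprocessing ideas named in this abstract, a voting scheme and a second-order model that picks a first-order classifier per case, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy `history` data, and the choice of kNN as the selector with squared Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most first-order models.

    With an odd number of binary models there are no ties.
    """
    return Counter(predictions).most_common(1)[0][0]

def select_model_knn(x, history, k=3):
    """Second-order selection (hypothetical helper).

    history: list of (feature_vector, name_of_model_that_was_correct).
    Returns the model name that was most often correct among the
    k past cases closest to x (squared Euclidean distance).
    """
    def sq_dist(item):
        return sum((a - b) ** 2 for a, b in zip(x, item[0]))
    nearest = sorted(history, key=sq_dist)[:k]
    return Counter(name for _, name in nearest).most_common(1)[0][0]

# Toy first-order outputs for one case (1 = fraud, 0 = legitimate).
first_order = {"logit": 1, "mda": 0, "ann": 1}

# Toy record of which model was right on earlier cases (assumed data).
history = [
    ((0.0, 0.0), "logit"),
    ((0.1, 0.0), "logit"),
    ((1.0, 1.0), "ann"),
    ((0.9, 1.1), "ann"),
    ((1.0, 0.9), "ann"),
]

voted = majority_vote(list(first_order.values()))
chosen = select_model_knn((0.95, 1.0), history, k=3)
print("voting:", voted, "| selected model:", chosen, "->", first_order[chosen])
```

Voting uses every first-order output equally, whereas the selector trusts a single model chosen from similar past cases; the abstract investigates both as postprocessing options.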

Software Fault Prediction at Design Phase

Singh, Pradeep;Verma, Shrish;Vyas, O.P.
- Journal of Electrical Engineering and Technology
- v.9 no.5
- pp.1739-1745
- 2014
Prediction of fault-prone modules continues to attract researchers' interest due to its significant impact on software development cost. The most important goal of such techniques is to correctly identify the modules where faults are most likely to be present in the early phases of the software development lifecycle. Various software metrics related to module-level fault data have been successfully used for prediction of fault-prone modules. The goal of this research is to predict faulty modules at the design phase using design metrics of modules and the faults related to those modules. We analyzed the effect of preprocessing and different machine learning schemes on eleven projects from the NASA Metrics Data Program, which offers design metrics and their related faults. Using seven machine learning and four preprocessing techniques, we confirmed that models built from design metrics are surprisingly good at fault-proneness prediction. The results show that Naive Bayes or Voting Feature Intervals with discretization should be chosen for different data sets, as they outperformed the other schemes among the 28 evaluated. Naive Bayes and Voting Feature Intervals achieved an AUC > 0.7 on average across the eleven projects. Our proposed framework is effective and can predict an acceptable level of faults at the design phase.
https://doi.org/10.5370/JEET.2014.9.5.1739
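The AUC figure quoted in this abstract can be read as the probability that a randomly chosen faulty module receives a higher score than a randomly chosen fault-free one. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are illustrative assumptions, not data from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted fault-proneness per module (higher = more suspicious).
    labels: 1 for faulty modules, 0 for fault-free modules.
    Ties between a faulty and a fault-free module count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one faulty and one fault-free module")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example (assumed values): three of the four faulty/fault-free pairs
# are ranked correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise form is quadratic in the number of modules, which is fine for a sketch; rank-based formulas give the same value more efficiently on large data sets.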

A partially occluded object recognition technique using a probabilistic analysis in the feature space (특징 공간상에서 의 확률적 해석에 기반한 부분 인식 기법에 관한 연구)

박보건;이경무;이상욱;이진학
- The Journal of Korean Institute of Communications and Information Sciences
- v.26 no.11A
- pp.1946-1956
- 2001
In this paper, we propose a novel 2-D partial matching algorithm based on model-based stochastic analysis of feature correspondences in a relation vector space, which is quite robust to shape variations as well as invariant to geometric transformations. We represent an object using the ARG (Attributed Relational Graph) model with features of a set of relation vectors. In addition, we statistically model partial occlusion or noise as the distortion of the relation vector distribution in the relation vector space. Our partial matching algorithm consists of two phases. First, a finite number of candidate sets are selected by using logical constraints embedding local and structural consistency. Second, feature loss detection is done iteratively by an error detection and voting scheme through the error analysis of the relation vector space. Experimental results on real images demonstrate that the proposed algorithm is quite robust to noise and localizes target objects correctly even in severely noisy and occluded scenes.

Performance Analysis of Modified LLAH Algorithm under Gaussian Noise (가우시안 잡음에서 변형된 LLAH 알고리즘의 성능 분석)

Ryu, Hosub;Park, Hanhoon
- Journal of Korea Multimedia Society
- v.18 no.8
- pp.901-908
- 2015
Methods for detecting, describing, and matching image features, such as corners and blobs, have been actively studied as a fundamental step for image processing and computer vision applications. As one such feature description/matching method, LLAH (Locally Likely Arrangement Hashing) describes image features based on the geometric relationship between their neighbors, and thus is suitable for scenes with poor texture. This paper presents a modified LLAH algorithm which, unlike the original LLAH, includes the image features themselves to describe the geometric relationship robustly, and employs a voting-based feature matching scheme that makes feature description much simpler. The paper then quantitatively analyzes its performance with synthetic images in the presence of Gaussian noise.
https://doi.org/10.9717/kmms.2015.18.8.901

Sequence driven features for prediction of subcellular localization of proteins

Kim, Jong-Kyoung;Bang, Sung-Yang;Choi, Seung-Jin
- Proceedings of the Korean Society for Bioinformatics Conference
- 2005.09a
- pp.237-242
- 2005
Predicting the cellular location of an unknown protein gives valuable information for inferring the possible function of the protein. For a more accurate prediction system, we need a good feature extraction method that transforms the raw sequence data into a numerical feature vector while minimizing information loss. In this paper, we propose new methods of extracting underlying features from the sequence data alone by computing pairwise sequence alignment scores. In addition, we use composition-based features to improve prediction accuracy. To construct an SVM ensemble from separately trained SVM classifiers, we propose specificity-based weighted majority voting. The overall prediction accuracy evaluated by 5-fold cross-validation reached 88.53% for the eukaryotic animal data set. By comparing the prediction accuracy of various feature extraction methods, we could gain biological insight into the location of targeting information. Our numerical experiments confirm that our new feature extraction methods are very useful for predicting subcellular localization of proteins.
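The abstract does not spell out how specificity enters the vote, so the following Python sketch shows only one plausible reading: each classifier's vote for its predicted class is weighted by a specificity value estimated on validation data. The helper names, the class labels, and the weights are hypothetical.

```python
from collections import defaultdict

def specificity(y_true, y_pred, positive):
    """TN / (TN + FP), treating `positive` as the positive class."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tn / (tn + fp) if (tn + fp) else 0.0

def weighted_vote(predictions, weights):
    """Sum the weight behind each predicted label and return the heaviest label."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Assumed example: three classifiers, each with a validation specificity.
predictions = ["nucleus", "cytoplasm", "cytoplasm"]
weights = [0.9, 0.4, 0.3]  # hypothetical specificity values

# Plain majority voting would pick "cytoplasm" (two votes to one); the
# weighted vote follows the single classifier with the highest specificity.
print(weighted_vote(predictions, weights))  # nucleus
```

Weighting by specificity rewards classifiers that rarely raise false alarms, which is why a single high-specificity vote can outweigh two low-specificity ones in this toy case.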

Text-independent Speaker Identification Using Soft Bag-of-Words Feature Representation

Jiang, Shuangshuang;Frigui, Hichem;Calhoun, Aaron W.
- International Journal of Fuzzy Logic and Intelligent Systems
- v.14 no.4
- pp.240-248
- 2014
We present a robust speaker identification algorithm that uses novel features based on a soft bag-of-words representation and a simple Naive Bayes classifier. The bag-of-words (BoW) based histogram feature descriptor is typically constructed by summarizing and identifying representative prototypes from low-level spectral features extracted from training data. In this paper, we define a generalization of the standard BoW. In particular, we define three types of BoW that are based on crisp voting, fuzzy memberships, and possibilistic memberships. We analyze our mapping with three common classifiers: Naive Bayes (NB), K-nearest neighbor (KNN), and support vector machines (SVM). The proposed algorithms are evaluated using large datasets that simulate medical crises. We show that the proposed soft bag-of-words feature representation approach achieves a significant improvement when compared to state-of-the-art methods.
https://doi.org/10.5391/IJFIS.2014.14.4.240
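To make the difference between crisp voting and fuzzy memberships concrete, the sketch below builds both kinds of histogram over a small set of prototypes. It assumes the standard fuzzy c-means membership formula with fuzzifier `m` and Euclidean distance; the paper's exact mapping may differ, the possibilistic variant is not shown, and all names and data are illustrative.

```python
import math

def crisp_bow(features, prototypes):
    """Each feature casts one full vote for its nearest prototype."""
    hist = [0.0] * len(prototypes)
    for x in features:
        d = [math.dist(x, p) for p in prototypes]
        hist[d.index(min(d))] += 1.0
    return hist

def fuzzy_memberships(x, prototypes, m=2.0):
    """Fuzzy c-means style memberships of x in each prototype (sum to 1)."""
    d = [math.dist(x, p) for p in prototypes]
    zeros = d.count(0.0)
    if zeros:
        # x coincides with one or more prototypes: share full membership.
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    inv = [di ** (-2.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [v / total for v in inv]

def fuzzy_bow(features, prototypes, m=2.0):
    """Each feature spreads its vote over all prototypes by membership."""
    hist = [0.0] * len(prototypes)
    for x in features:
        for i, u in enumerate(fuzzy_memberships(x, prototypes, m)):
            hist[i] += u
    return hist

# Toy 1-D prototypes and one feature at 0.25 (assumed values).
prototypes = [(0.0,), (1.0,)]
features = [(0.25,)]
print(crisp_bow(features, prototypes))  # all weight on the nearest prototype
print(fuzzy_bow(features, prototypes))  # approximately [0.9, 0.1]
```

The soft histogram keeps information about features that lie between prototypes, which the crisp histogram discards.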

Development of Galaxy Image Classification Based on Hand-crafted Features and Machine Learning (Hand-crafted 특징 및 머신 러닝 기반의 은하 이미지 분류 기법 개발)

Oh, Yoonju;Jung, Heechul
- IEMEK Journal of Embedded Systems and Applications
- v.16 no.1
- pp.17-27
- 2021
In this paper, we develop a galaxy image classification method based on hand-crafted features and machine learning techniques. Additionally, we provide an empirical analysis to reveal which combination of techniques is effective for galaxy image classification. To achieve this, we developed a framework consisting of four modules: preprocessing, feature extraction, feature post-processing, and classification. We found that the best technique for galaxy image classification combines a median filter, ORB vector features, and a voting classifier based on RBF SVM, random forest, and logistic regression. The final method is efficient, so we believe that it is applicable to embedded environments.
https://doi.org/10.14372/IEMEK.2021.16.1.17

Histogram Equalization Using Centroids of Fuzzy C-Means of Background Speakers' Utterances for Majority Voting Based Speaker Identification (다수 투표 기반의 화자 식별을 위한 배경 화자 데이터의 퍼지 C-Means 중심을 이용한 히스토그램 등화기법)

Kim, Myung-Jae;Yang, Il-Ho;Yu, Ha-Jin
- The Journal of the Acoustical Society of Korea
- v.33 no.1
- pp.68-74
- 2014
In a previous work, we proposed a novel approach to histogram equalization using a supplement set composed of the centroids of Fuzzy C-Means clusters of the background utterances. The performance of that method is affected by the size of the supplement set, but it is difficult to find the best size at the point of recognition. In this paper, we propose histogram equalization using a supplement set for majority-voting-based speaker identification. The proposed method identifies test utterances using majority voting over histogram equalization methods with various sizes of supplement sets. The proposed method is compared with conventional feature normalization methods such as CMN (Cepstral Mean Normalization), MVN (Mean and Variance Normalization), and HEQ (Histogram Equalization), as well as with the histogram equalization method using a supplement set.
https://doi.org/10.7776/ASK.2014.33.1.068
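For readers unfamiliar with the baseline normalizations named in this abstract, the sketch below shows CMN and MVN applied per feature dimension over the frames of one utterance. It covers only these two baselines, not the proposed histogram equalization; the function names and the toy frames are assumptions for illustration.

```python
import statistics

def cmn(frames):
    """Cepstral Mean Normalization: subtract the per-dimension mean
    computed over all frames of the utterance."""
    means = [statistics.fmean(col) for col in zip(*frames)]
    return [[v - mu for v, mu in zip(frame, means)] for frame in frames]

def mvn(frames):
    """Mean and Variance Normalization: subtract the per-dimension mean
    and divide by the per-dimension (population) standard deviation.
    A constant dimension is left at zero instead of dividing by zero."""
    cols = list(zip(*frames))
    means = [statistics.fmean(col) for col in cols]
    stds = [statistics.pstdev(col) or 1.0 for col in cols]
    return [
        [(v - mu) / sd for v, mu, sd in zip(frame, means, stds)]
        for frame in frames
    ]

# Two toy frames with two cepstral coefficients each (assumed values).
frames = [[1.0, 10.0], [3.0, 10.0]]
print(cmn(frames))
print(mvn(frames))
```

CMN removes a constant offset per dimension, MVN additionally equalizes the spread, and histogram equalization goes one step further by matching the whole distribution to a reference.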

Sequence driven features for prediction of subcellular localization of proteins (단백질의 세포내 소 기관별 분포 예측을 위한 서열 기반의 특징 추출 방법)

Kim, Jong-Kyoung;Choi, Seung-Jin
- Proceedings of the Korean Information Science Society Conference
- 2005.07b
- pp.226-228
- 2005
Predicting the cellular location of an unknown protein gives valuable information for inferring the possible function of the protein. For a more accurate prediction system, we need a good feature extraction method that transforms the raw sequence data into a numerical feature vector while minimizing information loss. In this paper, we propose new methods of extracting underlying features from the sequence data alone by computing pairwise sequence alignment scores. In addition, we use composition-based features to improve prediction accuracy. To construct an SVM ensemble from separately trained SVM classifiers, we propose specificity-based weighted majority voting. The overall prediction accuracy evaluated by 5-fold cross-validation reached 88.53% for the eukaryotic animal data set. By comparing the prediction accuracy of various feature extraction methods, we could gain biological insight into the location of targeting information. Our numerical experiments confirm that our new feature extraction methods are very useful for predicting subcellular localization of proteins.

Modeling and Selecting Optimal Features for Machine Learning Based Detections of Android Malwares (머신러닝 기반 안드로이드 모바일 악성 앱의 최적 특징점 선정 및 모델링 방안 제안)

Lee, Kye Woong;Oh, Seung Taek;Yoon, Young
- KIPS Transactions on Software and Data Engineering
- v.8 no.11
- pp.427-432
- 2019
In this paper, we propose three approaches to modeling Android malware. The first method relies on human security experts to meticulously select feature sets. With the second approach, we choose the 300 features with the highest importance among the top 99% of features in terms of occurrence rate. The third approach is to combine multiple models and identify malware through weighted voting. In addition, we applied a novel method of eliminating permission information, which used to be regarded as a critical factor for distinguishing malware. With our carefully generated feature sets and weighted voting by the ensemble algorithm, we were able to reach a highest malware detection accuracy of 97.8%. We also verified that discarding the permission information led to an improvement in false positive and false negative rates.
https://doi.org/10.3745/KTSDE.2019.8.11.427

Search Result 45, Processing Time 0.02 seconds
