• Title/Abstract/Keywords: Classification Algorithms

Search results: 1,195 items (processing time: 0.032 s)

Detecting Malicious Social Robots with Generative Adversarial Networks

  • Wu, Bin;Liu, Le;Dai, Zhengge;Wang, Xiujuan;Zheng, Kangfeng
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 11 / pp. 5594-5615 / 2019
  • Malicious social robots, which are disseminators of malicious information on social networks, seriously affect information security and network environments. The detection of malicious social robots is a hot topic and a significant concern for researchers. A method based on classification has been widely used for social robot detection. However, this method of classification is limited by an unbalanced data set in which legitimate, negative samples outnumber malicious robots (positive samples), which leads to unsatisfactory detection results. This paper proposes the use of generative adversarial networks (GANs) to extend the unbalanced data sets before training classifiers to improve the detection of social robots. Five popular oversampling algorithms were compared in the experiments, and the effects of imbalance degree and the expansion ratio of the original data on oversampling were studied. The experimental results showed that the proposed method achieved better detection performance compared with other algorithms in terms of the F1 measure. The GAN method also performed well when the imbalance degree was smaller than 15%.
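
The abstract above reports results in terms of the F1 measure on an unbalanced data set. The following minimal sketch does not reproduce the paper's GAN-based oversampling; it only illustrates, on hypothetical toy labels, why F1 is more informative than accuracy when negative samples dominate. The function names and the 5-versus-95 split are assumptions made for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 5 malicious accounts (1) and 95 legitimate ones (0).
y_true = [1] * 5 + [0] * 95

# A classifier that never flags anything still looks accurate.
always_negative = [0] * 100

# A classifier that finds 3 of the 5 positives and raises 2 false alarms.
finds_three = [1, 1, 1, 0, 0] + [1, 1] + [0] * 93

print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))
print(accuracy(y_true, finds_three), f1_score(y_true, finds_three))
```

The two classifiers have nearly the same accuracy, but only the second has a non-zero F1, which is why oversampling studies on unbalanced data tend to report F1.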

Evaluation of Robust Classifier Algorithm for Tissue Classification under Various Noise Levels

  • Youn, Su Hyun;Shin, Ki Young;Choi, Ahnryul;Mun, Joung Hwan
    • ETRI Journal / Vol. 39, No. 1 / pp. 87-96 / 2017
  • Ultrasonic surgical devices are routinely used for surgical procedures. The incision and coagulation of tissue generate temperatures of $40^{\circ}\mathrm{C}$ to $150^{\circ}\mathrm{C}$, depending on the controllable output power level of the surgical device. Recently, research on classifying grasped tissues to control the power level automatically was published. However, that research did not consider factors such as the specific characteristics of the surgical device and tissue denaturation. Therefore, this research proposes a robust algorithm that simulates noise to resemble real situations and classifies tissue using conventional classifier algorithms. The bioimpedance spectrum of six tissues (liver, large intestine, kidney, lung, muscle, and fat) is measured, and five classifier algorithms are used. The testing sets are diversified by adding white Gaussian noise at various signal-to-noise ratios, and as a result the classifiers differ in performance. The k-nearest neighbors algorithm shows the highest classification rate of 92.09% (p < 0.01) with a standard deviation of 1.92%, which confirms high reproducibility.
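
The noise simulation described above can be sketched as follows. This is a minimal illustration of adding white Gaussian noise at a chosen signal-to-noise ratio, not the authors' implementation; the function names, the sine-wave stand-in for a measured spectrum, and the 10 dB target are all assumptions.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Return a noisy copy of `signal` with additive white Gaussian noise.

    The noise variance is chosen so that
    10 * log10(signal_power / noise_power) equals `snr_db` in expectation.
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for a measured spectrum: a sampled sine wave.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * k / 50) for k in range(5000)]
noisy = add_awgn(clean, snr_db=10, rng=rng)

print(round(measured_snr_db(clean, noisy), 2))
```

Repeating the call with several `snr_db` values yields a family of test sets at different noise levels, on which each classifier can then be scored.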

Improving an Ensemble Model Using Instance Selection Method

  • 민성환
    • 산업경영시스템학회지 / Vol. 39, No. 1 / pp. 105-115 / 2016
  • Ensemble classification combines individually trained classifiers to yield more accurate predictions than individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it involves randomly drawing a subset of the features for each classifier in the ensemble. The instance selection technique involves selecting critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the two. Therefore, this study proposes a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models in terms of classification accuracy, levels of diversity, and average classification rates of the base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed the other models, including the single model, the instance selection model, and the original random subspace ensemble model.
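
The random subspace idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's model: the GA-based instance selection step is omitted, a nearest-centroid rule stands in for the base classifier, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y, features):
    """Nearest-centroid base learner restricted to the given feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, features, row):
    """Predict the label whose centroid is closest on the selected features."""
    def dist(c):
        return sum((row[f] - c[j]) ** 2 for j, f in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))


def fit_random_subspace(X, y, n_estimators, subspace_size, rng):
    """Train each base learner on its own random subset of the features."""
    n_features = len(X[0])
    ensemble = []
    for _ in range(n_estimators):
        features = rng.sample(range(n_features), subspace_size)
        ensemble.append((features, fit_centroids(X, y, features)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroids(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes with four features.
X = [[0.0, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.3],
     [5.0, 5.1, 4.9, 5.2], [5.2, 4.8, 5.0, 5.1]]
y = [0, 0, 1, 1]

ensemble = fit_random_subspace(X, y, n_estimators=5, subspace_size=2,
                               rng=random.Random(0))
print(predict_ensemble(ensemble, [0.1, 0.1, 0.1, 0.1]))
print(predict_ensemble(ensemble, [5.0, 5.0, 5.0, 5.0]))
```

In a hybrid scheme like the one the abstract describes, an instance selection step would filter `X` and `y` before `fit_random_subspace` is called.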

Truncated Kernel Projection Machine for Link Prediction

  • Huang, Liang;Li, Ruixuan;Chen, Hong
    • Journal of Computing Science and Engineering / Vol. 10, No. 2 / pp. 58-67 / 2016
  • With the large amount of complex network data that is increasingly available on the Web, link prediction has become a popular data-mining research field. The focus of this paper is on a link-prediction task that can be formulated as a binary classification problem in complex networks. To solve this link-prediction problem, a sparse-classification algorithm called "Truncated Kernel Projection Machine" that is based on empirical-feature selection is proposed. The proposed algorithm is a novel realization of sparse empirical-feature-based learning that differs from the regularized kernel-projection machines. The algorithm is more appealing than previous leading learning machines because it can be computed efficiently and implemented easily and stably for the link-prediction task. The algorithm is applied here to link-prediction tasks in different complex networks, and several classification algorithms are investigated for comparison. The experimental results show that the proposed algorithm outperformed the compared algorithms on several key indices, with fewer test errors and greater stability.

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data

  • Wang, Qiuhua;Ouyang, Xiaoqin;Zhan, Jiacheng
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 7 / pp. 3714-3732 / 2019
  • With the rapid development of networks, the Intrusion Detection System (IDS) plays an increasingly important role in network applications. Many data mining algorithms are used to build IDSs. However, with the advent of the big data era, massive amounts of data are generated. When dealing with large-scale data sets, most data mining algorithms suffer from a high computational burden, which makes an IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the training data are divided into clusters of similar size by the Mini Batch K-Means algorithm, and the center of each cluster is used as its index. Then, we select representative instances for each cluster to perform data reduction and use the clusters that consist of representative instances to build a K-Nearest Neighbor (KNN) detection model. In the detection stage, we sort the clusters according to the distances between the test sample and the cluster indexes, and obtain the k nearest clusters, within which we find the k nearest neighbors. Experimental results show that searching for neighbors by cluster indexes significantly reduces the computational complexity, and that classification with the reduced data of representative instances not only improves efficiency but also maintains high accuracy.
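
The detection stage described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the clusters are given directly instead of being produced by Mini Batch K-Means, the representative-instance selection step is omitted, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def build_index(clusters):
    """Pair each cluster with its center, which serves as the cluster index.

    `clusters` is a list of clusters; each cluster is a list of
    (vector, label) tuples.
    """
    return [(centroid([v for v, _ in members]), members) for members in clusters]


def classify(index, sample, k):
    """Search only the k nearest clusters, then vote among k nearest neighbors."""
    nearest_clusters = sorted(index, key=lambda e: math.dist(sample, e[0]))[:k]
    candidates = [m for _, members in nearest_clusters for m in members]
    neighbors = sorted(candidates, key=lambda m: math.dist(sample, m[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


# Hypothetical toy clusters of labeled two-dimensional traffic features.
clusters = [
    [([0, 0], "normal"), ([0, 1], "normal"), ([1, 0], "normal")],
    [([10, 10], "attack"), ([10, 11], "attack"), ([11, 10], "attack")],
    [([0, 10], "normal"), ([1, 10], "normal")],
]
index = build_index(clusters)

print(classify(index, [0.5, 0.5], k=2))
print(classify(index, [10, 10.5], k=2))
```

The saving comes from the first sort: distances are computed to a handful of cluster centers, and only the members of the selected clusters are compared with the test sample.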

Performance Comparison of Machine Learning Algorithms for Received Signal Strength-Based Indoor LOS/NLOS Classification of LTE Signals

  • Lee, Halim;Seo, Jiwon
    • Journal of Positioning, Navigation, and Timing / Vol. 11, No. 4 / pp. 361-368 / 2022
  • An indoor navigation system that utilizes long-term evolution (LTE) signals has the benefit of no additional infrastructure installation expenses and low base station database management costs. Among the LTE signal measurements, received signal strength (RSS) is particularly appealing because it can be easily obtained with mobile devices. Propagation channel models can be used to estimate the position of mobile devices with RSS. However, conventional channel models have a shortcoming in that they do not discriminate between line-of-sight (LOS) and non-line-of-sight (NLOS) conditions of the received signal. Accordingly, a previous study suggested separate LOS and NLOS channel models, but it did not devise a method for determining LOS and NLOS conditions. In this study, a machine learning-based LOS/NLOS classification method using RSS measurements is developed. We suggest several machine-learning features and evaluate various machine-learning algorithms. In an indoor experiment, up to 87.5% classification accuracy was achieved with an ensemble algorithm. Furthermore, range estimation with an average error of 13.54 m was demonstrated, which is a 25.3% improvement over the conventional channel model.

A Binary Classifier Using Fully Connected Neural Network for Alzheimer's Disease Classification

  • Prajapati, Rukesh;Kwon, Goo-Rak
    • Journal of Multimedia Information System / Vol. 9, No. 1 / pp. 21-32 / 2022
  • Distinguishing early-stage Alzheimer's Disease (AD) patients from Cognitively Normal (CN) subjects is crucial because treatment at an early stage can prevent AD from becoming more severe. Recently, computer-aided diagnosis using magnetic resonance imaging (MRI) has shown better performance in the classification of AD. However, these methods use traditional machine learning algorithms that require supervision and combine many complicated processes. In recent research, deep neural networks have outperformed traditional machine learning algorithms. The ability to learn from the data and extract features on its own makes a neural network less prone to errors. In this paper, a dense neural network is designed for binary classification of Alzheimer's disease. To create a classifier with better results, we studied the effect of different activation functions on the prediction. We obtained results from 5-fold validation with combinations of different activation functions, compared them, and used the combination with the best validation score to classify the test data. In this experiment, the features used to train the model are obtained from the ADNI database after processing with FreeSurfer software. For 5-fold validation, two groups, AD and CN, are classified. The proposed DNN obtained better accuracy than the traditional machine learning algorithms and the compared previous studies for AD vs. CN, AD vs. Mild Cognitive Impairment (MCI), and MCI vs. CN classifications. These results suggest that the proposed network is robust.

Artificial Intelligence for Clinical Research in Voice Disease

  • 석준걸;권택균
    • 대한후두음성언어의학회지 / Vol. 33, No. 3 / pp. 142-155 / 2022
  • Diagnosis using voice is non-invasive and can be implemented through various voice recording devices; therefore, it can be used as a screening or diagnostic assistant tool for laryngeal voice disease to help clinicians. The development of artificial intelligence algorithms, such as machine learning, led by the latest deep learning technology, began with binary classification that distinguishes normal from pathological voices; it has since contributed to improving the accuracy of multi-class classification of various types of pathological voices. However, no conclusions that can be applied in the clinical field have yet been reached. Most studies on pathological voice classification have used the sustained short vowel /ah/, which is relatively easier to analyze than continuous or running speech. However, continuous speech has the potential to yield more accurate results because additional information can be obtained from the change in the voice signal over time. This review explains terms related to artificial intelligence research and surveys the latest trends in machine learning and deep learning algorithms; furthermore, the latest research results and limitations are introduced to provide future directions for researchers.

A Novel Classification Model for Employees Turnover Using Neural Network for Enhancing Job Satisfaction in Organizations

  • Tarig Mohamed Ahmed
    • International Journal of Computer Science & Network Security / Vol. 23, No. 7 / pp. 71-78 / 2023
  • Employee turnover is one of the most important challenges facing modern organizations. It causes the loss of job experience and skills, such as those of distinguished faculty members in universities, highly specialized doctors, innovative engineers, and senior administrators. HR analytics has advanced data analytics to the extent that institutions can identify their employees' characteristics, although inaccuracy leads to incorrect decision making. This paper aims to develop a novel model that can help decision-makers classify employee turnover. Using the feature selection methods Information Gain and Chi-Square, the four most important features were extracted from the dataset: overtime, job level, salary, and years in the organization. One important result of this research is that these features should be managed carefully so that organizations keep their employees as valuable assets. The proposed model is based on machine learning algorithms. Classification algorithms such as Decision Tree, SVM, Random Forest, Neural Network, and Naive Bayes were used to implement the model. The model was trained and tested on a dataset of 1470 records and 25 features. Many experiments were conducted to find the best model. Based on the implementation results, the Neural Network algorithm was selected as the best, with an accuracy of 84 percent and an AUC (ROC) of 74 percent. According to the validation mechanism, the model is acceptable and reliable for helping organization decision-makers manage their employees well.
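
Information Gain, one of the two feature selection methods named above, can be sketched as follows. This is a minimal illustration for categorical features, not the paper's pipeline; the function names, the feature names, and the four-record toy dataset are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records: turnover label plus two candidate features.
labels = ["leave", "leave", "stay", "stay"]
overtime = ["yes", "yes", "no", "no"]      # perfectly predicts the label
department = ["a", "b", "a", "b"]          # carries no information

ranking = sorted(
    {"overtime": overtime, "department": department}.items(),
    key=lambda item: information_gain(item[1], labels),
    reverse=True,
)
print([name for name, _ in ranking])
```

Ranking every feature this way and keeping the top few is one simple way to arrive at a short list such as the four features the abstract reports.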

Optimal Feature Extraction for Normally Distributed Multiclass Data

  • 최의선;이철희
    • 대한전자공학회: Conference Proceedings / 대한전자공학회 1998 Fall General Conference Proceedings / pp. 1263-1266 / 1998
  • In this paper, we propose an optimal feature extraction method for normally distributed multiclass data. We search the whole feature space to find a set of features that gives the smallest classification error for the Gaussian ML classifier. Initially, we start with an arbitrary feature vector. Assuming that the feature vector is used for classification, we compute the classification error. Then we move the feature vector slightly and compute the classification error with this vector. Finally, we update the feature vector in the direction in which the classification error decreases most rapidly, which is found by taking the gradient. Alternatively, the initial vector can be one found by conventional feature extraction algorithms. We propose two search methods: sequential search and global search. Experimental results show that the proposed method compares favorably with conventional feature extraction methods.