Search | Korea Science

Classification and Analysis of Data Mining Algorithms (데이터마이닝 알고리즘의 분류 및 분석)

Lee, Jung-Won;Kim, Ho-Sook;Choi, Ji-Young;Kim, Hyon-Hee;Yong, Hwan-Seung;Lee, Sang-Ho;Park, Seung-Soo
- Journal of KIISE:Databases / v.28 no.3 / pp.279-300 / 2001
Data mining plays an important role in the knowledge discovery process, and various existing algorithms are usually selected according to the specific purpose of the mining task. Data mining techniques are currently being actively applied to statistics, business, electronic commerce, biology, and medicine, and numerous algorithms are being researched and developed for these applications. In the long run, however, only a few algorithms, those that are well suited to specific applications and perform well on large databases, will survive, so it is reasonable to focus future effort on those selected algorithms. This paper classifies about 30 existing algorithms into 7 categories: association rules, clustering, neural networks, decision trees, genetic algorithms, memory-based reasoning, and Bayesian networks. We first analyze the systematic hierarchy and characteristics of the algorithms, then present 14 criteria for classifying them together with the classification results based on these criteria. Finally, we propose the best algorithms among several comparable algorithms with different features and performance. The results of this paper can serve as a guideline for data mining research as well as for field applications of data mining.
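Association rules are the first of the seven categories listed above. As a point of reference, here is a minimal sketch of classic level-wise Apriori frequent-itemset mining, assuming transactions are given as sets of item names and support is an absolute count. It illustrates the category in general, not any algorithm selected or recommended by the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (level-wise Apriori)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count candidate support with one pass over the transactions.
        current = {}
        for cand in candidates:
            c = sum(1 for t in transactions if cand <= t)
            if c >= min_support:
                current[cand] = c
        frequent.update(current)
        k += 1
    return frequent
```

A rule's confidence follows from the support counts, e.g. `conf(bread -> milk) = support({bread, milk}) / support({bread})`.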

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

Eo, Kyun Sun;Lee, Kun Chang
- Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
Over the past decade, the development of the Web has led to an explosive increase in data. Feature selection is an important step in extracting valuable information from large amounts of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with word embedding (Word2vec) and BOW (Bag-of-Words) representations. The FS methods adopted in this study are CFS (Correlation-based FS) and IG (Information Gain). To select the optimal FS method, a range of classifiers was evaluated: LR (logistic regression), NN (neural network), NBN (naive Bayesian network), RF (random forest), RS (random subspace), and ST (stacking). Empirical results on the electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yield the best opinion mining performance. On the laptop and restaurant datasets, the RF classifier using IG applied to Word2vec features performed best.
https://doi.org/10.14400/JDC.2019.17.2.163
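Information Gain is one of the two FS methods named above. A minimal sketch of IG for binary "term appears in document" BOW features might look like the following, assuming documents are already tokenized into lists of words. CFS and the Word2vec branch are not shown, and `rank_terms` is a hypothetical helper, not the authors' code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term appears in doc' w.r.t. the class labels.
    docs: list of token lists (bag-of-words); labels: parallel list of classes."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (len(without) / n) * entropy(without)
    return entropy(labels) - conditional


def rank_terms(docs, labels, k):
    """Return the k vocabulary terms with the highest information gain
    (ties keep alphabetical order because Python's sort is stable)."""
    vocab = sorted({w for d in docs for w in d})
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)[:k]
```

The top-ranked terms would then form the reduced feature set passed to the classifiers.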

Effective Eye Detection for Face Recognition to Protect Medical Information (의료정보 보호를 위해 얼굴인식에 필요한 효과적인 시선 검출)

Kim, Suk-Il;Seok, Gyeong-Hyu
- The Journal of the Korea institute of electronic communication sciences / v.12 no.5 / pp.923-932 / 2017
In this paper, we propose a GRNN (Generalized Regression Neural Network) algorithm for a new eye and face recognition identification system, addressing a problem in existing systems: facial movement and gaze direction make it difficult to identify the user. A Kalman filter uses the structural information of facial features to determine the authenticity of the face and estimates its future location from the current head location, while the horizontal and vertical elements of the face are detected with histogram analysis, which keeps processing time relatively short. The pupil effect obtained with an infrared illuminator enables real-time pupil detection, and the tracked pupil is used to extract a feature vector.
https://doi.org/10.13067/JKIECS.2017.12.5.923
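The abstract uses a Kalman filter to estimate the future head location from the current one. A minimal sketch for a single image coordinate, assuming a constant-velocity motion model with illustrative noise variances `q` and `r` (these values and the function name are hypothetical; the paper's state model is not given):

```python
def kalman_predict_next(measurements, dt=1.0, q=1e-4, r=1.0):
    """Constant-velocity Kalman filter on scalar position measurements.
    State is [position, velocity]; returns the one-step-ahead predicted
    position after the last measurement. q, r: illustrative noise variances."""
    x = [0.0, 0.0]                      # state estimate [pos, vel]
    P = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial uncertainty

    def predict(x, P):
        # x' = F x with F = [[1, dt], [0, 1]]
        x = [x[0] + dt * x[1], x[1]]
        # P' = F P F^T + Q, with the simplification Q = q * I
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + q
        return x, [[p00, p01], [p10, p11]]

    for z in measurements:
        x, P = predict(x, P)
        # Update with H = [1, 0]: only the position is observed.
        s = P[0][0] + r                      # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        innov = z - x[0]
        x = [x[0] + k0 * innov, x[1] + k1 * innov]
        # P = (I - K H) P, computed from the pre-update P.
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
    x, _ = predict(x, P)
    return x[0]
```

Running one filter per coordinate (x and y) is the simplest extension; a joint 4-state filter would also be reasonable.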

Genetic Algorithm based hyperparameter tuned CNN for identifying IoT intrusions

Alexander. R;Pradeep Mohan Kumar. K
- KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.3 / pp.755-778 / 2024
In recent years, the number of devices connected to the internet has grown enormously, as has intrusive behavior on networks, so it is important for intrusion detection systems to report all intrusive behavior. Intrusion detection systems that use deep learning and machine learning algorithms perform well at identifying attacks. A concern with these deep learning algorithms, however, is that they cannot by themselves identify a network suited to the traffic volume; the hyperparameters must be changed manually, which consumes a great deal of time and effort. To address this, this paper uses the extended compact genetic algorithm to tune the hyperparameters automatically. The novelty of this work lies in modeling attack identification as a multi-objective optimization problem and in using linkage learning to solve it. A feature-map-based Convolutional Neural Network is encoded into genes, and the extended compact genetic algorithm optimizes the model for detection accuracy and latency. The hypothesis is verified on the CIC-IDS-2017 and 2018 datasets, where the model achieved an F1 score of 99.23%. Response time, CPU usage, and memory consumption are also evaluated to demonstrate the model's suitability for a fog environment.
https://doi.org/10.3837/tiis.2024.03.013
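To illustrate the general idea of encoding CNN hyperparameters as genes, here is a sketch using a plain elitist genetic algorithm. Note that it swaps in a simpler technique: it is not the extended compact GA with linkage learning used in the paper. The search space `SPACE`, the surrogate `fitness` (which stands in for actually training and timing a CNN), and the weighted-sum scalarization of accuracy and latency are all hypothetical.

```python
import random

# Hypothetical search space: each gene indexes one hyperparameter choice.
SPACE = {
    "filters": [16, 32, 64, 128],
    "kernel": [3, 5, 7],
    "layers": [1, 2, 3, 4],
    "lr": [1e-4, 1e-3, 1e-2],
}
KEYS = list(SPACE)


def decode(genome):
    """Map a list of gene indices to a hyperparameter dict."""
    return {k: SPACE[k][g] for k, g in zip(KEYS, genome)}


def fitness(genome, w_latency=0.3):
    """Hypothetical stand-in for 'train the CNN and measure it': a made-up
    accuracy proxy minus a weighted latency proxy (two objectives scalarized)."""
    h = decode(genome)
    accuracy = (1.0 - abs(h["filters"] - 64) / 128
                - abs(h["lr"] - 1e-3) * 10 - abs(h["layers"] - 3) * 0.05)
    latency = h["filters"] * h["kernel"] * h["layers"] / (128 * 7 * 4)
    return accuracy - w_latency * latency


def genetic_search(pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(len(SPACE[k])) for k in KEYS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]              # elitism: keep the best quarter
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(KEYS))     # one-point crossover
            child = a[:cut] + b[cut:]
            for i, k in enumerate(KEYS):          # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(SPACE[k]))
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return decode(best), fitness(best)
```

A true multi-objective setup would keep a Pareto front instead of a single weighted score; the weighted sum is used here only to keep the sketch short.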

Hybrid GA-ANN and PSO-ANN methods for accurate prediction of uniaxial compression capacity of CFDST columns

Quang-Viet Vu;Sawekchai Tangaramvong;Thu Huynh Van;George Papazafeiropoulos
- Steel and Composite Structures / v.47 no.6 / pp.759-779 / 2023
The paper proposes two hybrid methods that combine metaheuristic optimization with an artificial neural network (ANN) to accurately predict the ultimate axial compressive capacity of concentrically loaded concrete-filled double-skin steel tube (CFDST) columns. The two metaheuristic approaches, genetic algorithm (GA) and particle swarm optimization (PSO), enable dynamic training of the ANN architecture by simultaneously optimizing the number and sizes of the hidden layers and the weights and biases of the neurons. The former is termed GA-ANN and the latter PSO-ANN. Both techniques use gradient-based optimization with Bayesian regularization to enhance the optimization process. The proposed GA-ANN and PSO-ANN methods construct predictive ANNs from 125 available experimental datasets and outperform standard ANNs. Both methods are packaged in a user-friendly graphical interface that reliably predicts the ultimate axial compressive capacity of CFDST columns with various geometric and material parameters.
https://doi.org/10.12989/scs.2023.47.6.759
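The PSO half of PSO-ANN searches over network parameters to minimize prediction error. A minimal global-best PSO sketch follows. As a simplifying assumption, it fits only the two parameters of a one-neuron linear model `y = w*x + b` on synthetic data, rather than hidden-layer sizes and full weight sets, and it omits the Bayesian-regularized gradient step. The coefficients `w`, `c1`, `c2` are common textbook choices, not values from the paper.

```python
import random


def pso_minimize(f, dim, bounds=(-5.0, 5.0), n_particles=30, iters=200, seed=0,
                 w=0.729, c1=1.49445, c2=1.49445):
    """Global-best particle swarm optimization. Returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull toward own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull toward swarm best
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Illustrative objective: mean squared error of y = w*x + b on synthetic
# data generated from y = 2x + 1.
XS = [i / 10 for i in range(-10, 11)]
YS = [2 * x + 1 for x in XS]


def mse(params):
    wt, b = params
    return sum((wt * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)
```

Calling `pso_minimize(mse, 2)` should return parameters close to `[2, 1]`. For a real ANN, `f` would train or evaluate the network and return its validation error.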

A Keyword Matching for the Retrieval of Low-Quality Hangul Document Images

Na, In-Seop;Park, Sang-Cheol;Kim, Soo-Hyung
- Journal of the Korean Society for Library and Information Science / v.47 no.1 / pp.39-55 / 2013
Keyword retrieval is difficult for low-quality Korean document images because they contain adjacent characters that are connected, and images created from various fonts are likely to be distorted during acquisition. In this paper, we propose and test a keyword retrieval system for low-quality Korean document images that uses a support vector machine (SVM) to discriminate the similarity between two word images. We demonstrate that the proposed keyword retrieval method is more effective than the accumulated Optical Character Recognition (OCR)-based search method, and that an SVM determines the similarity of two images better than Bayesian decision or an artificial neural network.
https://doi.org/10.4275/KSLIS.2013.47.1.039
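The abstract frames word-image matching as an SVM deciding whether two word images are similar. One way to sketch that is a linear SVM trained with Pegasos (stochastic sub-gradient descent on the regularized hinge loss) over pair features. The pair representation (element-wise absolute difference plus a bias term), the synthetic data, and all function names are hypothetical; the paper's image features and kernel are not specified here.

```python
import random


def pair_features(f1, f2):
    """Hypothetical pair representation: element-wise absolute difference
    of two word-image feature vectors, plus a constant 1.0 bias term."""
    return [abs(a - b) for a, b in zip(f1, f2)] + [1.0]


def train_linear_svm(X, y, lam=0.01, iters=5000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    X: list of feature lists; y: labels in {+1, -1}. Returns the weight vector.
    The bias is folded into the last feature, so it is regularized too."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        w = [(1 - eta * lam) * wj for wj in w]           # shrink (regularizer)
        if margin < 1:                                   # hinge loss is active
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def same_word(w, f1, f2):
    """Predict +1 (same keyword) or -1 (different) for two word-image vectors."""
    score = sum(wj * xj for wj, xj in zip(w, pair_features(f1, f2)))
    return 1 if score >= 0 else -1


def make_synthetic_pairs(n=100, dim=3, seed=1):
    """Hypothetical data: 'same' pairs differ slightly, 'different' pairs a lot."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        base = [rng.random() for _ in range(dim)]
        near = [b + rng.uniform(-0.1, 0.1) for b in base]
        far = [b + rng.choice([-1, 1]) * rng.uniform(0.6, 1.0) for b in base]
        X.append(pair_features(base, near))
        y.append(1)
        X.append(pair_features(base, far))
        y.append(-1)
    return X, y
```

For retrieval, a query keyword image would be compared against every candidate word image with `same_word`, keeping the positive matches.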

Visual Object Tracking based on Real-time Particle Filters

Lee, Dong-Hun;Jo, Yong-Gun;Kang, Hoon
- Institute of Control, Robotics and Systems (ICROS): Conference Proceedings / 2005.06a / pp.1524-1529 / 2005
A particle filter is a kind of conditional density propagation model. Owing to its Bayesian inference structure, it has characteristics similar to both the selection and mutation operators of evolutionary strategies (ES), and it outperforms other tracking algorithms. When a new object enters the region of interest, the particle sets that have been swarming around the existing objects must move and track the new one instantaneously. Another problem is that the filter cannot track multiple objects well if they move apart after having overlapped. To resolve the reinitialization problem, we use the competitive-AVQ neural network algorithm. We also treat the interframe difference (IFD) of background images as a potential field and prioritize particles according to this IFD so that multiple objects are tracked independently. In this paper, we demonstrate the feasibility of real-time object tracking for intelligent interfaces by simulating deformable-contour particle filters.
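The predict-weight-resample loop behind the selection/mutation analogy in the abstract can be sketched as a bootstrap (SIR) particle filter for a single 1-D position under a random-walk motion model. The motion and observation noise values are assumptions. The competitive-AVQ reinitialization, IFD-based prioritization, and deformable contours described in the paper are not shown.

```python
import math
import random


def particle_filter(observations, n=500, motion_std=1.0, obs_std=2.0, seed=0):
    """Bootstrap (SIR) particle filter for a 1-D position under a random-walk
    motion model. Returns the posterior-mean estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.uniform(-50.0, 50.0) for _ in range(n)]   # broad prior
    estimates = []
    for z in observations:
        # Predict: perturb each particle with motion noise (mutation-like step).
        particles = [p + rng.gauss(0.0, motion_std) for p in particles]
        # Weight: Gaussian likelihood of the observation given each particle.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:   # every particle far from z: fall back to uniform weights
            weights = [1.0 / n] * n
        else:
            weights = [wt / total for wt in weights]
        estimates.append(sum(wt * p for wt, p in zip(weights, particles)))
        # Resample in proportion to weight (selection-like step).
        particles = rng.choices(particles, weights=weights, k=n)
    return estimates
```

Because the random-walk model has no velocity term, the estimate lags a steadily moving target slightly; a constant-velocity state would reduce that lag.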

Predicting Stock Liquidity by Using Ensemble Data Mining Methods

Bae, Eun Chan;Lee, Kun Chang
- Journal of the Korea Society of Computer and Information / v.21 no.6 / pp.9-19 / 2016
In the finance literature, stock liquidity, which indicates how readily stocks can be cashed out in the market, has received much attention from both academics and practitioners, for several reasons. First, stock liquidity is known to significantly affect asset pricing. Second, macroeconomic announcements influence liquidity in the stock market. Stock liquidity therefore affects the decisions of both investors and managers. Although there is a great deal of finance literature on stock liquidity, it is clear that no studies have attempted to investigate it as a decision-making problem; most have taken limited views, such as how much liquidity influences stock prices or which variables are significantly associated with it. This paper, in contrast, posits that stock liquidity can be treated as a serious decision-making problem and handled with data mining techniques that estimate its future extent with statistical validity. To this end, we collected financial data from manufacturing companies listed on the KRX (Korea Exchange) from 2010 to 2013; data from 2010 onward were chosen to avoid the aftershocks of the 2008 financial crisis. We used the Fn-GuidPro system to gather a total of 5,700 financial records. The stock liquidity measure was computed with the procedure proposed by Amihud (2002), which is known to be the best metric for capturing the relationship with daily returns. We applied five data mining techniques (classifiers): Bayesian networks, support vector machines (SVM), decision trees, neural networks, and ensemble methods. The Bayesian networks include GBN (General Bayesian Network), NBN (Naive BN), and TAN (Tree-Augmented NBN); the decision trees are CART and C4.5. Regression results served as the performance benchmark.
The ensemble methods take two forms, integrating either two or three classifiers, and combine the classifiers by voting. Among the single classifiers, CART performed best with 48.2%, compared with 37.18% for regression. Among the ensemble methods, integrating TAN, CART, and SVM performed best with 49.25%. Additional analysis of individual industries showed that relatively stable industries such as electronic appliances, wholesale and retail, wood, and leather, bags, and shoes achieved better performance, above 50%.
https://doi.org/10.9708/jksci.2016.21.6.009
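Two computations from this abstract lend themselves to a short sketch: the Amihud (2002) illiquidity measure, the average of |daily return| divided by daily traded value, and voting over classifier outputs. The `scale` factor and the plurality tie-breaking rule below are assumptions for illustration, not details from the paper.

```python
from collections import Counter


def amihud_illiquidity(returns, traded_values, scale=1e6):
    """Amihud (2002) illiquidity: mean of |daily return| / daily traded value.
    Higher values mean a less liquid stock. Zero-volume days are skipped.
    `scale` is a readability convention assumed here, not from the abstract."""
    ratios = [abs(r) / v for r, v in zip(returns, traded_values) if v > 0]
    return scale * sum(ratios) / len(ratios)


def majority_vote(predictions):
    """Combine one class label per classifier by plurality voting.
    Ties go to the label encountered first (Counter.most_common ordering)."""
    return Counter(predictions).most_common(1)[0][0]
```

For a three-classifier ensemble such as TAN, CART, and SVM, the call would be `majority_vote([tan_label, cart_label, svm_label])`, where the three labels are hypothetical per-model predictions.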

Hybrid Prediction Model for Self-Healing System (자가치유 시스템을 위한 하이브리드 예측모델)

Yoo, Gil-Jong;Park, Jeong-Min;Jung, Chul-Ho;Lee, Eun-Seok
- HCI Society of Korea: Conference Proceedings / 2006.02a / pp.381-386 / 2006
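As the number of systems operating in today's distributed computing environments grows, so does the demand for high-level automation of system management. System management is therefore shifting from the traditional administrator-centered approach to autonomic computing, in which the system itself recognizes its problems, analyzes the situation, and resolves them, and many research institutions are studying this in various ways. However, most existing studies focus mainly on healing after a problem has occurred. Addressing this requires a prediction model that lets the system recognize its own operating environment and predict the occurrence of errors. This paper therefore proposes a method for designing four prediction models that support self-healing in an autonomic computing environment. The prediction model uses the ID3 algorithm, fuzzy inference, a fuzzy neural network, and a Bayesian network, each applied as appropriate to the system's situation, which enables more accurate error prediction. To evaluate the proposed model, we applied it to a self-healing system and compared its prediction efficiency with that of existing studies; the results demonstrate the effectiveness of the proposed model.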
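ID3 is the first of the four components named above. A minimal sketch of ID3 on categorical attributes follows. The system-state attributes (`cpu`, `mem`) and the `error`/`ok` labels in the usage are hypothetical, and the fuzzy and Bayesian components are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def id3(rows, labels, attributes):
    """Build an ID3 decision tree. rows: list of dicts (attribute -> value).
    Returns a class label (leaf) or a tuple (attribute, {value: subtree}).
    Labels must not themselves be tuples, since tuples mark internal nodes."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]     # majority-class leaf

    def gain(attr):
        # Information gain = H(labels) - weighted entropy after splitting on attr.
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [y for r, y in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [a for a in attributes if a != best])
    return (best, branches)


def classify(tree, row):
    """Walk the tree; unseen attribute values raise KeyError in this sketch."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree
```

Usage: with `rows = [{"cpu": "high", "mem": "high"}, {"cpu": "high", "mem": "low"}, {"cpu": "low", "mem": "high"}, {"cpu": "low", "mem": "low"}]` and `labels = ["error", "error", "ok", "ok"]`, the tree splits on `cpu` alone because `mem` carries no information gain.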

A Review of Machine Learning Algorithms for Fraud Detection in Credit Card Transaction

Lim, Kha Shing;Lee, Lam Hong;Sim, Yee-Wai
- International Journal of Computer Science & Network Security / v.21 no.9 / pp.31-40 / 2021
The growing number of credit card fraud cases has become a considerable problem over the past decades, driven by the expansion of new technologies, including the increased popularity and volume of online banking transactions and e-commerce. Rule-based approaches have been widely used to detect and guard against fraudulent activity. However, they require substantial computational power, and defining and building a rule base for pattern matching that precisely identifies fraud patterns is highly complex. In addition, they lack the intelligence to predict or analyze transaction data in search of new fraud patterns and strategies. This paper therefore proposes data mining and machine learning algorithms to overcome these shortcomings. Its aim is to highlight the important techniques and methodologies employed in fraud detection, with a focus on the existing literature. Methods such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), naïve Bayesian, k-Nearest Neighbour (k-NN), Decision Tree, and Frequent Pattern Mining algorithms are reviewed and evaluated for their performance in detecting fraudulent transactions.
https://doi.org/10.22937/IJCSNS.2021.21.9.4
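Of the reviewed methods, k-NN is the simplest to show in a few lines. A minimal sketch follows, assuming each transaction is already a numeric feature vector on comparable scales. The two features in the usage (amount in thousands, hour of day / 24) and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority label among its k nearest training points
    (Euclidean distance). Features should be scaled comparably beforehand."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

Usage: with `train_X = [[0.05, 0.5], [0.10, 0.6], [0.08, 0.4], [5.0, 0.1], [4.5, 0.05], [6.0, 0.12]]` and `train_y = ["legit"] * 3 + ["fraud"] * 3`, a large late-night transaction `[4.8, 0.08]` is labeled `"fraud"`. Real fraud data is heavily imbalanced, so in practice distance weighting or resampling is usually needed.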
