• Title/Summary/Keyword: WEKA


Error Forecasting Using Linear Regression Model

  • Ler, Lian Guey;Kim, Byung-Sik;Choi, Gye-Woon;Kang, Byung-Hwa;Kwang, Jung-Jae
    • Journal of Wetlands Research / v.13 no.1 / pp.13-23 / 2011
  • This study applies a data assimilation method to the numerical model MIKE11 in order to gain insight into data assimilation in flood forecasting models. The paper opens with a general discussion of data assimilation, then describes the methodology and the statistical error forecast model used, which here is linear regression. The error forecast model is applied to the water levels simulated by MIKE11 to produce improved forecasts, which are validated against real measurements. The improved forecasts are found to contain a phase error. Two general formulas are therefore used to account for this phase error, and both improve forecast accuracy: one improves the immediate forecast up to 5 hours ahead, while the other improves the estimate of the peak discharge.
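
The abstract does not give the regression form, so the following is only a minimal sketch of the general idea of regression-based error forecasting. It assumes that the forecast error (observed minus simulated water level) at one step is regressed on the error at the previous step; the function names and the water levels are hypothetical and are not taken from the paper or from MIKE11.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def corrected_forecast(observed, simulated, next_simulated):
    """Add a regression-based error forecast to the next simulated value.

    Assumption (not from the paper): the error at step t is modelled as a
    linear function of the error at step t-1.
    """
    errors = [o - s for o, s in zip(observed, simulated)]
    a, b = fit_line(errors[:-1], errors[1:])
    predicted_error = a + b * errors[-1]
    return next_simulated + predicted_error


# Made-up water levels in metres, for illustration only.
observed = [2.0, 2.3, 2.7, 3.2, 3.8]
simulated = [1.9, 2.1, 2.4, 2.8, 3.3]
print(corrected_forecast(observed, simulated, next_simulated=3.9))
```

A correction of this kind depends on the most recent error, so a timing offset between simulated and observed peaks carries over into the corrected series, which may be one way to picture the phase error the abstract mentions.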

Adopting and Implementation of Decision Tree Classification Method for Image Interpolation (이미지 보간을 위한 의사결정나무 분류 기법의 적용 및 구현)

  • Kim, Donghyung
    • Journal of Korea Society of Digital Industry and Information Management / v.16 no.1 / pp.55-65 / 2020
  • With the development of display hardware, image interpolation techniques have come into use in fields such as image zooming and medical imaging. Traditional image interpolation methods, such as bilinear interpolation, bicubic interpolation, and edge-direction-based interpolation, operate in the spatial domain. More recently, interpolation techniques in the discrete cosine transform or wavelet domain have also been proposed. Combining these existing interpolation methods with machine learning, we propose decision tree classification-based image interpolation. In other words, this paper concerns a way of adaptively choosing among existing interpolation methods rather than a new interpolation method. To obtain the decision model, we used Weka's J48 library, which implements the C4.5 decision tree algorithm. The proposed method first constructs the attribute set and selects the classes, each of which is an interpolation method, for the classification model. After training, each interpolation is performed with the method selected according to the attribute characteristics. Simulation results show that the proposed method yields reasonable performance.

Predicting stock price direction by using data mining methods : Emphasis on comparing single classifiers and ensemble classifiers

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.111-116 / 2017
  • This paper proposes a data mining approach to predicting stock price direction. The stock market fluctuates due to many factors, so predicting stock price direction has become an important issue in stock market analysis. However, few studies in the literature apply data mining approaches to predicting stock price direction. To contribute to the literature, this paper compares single classifiers and ensemble classifiers. The single classifiers are logistic regression, decision tree, neural network, and support vector machine. The ensemble classifiers are AdaBoost, random forest, bagging, stacking, and vote. For the experiments, we gathered a dataset from the Korea Stock Exchange (KRX) covering 2008 to 2015. Data mining experiments using WEKA revealed that random forest, one of the ensemble classifiers, shows the best results in terms of metrics such as AUC (area under the ROC curve) and accuracy.
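
As a reminder of what the two reported metrics measure, here is a small sketch in plain Python. It is not WEKA's implementation: AUC is computed from its rank interpretation (the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half), and the labels, scores, and 0.5 threshold are made-up assumptions.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over all (positive, negative) pairs, how often the positive
    example has the higher score; ties contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = price goes up, 0 = price goes down.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(accuracy(labels, predictions))
print(auc(labels, scores))
```

Accuracy depends on the chosen threshold, while AUC summarizes the ranking quality over all thresholds, which is why the two are often reported together.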

Machine Learning Based Keyphrase Extraction: Comparing Decision Trees, Naïve Bayes, and Artificial Neural Networks

  • Sarkar, Kamal;Nasipuri, Mita;Ghose, Suranjan
    • Journal of Information Processing Systems / v.8 no.4 / pp.693-712 / 2012
  • The paper presents three machine learning based keyphrase extraction methods that respectively use Decision Trees, Naïve Bayes, and Artificial Neural Networks for keyphrase extraction. We consider keyphrases as being phrases that consist of one or more words and as representing the important concepts in a text document. The three machine learning based keyphrase extraction methods that we use for experimentation have been compared with a publicly available keyphrase extraction system called KEA. The experimental results show that the Neural Network based keyphrase extraction method outperforms two other keyphrase extraction methods that use the Decision Tree and Naïve Bayes. The results also show that the Neural Network based method performs better than KEA.

Device identification Based on Audio Source (음원을 이용한 기기판별)

  • Yi, Myeong-Hwan;Moon, Chang-Bae;Kim, Byeong-Man
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.224-226 / 2012
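  • With the advance of IT and the information society, evidence and clues are increasingly stored on digital devices, not only in computer-related crimes but also in ordinary crimes. In this context, this paper proposes an effective method, as a digital forensics technique, for identifying the recording device from recorded data. Extracting the noise from the recorded data and exploiting the differences in this noise makes efficient device identification possible. In this paper, device noise is extracted with a Wiener filter, and features are extracted using MirToolBox. The extracted features were used for training and identification with WEKA's multilayer neural network. The identification results showed an average performance of 99.8%.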

Design of Black Plastics Classifier Using Data Information (데이터 정보를 이용한 흑색 플라스틱 분류기 설계)

  • Park, Sang-Beom;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.4 / pp.569-577 / 2018
  • In this paper, a black plastic classifier based on preprocessing algorithms is designed with the aid of information contained in the data. The slope and area of the spectrum obtained by laser-induced breakdown spectroscopy (LIBS) are analyzed for each material, and the resulting information is used as the input data of the proposed classifier. The slope is represented by the rate of change of wavelength and intensity. The area is calculated using the wavelength of the spectrum peak where the material properties of chemical elements such as carbon and hydrogen appear. The input data of the proposed classifier is constructed from information such as slope and area. In the preprocessing part of the classifier, principal component analysis (PCA) and fuzzy transform are used to reduce high-dimensional input variables to low-dimensional ones. This improves both the characteristic analysis of the materials and the processing speed of the classifier. In the condition part, FCM clustering is applied, and a linear function is used as the connection weight in the conclusion part. Particle swarm optimization (PSO) is used to optimize parameters such as the number of clusters, the fuzzification coefficient, and the number of input variables. To demonstrate the superiority of the classification performance, the classification rate is compared using the WEKA 3.8 data mining software, which contains various classifiers such as Naive Bayes, SVM, and multilayer perceptron.

Set Covering-based Feature Selection of Large-scale Omics Data (Set Covering 기반의 대용량 오믹스데이터 특징변수 추출기법)

  • Ma, Zhengyu;Yan, Kedong;Kim, Kwangsoo;Ryoo, Hong Seo
    • Journal of the Korean Operations Research and Management Science Society / v.39 no.4 / pp.75-84 / 2014
  • In this paper, we address the feature selection problem for large-scale, high-dimensional biological data such as omics data. Most previous approaches used a simple score function to reduce the number of original variables and then selected features from the small number of remaining variables. Methods that do not rely on filtering techniques either ignore the interactions between variables or generate approximate solutions to a simplified problem. In contrast, by combining set covering and clustering techniques, we developed a new method that can handle the full set of variables and consider the combinatorial effects of variables when selecting good features. To demonstrate the efficacy and effectiveness of the method, we downloaded gene expression datasets from TCGA (The Cancer Genome Atlas) and compared our method with other algorithms, including WEKA's embedded feature selection algorithms. The experimental results show that our method selects high-quality features that yield more accurate classifiers than other feature selection algorithms.
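
To make the set covering view of feature selection concrete, the sketch below uses the textbook greedy set cover heuristic on binary data. It is not the authors' method, which also involves clustering and is not detailed in the abstract. The assumed formulation is that a feature "covers" a pair of samples from different classes when the two samples differ on that feature, and features are added until every separable pair is covered. The function name and the toy data are hypothetical.

```python
from itertools import product


def greedy_feature_cover(samples, labels):
    """Greedy set cover over (class 1, class 0) sample pairs.

    samples: list of equal-length lists of 0/1 feature values.
    labels:  list of 0/1 class labels.
    Returns the indices of the chosen features, in order of selection.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_features = len(samples[0])

    # For each feature, the set of cross-class pairs it separates.
    covers = {
        f: {(p, n) for p, n in product(pos, neg) if samples[p][f] != samples[n][f]}
        for f in range(n_features)
    }

    # Only pairs that at least one feature can separate need covering.
    uncovered = set().union(*covers.values())
    chosen = []
    while uncovered:
        # Pick the feature that separates the most still-uncovered pairs.
        best = max(covers, key=lambda f: len(covers[f] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Toy data: four samples, three binary features.
samples = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 1],
]
labels = [1, 1, 0, 0]
print(greedy_feature_cover(samples, labels))
```

Because each feature is judged by the pairs it separates jointly with the features already chosen, a covering formulation takes combinations of variables into account, unlike a per-feature score filter.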

A Study on the Data Mining Preprocessing Tool For Efficient Database Marketing (효율적인 데이터베이스 마케팅을 위한 데이터마이닝 전처리도구에 관한 연구)

  • Lee, Jun-Seok
    • Journal of Digital Convergence / v.12 no.11 / pp.257-264 / 2014
  • This paper constructs a data mining preprocessing tool for efficient database marketing. We compare and evaluate commonly used data mining tools based on their methods of access to local and remote databases and on the exchange of information resources between different computers. The data mining tools whose preprocessing is evaluated are Answer Tree, Clementine, Enterprise Miner, Kensington, and Weka. We propose a design principle for an efficient data preprocessing system for data mining on distributed networks. This system is based on Java technology, including EJB (Enterprise JavaBeans) and XML (eXtensible Markup Language).

A Comparative Study on the Performance of Intrusion Detection using Decision Tree and Artificial Neural Network Models (의사결정트리와 인공 신경망 기법을 이용한 침입탐지 효율성 비교 연구)

  • Jo, Seongrae;Sung, Haengnam;Ahn, Byunghyuk
    • Journal of Korea Society of Digital Industry and Information Management / v.11 no.4 / pp.33-45 / 2015
  • The Internet is now an essential tool in business. Despite this importance, there is a risk of network attacks attempting fraud, collection of private information, and cyber terrorism. Firewalls and intrusion detection systems (IDS) are tools against those attacks. An IDS determines whether network data constitutes a network attack, analyzing the data with techniques including expert systems, data mining, and state transition analysis. This paper compares the performance of two data mining models in detecting network attacks: a decision tree (C4.5) and a neural network (FANN model). I trained and tested these models and measured their effectiveness in terms of detection accuracy, detection rate, and false alarm rate, in order to find out which model is more effective in intrusion detection. The analysis used the KDD Cup 99 data, a benchmark dataset in intrusion detection research. I used the open-source Weka software for the C4.5 model and available C++ code for the FANN model.
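
The abstract names three evaluation measures without defining them. The sketch below uses the definitions commonly found in intrusion detection work, which is an assumption rather than something the paper confirms: detection rate is the fraction of attacks that are flagged, and false alarm rate is the fraction of normal records that are wrongly flagged. The counts are made up.

```python
def ids_metrics(tp, fp, tn, fn):
    """Compute IDS evaluation measures from confusion-matrix counts.

    tp: attacks flagged as attacks      fn: attacks missed
    fp: normal records flagged as attacks
    tn: normal records passed as normal
    Definitions are the common convention, assumed here.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    detection_rate = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return accuracy, detection_rate, false_alarm_rate


# Hypothetical counts for 100 attack records and 100 normal records.
acc, dr, far = ids_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy={acc:.3f} detection_rate={dr:.3f} false_alarm_rate={far:.3f}")
```

Reporting all three matters because a model can reach high accuracy on imbalanced traffic while still missing attacks or raising many false alarms.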

Basket ball motion recognition using a 3-axis accelerometer sensor of smart phone (스마트폰의 3축 가속도 센서를 이용한 농구 자세 인식)

  • Ho, Jong-Gab;Lee, Sang-Jun;Wang, Chang-Won;Jung, Hwa-Yung;Na, Ye-Ji;Min, Se-dong
    • Proceedings of the Korea Information Processing Society Conference / 2015.10a / pp.1372-1374 / 2015
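  • This paper shows the relationship between posture and 3-axis acceleration values in order to recognize five representative basketball postures: standing shoot, jump shoot, pass, dribble, and lay-up shoot. The 3-axis acceleration values were obtained with Sensor log, an application that records data from the accelerometer built into a smartphone, and the vertical-axis value, the horizontal-axis value, and the magnitude of the 3-axis acceleration were computed and used as instances. The relationship between each motion and the data values was examined with Weka, a representative data mining tool. In the experiments, 10-fold cross-validation gave an average of 59.8%, whereas evaluation with a training set and a test set gave 80.8%.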
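
Two computations mentioned in this abstract can be sketched briefly: the magnitude of a 3-axis acceleration sample, and the splitting of instances into 10 folds. Both functions are illustrative assumptions. The magnitude is taken to be the Euclidean norm, and the fold split is a plain contiguous partition, whereas Weka's cross-validation shuffles and stratifies the data.

```python
import math


def magnitude(x, y, z):
    """Euclidean norm of one 3-axis acceleration sample."""
    return math.sqrt(x * x + y * y + z * z)


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Made-up accelerometer sample and instance count, for illustration only.
print(magnitude(3.0, 4.0, 12.0))
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

In k-fold cross-validation every instance is tested exactly once by a model that never saw it during training, so its average can differ noticeably from a single training-set and test-set evaluation, as in the two figures reported here.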