• Title/Summary/Keyword: 의사결정나무알고리즘 (decision tree algorithm)


Building battery deterioration prediction model using real field data (머신러닝 기법을 이용한 납축전지 열화 예측 모델 개발)

  • Choi, Keunho; Kim, Gunwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.243-264 / 2018
  • Although the worldwide battery market has recently spurred the development of lithium secondary batteries, lead-acid batteries (rechargeable batteries), which perform well and can be reused, are still consumed across a wide range of industries. However, lead-acid batteries have a serious problem: deterioration progresses quickly once even one of the cells packed in a battery begins to degrade. To overcome this problem, previous studies have attempted to identify the mechanism of battery deterioration in many ways. Most of them, however, analyzed data obtained in a laboratory rather than in the real world, even though using real data increases the feasibility and applicability of research findings. Therefore, this study aims to develop a model that predicts battery deterioration using real-world data. To this end, we collected data on changes in battery state by attaching sensors that monitor battery condition in real time to dozens of golf carts operating on an actual golf course, yielding 16,883 samples in total. We then developed a model that predicts a precursor phenomenon of battery deterioration by analyzing the sensor data with machine learning techniques. As initial independent variables, we used 1) inbound time of a cart, 2) outbound time of a cart, 3) duration (from outbound time to charge time), 4) charge amount, 5) used amount, 6) charge efficiency, 7) lowest temperature of battery cells 1 to 6, 8) lowest voltage of battery cells 1 to 6, 9) highest voltage of battery cells 1 to 6, 10) voltage of battery cells 1 to 6 at the beginning of operation, 11) voltage of battery cells 1 to 6 at the end of charge, 12) used amount of battery cells 1 to 6 during operation, 13) used amount of the battery during operation (Max-Min), 14) duration of battery use, and 15) highest current during operation. Because the per-cell variables (lowest temperature, lowest voltage, highest voltage, voltage at the beginning of operation, voltage at the end of charge, and used amount during operation of cells 1 to 6) take similar values across cells, we conducted principal component analysis with varimax orthogonal rotation to mitigate the multicollinearity problem. Based on the results, we created new variables by averaging the independent variables that clustered together and used them as the final independent variables in place of the original ones, thereby reducing the dimensionality. We used decision tree, logistic regression, and Bayesian network algorithms to build prediction models, and also built models using bagging and boosting of each of them, as well as random forest. Experimental results show that the bagged decision tree model yields the best accuracy, 89.3923%. This study is limited in that additional variables that affect battery deterioration, such as weather (temperature, humidity) and driving habits, were not considered; we plan to include them in future research. Nevertheless, the battery deterioration prediction model proposed in the present study is expected to enable effective and efficient management of batteries used in the field and to dramatically reduce the cost of undetected battery deterioration.
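
The pipeline this abstract describes is straightforward to sketch. Below is a minimal, hypothetical Python/scikit-learn version: correlated per-cell sensor columns are grouped by their varimax-rotated loadings and replaced with cluster means, then a bagged decision tree is trained. All file and column names are invented, and scikit-learn's FactorAnalysis (which supports varimax rotation) stands in for the paper's rotated PCA.

```python
# Sketch of the abstract's pipeline; file and column names are hypothetical.
import pandas as pd
from sklearn.decomposition import FactorAnalysis   # supports rotation="varimax"
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("golf_cart_sensors.csv")          # hypothetical sensor table
y = df.pop("deterioration_precursor")              # hypothetical binary label

# Per-cell measurements (e.g., lowest voltage of cells 1-6) are highly
# correlated, so cluster them by their varimax-rotated loadings and
# replace each cluster with the mean of its member variables.
cell_cols = [c for c in df.columns if c.startswith("cell_")]
fa = FactorAnalysis(n_components=6, rotation="varimax").fit(df[cell_cols])
labels = abs(fa.components_).argmax(axis=0)        # dominant factor per column
for k in sorted(set(labels)):
    members = [c for c, lab in zip(cell_cols, labels) if lab == k]
    df[f"cell_factor_{k}"] = df[members].mean(axis=1)
df = df.drop(columns=cell_cols)

# Bagging of decision trees, the best-performing model in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.3,
                                          stratify=y, random_state=42)
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                          random_state=42).fit(X_tr, y_tr)
print(f"accuracy: {model.score(X_te, y_te):.4f}")
```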

Rough Set Analysis for Stock Market Timing (러프집합분석을 이용한 매매시점 결정)

  • Huh, Jin-Nyung; Kim, Kyoung-Jae; Han, In-Goo
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.77-97 / 2010
  • Market timing is an investment strategy used to obtain excess returns from financial markets. In general, market timing means determining when to buy and sell in order to earn excess returns from trading. In many market timing systems, trading rules have been used as the engine that generates trade signals. On the other hand, some researchers have proposed rough set analysis as a proper tool for market timing because, through its control function, it does not generate a trade signal when the market pattern is uncertain. Numeric data must be discretized for rough set analysis, because rough sets only accept categorical data. Discretization searches for proper "cuts" in numeric data that determine intervals; all values that lie within an interval are transformed into the same value. In general, there are four methods for data discretization in rough set analysis: equal frequency scaling, expert's knowledge-based discretization, minimum entropy scaling, and naïve and Boolean reasoning-based discretization. Equal frequency scaling fixes the number of intervals, examines the histogram of each variable, and then determines cuts so that approximately the same number of samples falls into each interval. Expert's knowledge-based discretization determines cuts according to the knowledge of domain experts, gathered through literature review or interviews. Minimum entropy scaling recursively partitions the value set of each variable so that a local measure of entropy is optimized. Naïve and Boolean reasoning-based discretization derives categorical values by naïve scaling of the data, then finds optimized discretization thresholds through Boolean reasoning. Although rough set analysis is promising for market timing, there is little research on how the various discretization methods affect trading performance with rough set analysis. In this study, we compare stock market timing models that use rough set analysis with various discretization methods. The research data are the KOSPI 200 from May 1996 to October 1998. The KOSPI 200 is the underlying index of the KOSPI 200 futures, the first derivative instrument in the Korean stock market; it is a market-value-weighted index of 200 stocks selected by criteria on liquidity and their status in corresponding industries, including manufacturing, construction, communication, electricity and gas, distribution and services, and financing. The total number of samples is 660 trading days. In addition, this study uses popular technical indicators as independent variables. The experimental results show that the most profitable method for the training sample is naïve and Boolean reasoning-based discretization, but expert's knowledge-based discretization is the most profitable for the validation sample; expert's knowledge-based discretization also produced robust performance on both the training and validation samples. We also compared rough set analysis with a decision tree, using C4.5 for the comparison. The results show that rough set analysis with expert's knowledge-based discretization produced more profitable rules than C4.5.
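
As a concrete illustration of the first of these methods, the sketch below applies equal frequency scaling with scikit-learn's KBinsDiscretizer: the quantile strategy places cuts so that roughly the same number of samples falls into each interval. The indicator matrix is randomly generated for illustration; a real study would feed in the technical indicators.

```python
# Equal frequency scaling: a prerequisite step before rough set analysis,
# which accepts only categorical data. The input matrix is hypothetical.
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(0)
indicators = rng.normal(size=(660, 3))   # 660 trading days, 3 technical indicators

disc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
categories = disc.fit_transform(indicators)

print(disc.bin_edges_[0])                           # the learned "cuts" for indicator 1
print(np.bincount(categories[:, 0].astype(int)))    # ~equal counts per interval
```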

An Optimized Combination of π-fuzzy Logic and Support Vector Machine for Stock Market Prediction (주식 시장 예측을 위한 π-퍼지 논리와 SVM의 최적 결합)

  • Dao, Tuanhung; Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.43-58 / 2014
  • As the use of trading systems has increased rapidly, many researchers have become interested in developing effective stock market prediction models using artificial intelligence techniques. Stock market prediction involves multifaceted interactions between market-controlling factors and unknown random processes. A successful stock prediction model achieves the most accurate result from minimum input data with the least complex model. In this research, we develop a combination model of π-fuzzy logic and support vector machine (SVM) models, using a genetic algorithm to optimize the parameters of the SVM and π-fuzzy functions, as well as feature subset selection, to improve the performance of stock market prediction. To evaluate the performance of our proposed model, we compare it to comparative models, including the logistic regression, multiple discriminant analysis, classification and regression tree, artificial neural network, SVM, and fuzzy SVM models, on the same data. The results show that our model outperforms all comparative models in prediction accuracy as well as return on investment.
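
A rough sketch of the combination is below: raw indicators pass through a π-shaped membership function before an RBF SVM. The membership breakpoints and the SVM hyperparameters are exactly the quantities the paper tunes with a genetic algorithm; here they are fixed, hypothetical values, and the data are random stand-ins.

```python
# Sketch of the pi-fuzzy + SVM combination; breakpoints and data are hypothetical.
import numpy as np
from sklearn.svm import SVC

def smf(x, a, b):
    """S-shaped ramp: 0 below a, 1 above b, smooth in between."""
    y = np.clip((x - a) / (b - a), 0.0, 1.0)
    return np.where(y <= 0.5, 2 * y**2, 1 - 2 * (1 - y)**2)

def pimf(x, a, b, c, d):
    """Pi-shaped membership: rises over [a, b], plateaus, falls over [c, d]."""
    return smf(x, a, b) * (1 - smf(x, c, d))

rng = np.random.default_rng(1)
X_raw = rng.normal(size=(500, 4))              # hypothetical market indicators
y = (X_raw.sum(axis=1) > 0).astype(int)        # hypothetical up/down label

X_fuzzy = pimf(X_raw, -2.0, -0.5, 0.5, 2.0)    # breakpoints a GA would tune
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # C, gamma: also GA-tunable
clf.fit(X_fuzzy, y)
print(f"training accuracy: {clf.score(X_fuzzy, y):.3f}")
```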

Machine-learning Approaches with Multi-temporal Remotely Sensed Data for Estimation of Forest Biomass and Forest Reference Emission Levels (시계열 위성영상과 머신러닝 기법을 이용한 산림 바이오매스 및 배출기준선 추정)

  • Lee, Yong-Kyu; Lee, Jung-Soo
    • Journal of Korean Society of Forest Science / v.111 no.4 / pp.603-612 / 2022
  • The aims of this study were to evaluate a machine learning-based forest biomass estimation model for estimating subnational forest biomass and to comparatively analyze REDD+ forest reference emission levels. Time-series Landsat satellite imagery and ESA Biomass Climate Change Initiative information were used to build the machine learning-based biomass estimation model. The k-nearest neighbors (kNN) algorithm, a non-parametric learning model, and the tree-based random forest (RF) model were applied as the machine learning algorithms, and the estimated biomasses were compared with the forest reference emission level (FREL) data provided by the Paraguayan government. With optimal parameters, the kNN model's root mean square error (RMSE) was 35.9, while the RF model's RMSE was lower at 34.41, showing that the RF model was superior. When the reference emission levels were set separately using the FREL, kNN, and RF methods, the gradients were approximately -33,000 tons, -253,000 tons, and -92,000 tons, respectively. These results showed that the machine learning-based estimation model was more suitable than the existing methods for setting reference emission levels.
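
The model comparison reported here reduces to fitting two regressors and comparing RMSE, as in the hypothetical sketch below; deriving the actual predictor variables from Landsat imagery is out of scope, so random features stand in.

```python
# kNN vs. random forest regression compared by RMSE; data are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(7)
X = rng.uniform(size=(2000, 6))    # e.g., band reflectances / vegetation indices
y = 100 * X[:, 0] + 50 * X[:, 1] + rng.normal(scale=10, size=2000)  # biomass proxy

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for name, model in [("kNN", KNeighborsRegressor(n_neighbors=5)),
                    ("RF", RandomForestRegressor(n_estimators=300, random_state=0))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name} RMSE: {mean_squared_error(y_te, pred) ** 0.5:.2f}")
```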

A Study on the Fraud Detection for Electronic Prepayment using Machine Learning (머신러닝을 이용한 선불전자지급수단의 이상금융거래 탐지 연구)

  • Choi, Byung-Ho; Cho, Nam-Wook
    • The Journal of Society for e-Business Studies / v.27 no.2 / pp.65-77 / 2022
  • Due to the recent development of electronic financial services, electronic prepayment transactions are growing rapidly, and fraud attempts are growing with them. This paper proposes a methodology that can effectively detect fraudulent transactions in electronic prepayment with machine learning algorithms, including support vector machines, decision trees, and artificial neural networks. Actual transaction data from electronic prepayment services were collected and preprocessed to extract the most relevant variables from the raw data. Two different approaches were explored in the paper: a transaction-based approach and a user ID-based approach. For the transaction-based approach, the first model is based primarily on raw data features, while the second model uses extra features in addition to those of the first model. The user ID-based approach also used feature engineering to extract and transform the most relevant features. Overall, the user ID-based approach performed better than the transaction-based approach, with the artificial neural networks showing the best performance. The proposed method could reduce the damage caused by financial incidents by detecting and blocking fraud attempts.
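
A minimal sketch of the classifier comparison follows, assuming a preprocessed transaction feature table (the paper's actual data are not public, so randomly generated features with a rare fraud label stand in). With heavily imbalanced fraud labels, F1 is a more informative score than accuracy.

```python
# Comparing the three classifier families named in the abstract; data are hypothetical.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 8))             # e.g., amount, hour, velocity features
y = (rng.random(5000) < 0.02).astype(int)  # rare fraud label (~2% positives)

for name, clf in [("SVM", SVC()),
                  ("decision tree", DecisionTreeClassifier(max_depth=6)),
                  ("ANN", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500))]:
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: mean F1 = {f1:.3f}")
```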

A study on the prediction of korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su; Jeong, Seung Hwan; Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.123-139 / 2019
  • The Korean NPL (non-performing loan) market was formed by the government and foreign capital shortly after the 1997 IMF crisis. This market was short-lived, however, as bad debt began to increase again after the 2009 global financial crisis due to the real economic recession. NPL has become a major investment in recent years, as domestic capital market investors began to enter the NPL market in earnest. Although the domestic NPL market has received considerable attention due to its recent overheating, research on the NPL market remains scarce because the history of capital market investment in the domestic NPL market is short. In addition, decision-making through more scientific and systematic analysis is required due to declining profitability and price fluctuations driven by swings in the real estate business. In this study, responding to this market demand, we propose a prediction model that determines whether the benchmark yield is achieved, using NPL market data. To build the model, we used Korean NPL data covering about four years, from December 2013 to December 2017, comprising 2,291 NPL assets in total. As independent variables, from 11 variables describing the characteristics of the real estate, only those related to the dependent variable were selected; variable selection was performed with one-to-one t-tests, stepwise logistic regression, and decision trees. Seven independent variables were selected: purchase year, SPC (special purpose company), municipality, appraisal value, purchase cost, OPB (outstanding principal balance), and HP (holding period). The dependent variable is a binary variable indicating whether the benchmark rate of return is reached. A model predicting a binary variable is more accurate than one predicting a continuous variable, and this accuracy is directly related to the model's effectiveness; moreover, for a special purpose company the main concern is whether or not to purchase the asset, so knowing whether a certain level of return will be achieved is enough to make the decision. For the dependent variable, we constructed and compared predictive models at several thresholds to ascertain whether 12%, the standard rate of return used in the industry, is a meaningful reference value. The model built with the dependent variable defined by the 12% standard rate of return showed the best average hit ratio, 64.60%. To propose an optimal prediction model based on the chosen dependent variable and the seven independent variables, we built and compared prediction models using five methodologies: discriminant analysis, logistic regression, decision tree, artificial neural network, and a genetic algorithm linear model. To do this, ten sets of training and testing data were extracted using the 10-fold validation method; after building the models, the hit ratio of each set was averaged and performance was compared. The average hit ratios of the prediction models built with discriminant analysis, logistic regression, decision tree, artificial neural network, and the genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively, confirming that the artificial neural network model is the best. This study shows that using the seven independent variables with an artificial neural network prediction model is effective for the NPL market. The proposed model predicts in advance whether a new NPL asset will achieve the 12% return, which will help special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will become more liquid as transactions proceed at appropriate prices.
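
The experimental setup is easy to reproduce in outline: binarize the return at the 12% benchmark and average the hit ratio (accuracy) over 10 folds for each candidate model. The sketch below uses a hypothetical feature table with the seven variables named in the abstract; discriminant analysis and the genetic algorithm linear model are omitted for brevity.

```python
# 12% benchmark binarization + 10-fold hit-ratio comparison; data are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
X = pd.DataFrame(rng.normal(size=(2291, 7)),
                 columns=["purchase_year", "spc", "municipality", "appraisal_value",
                          "purchase_cost", "opb", "holding_period"])
returns = rng.normal(loc=0.12, scale=0.05, size=2291)  # hypothetical realized returns
y = (returns >= 0.12).astype(int)                      # benchmark reached?

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("decision tree", DecisionTreeClassifier(max_depth=5)),
                  ("ANN", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500))]:
    hit = cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
    print(f"{name}: mean hit ratio = {hit:.4f}")
```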

Forecasting of Customer's Purchasing Intention Using Support Vector Machine (Support Vector Machine 기법을 이용한 고객의 구매의도 예측)

  • Kim, Jin-Hwa; Nam, Ki-Chan; Lee, Sang-Jong
    • Information Systems Review / v.10 no.2 / pp.137-158 / 2008
  • The rapid development of various information technologies creates new opportunities in online and offline markets. In this changing market environment, customers have various demands for new products and services, and their power and influence on the markets grow stronger each year. Companies have therefore paid great attention to customer relationship management (CRM). In particular, personalized product recommendation systems, which recommend products and services based on a customer's private information or purchasing behavior in stores, are an important asset to most companies. CRM is an important business process in which reliable information is mined from customer databases, and data mining techniques such as artificial intelligence are popular tools for extracting useful information and knowledge from these databases. In this research, we propose a recommendation system that predicts a customer's purchase intention: the purchasing intention for a specific product is predicted by applying data mining techniques to a receipt data set. The performance of the suggested method is compared with that of other data mining technologies.
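
A minimal sketch of the core predictive step might look as follows, with randomly generated stand-ins for the receipt-derived customer features (the paper's data and exact feature set are not public).

```python
# SVM predicting a binary purchase-intention label; features are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(11)
X = rng.normal(size=(1000, 5))             # e.g., visit frequency, basket size, recency
y = (X[:, 0] + X[:, 2] > 0).astype(int)    # hypothetical "will buy" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(f"test accuracy: {model.fit(X_tr, y_tr).score(X_te, y_te):.3f}")
```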

Development of Prediction Model of Financial Distress and Improvement of Prediction Performance Using Data Mining Techniques (데이터마이닝 기법을 이용한 기업부실화 예측 모델 개발과 예측 성능 향상에 관한 연구)

  • Kim, Raynghyung; Yoo, Donghee; Kim, Gunwoo
    • Information Systems Review / v.18 no.2 / pp.173-198 / 2016
  • Financial distress can damage stakeholders and even lead to significant social costs; thus, financial distress prediction is an important issue in macroeconomics. However, most existing studies on building a financial distress prediction model have considered only idiosyncratic risk factors, without systematic risk factors. In this study, we propose a prediction model that considers both the idiosyncratic risk based on financial ratios and the systematic risk based on the business cycle. We build several IT artifacts associated with financial ratios and add them to the idiosyncratic risk factors, and we address the imbalanced data problem with an oversampling method, the synthetic minority oversampling technique (SMOTE), to ensure good performance. When considering systematic risk, our study ensures that each data set consists of both financially distressed and financially sound companies in each business cycle phase. We conducted several experiments that changed the initially imbalanced sample ratio between the two company groups into a 1:1 ratio using SMOTE and compared the prediction results across the individual data sets. We also used data sets from the subsequent business cycle phase as a test set for a prediction model built on business contraction phase data sets, and then compared the earlier and subsequent prediction performance. Our findings can thus provide insights for rational decision-making by stakeholders experiencing an economic crisis.
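
The rebalancing step described here is a standard SMOTE application; below is a minimal sketch with the imbalanced-learn package and hypothetical data.

```python
# SMOTE oversampling to a 1:1 class ratio; requires the imbalanced-learn
# package. Features and labels are hypothetical.
from collections import Counter

import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 10))               # e.g., financial ratios
y = (rng.random(1000) < 0.05).astype(int)     # ~5% financially distressed firms

X_res, y_res = SMOTE(sampling_strategy=1.0, random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))        # minority class synthesized up to 1:1
```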

Data analysis by Integrating statistics and visualization: Visual verification for the prediction model (통계와 시각화를 결합한 데이터 분석: 예측모형 대한 시각화 검증)

  • Mun, Seong Min; Lee, Kyung Won
    • Design Convergence Study / v.15 no.6 / pp.195-214 / 2016
  • Predictive analysis is based on probabilistic learning algorithms known as pattern recognition or machine learning. Users therefore need considerable statistical knowledge to extract more information from the data, and it is difficult to discover the patterns and characteristics of the data. This study conducted statistical data analyses and visual data analyses to supplement these weaknesses of predictive analysis, and through them we found implications that had not appeared in previous studies. First, we could identify data patterns by adjusting data selection according to the splitting criteria of the decision tree method. Second, we could see what types of data are included in the final prediction model. In the statistical analysis, we found relationships among multiple variables and derived a prediction model for high box-office performance. In the visualization analysis, we proposed a visual analysis method with various interactive functions. Finally, through this study we verified the final prediction model and suggested an analysis method that extracts a variety of information from the data.
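
As an illustration of inspecting a tree's splitting criteria, the kind of per-split structure this study visualizes, here is a hypothetical sketch:

```python
# Printing a fitted tree's split thresholds; the data are a synthetic stand-in
# for the study's box-office variables.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)

# export_text lists each split variable and threshold, so one can trace
# which value ranges feed the final prediction model.
print(export_text(tree, feature_names=[f"x{i}" for i in range(5)]))
```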

Developing Library Tour Course Recommendation Model based on a Traveler Persona: Focused on facilities and routes for library trips in J City (여행자 페르소나 기반 도서관 여행 코스 추천 모델 개발 - J시 도서관 여행을 위한 시설 및 동선 중심으로 -)

  • Suhyeon Lee; Hyunsoo Kim; Jiwon Baek; Hyo-Jung Oh
    • Journal of Korean Library and Information Science Society / v.54 no.2 / pp.23-42 / 2023
  • The library tour program is a new type of cultural program first introduced and operated by J City, in which library tourists visit specialized libraries in the city along a set course and enjoy a variety of experiences. This study aims to build a customized course recommendation model that considers the characteristics of individual participants, in addition to the existing fixed group travel format, so that more users can enjoy the opportunity to participate in library tours. To this end, the characteristics of library travelers were categorized to establish traveler personas, and library evaluation items and criteria were established accordingly. We selected the 22 libraries targeted by the library travel program and measured library data through actual visits. Based on the collected data, we derived the characteristics of suitable libraries and developed a persona-based library tour course recommendation model using a decision tree algorithm. To demonstrate the feasibility of the proposed recommendation model, we built a mobile application mockup and conducted user evaluations with actual library users to identify satisfaction with, and possible improvements to, the developed model.
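
A minimal sketch of such a persona-based recommender is below; the persona attributes, their encodings, and the library labels are all invented for illustration (the paper uses 22 real libraries in J City).

```python
# Decision tree mapping traveler personas to recommended libraries; all
# attributes and labels are hypothetical.
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

personas = pd.DataFrame({
    "age_group": ["child", "adult", "senior", "adult", "child", "senior"],
    "interest":  ["art", "history", "reading", "art", "reading", "history"],
    "mobility":  ["walk", "car", "car", "walk", "car", "walk"],
})
library = ["Lib_A", "Lib_B", "Lib_C", "Lib_A", "Lib_C", "Lib_B"]

# One-hot encode the categorical persona attributes, then fit a shallow tree.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(max_depth=3, random_state=0))
model.fit(personas, library)

new_traveler = pd.DataFrame({"age_group": ["adult"], "interest": ["art"],
                             "mobility": ["walk"]})
print(model.predict(new_traveler))         # e.g., ["Lib_A"]
```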