• Title/Summary/Keyword: tree-based models

Data Mining based Forest Fires Prediction Models using Meteorological Data (기상 데이터를 이용한 데이터 마이닝 기반의 산불 예측 모델)

  • Kim, Sam-Keun;Ahn, Jae-Geun
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.8 / pp.521-529 / 2020
  • Forest fires are one of the most important environmental risks, with adverse effects on many aspects of life, such as the economy, the environment, and health. Early detection, quick prediction, and rapid response can play an essential role in protecting property and life from forest fire risks. For the rapid discovery of forest fires, one method uses meteorological data obtained from local sensors installed in each area by the Meteorological Agency. Meteorological conditions (e.g., temperature, wind) influence forest fires. This study evaluated a Data Mining (DM) approach to predict the burned area of forest fires. Five DM models, i.e., Stochastic Gradient Descent (SGD), Support Vector Machines (SVM), Decision Tree (DT), Random Forests (RF), and Deep Neural Network (DNN), and four feature selection setups (using spatial, temporal, and weather attributes) were tested on recent real-world data collected from the Gyeonggi-do area over the last five years. In the experiments, a DNN model using only meteorological data showed the best performance. The proposed model was more effective in predicting the burned area of small forest fires, which are more frequent. The knowledge derived from the proposed prediction model is particularly useful for improving firefighting resource management.
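
Decision trees and random forests, two of the five models compared above, are built from recursive binary splits. As a minimal sketch of a single split (a regression "stump"), the code below picks the threshold on one meteorological feature that minimizes the total sum of squared errors (SSE) of the burned-area target. The temperature and burned-area values are synthetic and the function names are hypothetical; this is not the authors' pipeline.

```python
from typing import List, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (threshold, total SSE) of the best split x <= threshold vs x > threshold."""
    pairs = sorted(zip(x, y))
    best = (float("nan"), float("inf"))
    for i in range(1, len(pairs)):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # no threshold can separate equal feature values
        threshold = (left_x + right_x) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Synthetic example: daily max temperature (deg C) vs burned area (ha).
temperature = [12.0, 15.0, 18.0, 24.0, 27.0, 30.0]
burned_area = [0.1, 0.2, 0.1, 2.5, 3.0, 2.8]
threshold, total_sse = best_split(temperature, burned_area)
print(threshold, round(total_sse, 3))
```

A full tree applies the same search to every feature and recurses on each side; a random forest averages many such trees grown on bootstrap samples.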

Developing Dynamic DBH Growth Prediction Model by Thinning Intensity and Cycle - Based on Yield Table Data - (간벌강도 및 주기에 따른 동적 흉고직경 생장예측 모형개발 - 기존 수확표 자료를 기반으로 -)

  • Kim, Moonil;Lee, Woo-Kyun;Park, Taejin;Kwak, Hanbin;Byun, Jungyeon;Nam, Kijun;Lee, Kyung-Hak;Son, Yung-Mo;Won, Hyung-Kyu;Lee, Sang-Min
    • Journal of Korean Society of Forest Science / v.101 no.2 / pp.266-278 / 2012
  • The objective of this study was to develop a dynamic stand growth model that predicts diameter at breast height (DBH) growth by thinning intensity and cycle for major tree species of South Korea. The yield table, a static stand growth model constructed by the Korea Forest Service, was used to prepare dynamic stand growth models for 8 tree species. In the process of model development, the thinning type was set to thinning from below, and equations predicting the change in DBH after thinning at different intensities were generated. In addition, stand density (N/ha), age, and site index were adopted as explanatory variables in the DBH prediction model. These equations were then integrated, accounting for thinning intensity and cycle, to predict DBH growth under various silvicultural regimes. The dynamic DBH growth model developed in this study can help explain the effect of thinning on forest growth and growing stock. Furthermore, the results of this study are also applicable to quantitatively assessing carbon storage and sequestration capability.
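
Thinning from below, the thinning type fixed in this study, removes the smallest stems first, so a stand's mean diameter rises immediately after thinning, before any growth response. The sketch below shows that effect with the quadratic mean diameter (QMD), the diameter of the tree of mean basal area. The DBH values are synthetic, intensity is assumed here to be the fraction of stems removed (the study may define it differently, e.g. by basal area), and `thin_from_below` is a hypothetical helper, not one of the study's growth equations.

```python
import math
from typing import List


def quadratic_mean_diameter(dbh_cm: List[float]) -> float:
    """QMD = sqrt(mean of squared DBH), i.e. the diameter of the tree of mean basal area."""
    return math.sqrt(sum(d * d for d in dbh_cm) / len(dbh_cm))


def thin_from_below(dbh_cm: List[float], intensity: float) -> List[float]:
    """Remove the smallest trees; `intensity` is the assumed fraction of stems removed."""
    if not 0.0 <= intensity < 1.0:
        raise ValueError("intensity must be in [0, 1)")
    n_remove = round(len(dbh_cm) * intensity)
    return sorted(dbh_cm)[n_remove:]


stand = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0]  # synthetic DBH (cm)
before = quadratic_mean_diameter(stand)
after = quadratic_mean_diameter(thin_from_below(stand, 0.3))
print(f"QMD before: {before:.2f} cm, after 30% thinning from below: {after:.2f} cm")
```

A dynamic model then has to add the post-thinning growth response on top of this immediate jump, which is what the study's intensity- and cycle-dependent equations address.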

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.1-17 / 2015
  • In this paper, we report what we have observed regarding a prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work is significant for the military's efforts to supervise enlisted men. Despite these efforts, many commanders have failed to prevent accidents involving their subordinates. One of the important duties of officers is to take care of their subordinates and prevent unexpected accidents. However, accidents are hard to prevent, so a proper method must be found. Our motivation for presenting this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. This requires considerable time and effort, and the results differ significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently, records of enlisted men as well as SNS communication between commanders and soldiers have made it possible to predict and prevent accidents. This paper applies data mining to internal and external (SNS) data to identify enlisted men's interests and predict accidents. We propose combining topic analysis with a decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accident.
In order to analyze their SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach to finding their interests is composed of three main phases: collection, topic analysis, and conversion of the topic analysis results into scores for use as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After the unstructured SNS data are gathered, the topic analysis phase extracts issues from them. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, we quantify these 5 topics as personal scores. After quantifying the topics, we add these results to the independent variables, which are composed of 15 internal data sets. Then, we build two decision trees. The first tree uses only internal data. The second tree uses external data (SNS) as well as internal data. We then compare the misclassification rates reported by SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%, so the second model predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we use the McNemar test to check whether the difference between them is meaningful. The difference is statistically significant (p-value: 0.0003). This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to data from a small number of enlisted men. Second, various independent variables in the decision tree model are used as categorical rather than continuous variables, which entails a loss of information. In spite of extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates.
Our proposed methodology can provide support to decision-making in the military. This study is expected to contribute to the prevention of accidents in the military based on scientific analysis of enlisted men and proper management of them.

Shape-Based Subsequence Retrieval Supporting Multiple Models in Time-Series Databases (시계열 데이터베이스에서 복수의 모델을 지원하는 모양 기반 서브시퀀스 검색)

  • Won, Jung-Im;Yoon, Jee-Hee;Kim, Sang-Wook;Park, Sang-Hyun
    • The KIPS Transactions:PartD / v.10D no.4 / pp.577-590 / 2003
  • Shape-based retrieval is defined as the operation that searches for (sub)sequences whose shapes are similar to that of a query sequence regardless of their actual element values. In this paper, we propose a similarity model suitable for shape-based retrieval and present an indexing method that supports it. The proposed similarity model retrieves similar shapes accurately by combining various shape-preserving transformations such as normalization, moving average, and time warping. Our indexing method stores every distinct subsequence concisely in a disk-based suffix tree for efficient and adaptive query processing. We allow the user to dynamically choose a similarity model suitable for a given application. More specifically, we allow the user to determine the parameter p of the distance function $L_p$ when submitting a query. The results of extensive experiments revealed that our approach not only successfully finds the subsequences whose shapes are similar to a query shape but also significantly outperforms sequential search.
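
To make the similarity model concrete, the sketch below combines two of the shape-preserving transformations named above (z-score normalization and a moving average) with a user-selectable $L_p$ distance, and answers a query by brute-force sequential scan over all subsequences of the query's length. It omits time warping and the disk-based suffix-tree index that the paper proposes; the function names, window size `k`, and tolerance `eps` are illustrative assumptions.

```python
import math
from typing import List, Tuple


def moving_average(seq: List[float], k: int) -> List[float]:
    """k-point moving average (output has len(seq) - k + 1 elements)."""
    return [sum(seq[i:i + k]) / k for i in range(len(seq) - k + 1)]


def normalize(seq: List[float]) -> List[float]:
    """Z-score normalization: removes offset and amplitude, keeps the shape."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    return [0.0] * len(seq) if std == 0 else [(v - mean) / std for v in seq]


def lp_distance(a: List[float], b: List[float], p: float) -> float:
    """L_p distance; p = math.inf gives the maximum (L_infinity) distance."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)


def shape_search(data: List[float], query: List[float], eps: float,
                 p: float = 2, k: int = 1) -> List[Tuple[int, float]]:
    """Sequential scan: (offset, distance) of every subsequence within eps of the query shape."""
    q = normalize(moving_average(query, k))
    m = len(query)
    hits = []
    for start in range(len(data) - m + 1):
        s = normalize(moving_average(data[start:start + m], k))
        d = lp_distance(s, q, p)
        if d <= eps:
            hits.append((start, d))
    return hits


series = [1, 2, 3, 2, 1, 50, 60, 70, 60, 50, 5, 5, 5]
print(shape_search(series, [10, 20, 30, 20, 10], eps=0.1, p=2))
```

Both the small bump at offset 0 and the large one at offset 5 match the query, even though their element values differ from it by an order of magnitude; that is the "regardless of actual element values" property the abstract describes. The index exists to avoid exactly this per-offset scan.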

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
  • Collaborative filtering, which is often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, because the traditional collaborative filtering technique calculates similarities from direct connections and common features among customers, it has difficulty calculating similarity for new customers or products. For this reason, hybrid techniques have been designed that also use content-based filtering. Separately, efforts have been made to solve these problems by applying the structural characteristics of social networks. This approach calculates similarities indirectly, through similar customers placed between two customers: a customer network is created from purchasing data, and the similarity between two customers is calculated from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer accepts recommendations. The centrality metrics of networks can be used to calculate these similarities. Different centrality metrics have important implications in that they may have different effects on recommendation performance. Furthermore, this study considers that the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance when applied not only to new customers or products but also to all customers or products. By treating a customer's purchase of an item as a link between the customer and the item in the network, predicting user acceptance of a recommendation becomes predicting whether a new link will be created between them.
As classification models suit the binary problem of whether or not a link is formed, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for this research. Performance was evaluated using order data collected from an online shopping mall over four years and two months. Of these, the first three years and eight months of data were organized into the social network on which the experiment was conducted, and the records of the following four months were used to train and evaluate the recommender models. Experiments applying the centrality metrics to each model show that the recommendation acceptance rates of the centrality metrics differ across algorithms at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance across models. It ranks first in logistic regression, artificial neural network, and decision tree, with numerically high performance. However, it records very low rankings in support vector machine and k-nearest neighbors, with low performance levels. As the experimental results reveal, network centrality metrics computed over a subnetwork connecting two nodes can, within a classification model, effectively predict the connectivity between those two nodes in a social network. Furthermore, each metric's performance depends on the type of classification model.
This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. Introducing closeness centrality could be considered to obtain higher performance for certain models.
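
The four centrality metrics compared in this study can be computed on a small network with the standard library alone. The sketch below assumes an unweighted, undirected graph stored as an adjacency dict, uses Brandes' algorithm for betweenness and power iteration for eigenvector centrality, and runs on a hypothetical toy graph. It does not reproduce the study's construction of a similarity score over the subnetwork linking two customers.

```python
import math
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # undirected: each edge appears in both adjacency lists


def degree_centrality(g: Graph) -> Dict[str, float]:
    """Number of neighbors divided by the maximum possible, n - 1."""
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}


def closeness_centrality(g: Graph) -> Dict[str, float]:
    """(reachable nodes - 1) / sum of BFS shortest-path distances."""
    result = {}
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(g: Graph) -> Dict[str, float]:
    """Brandes' algorithm (unweighted), normalized to [0, 1] for undirected graphs."""
    bc = {v: 0.0 for v in g}
    for s in g:
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in g}
        sigma = {v: 0 for v in g}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in g}  # dependency of source s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(g)
    # Each unordered pair is counted twice (once with each endpoint as source).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: b * scale for v, b in bc.items()}


def eigenvector_centrality(g: Graph, iters: int = 200) -> Dict[str, float]:
    """Power iteration on A + I (same leading eigenvector as A, avoids oscillation)."""
    x = {v: 1.0 for v in g}
    for _ in range(iters):
        nxt = {v: x[v] + sum(x[w] for w in g[v]) for v in g}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
g = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C", "E"], "E": ["D"]}
for name, fn in [("degree", degree_centrality), ("closeness", closeness_centrality),
                 ("betweenness", betweenness_centrality), ("eigenvector", eigenvector_centrality)]:
    scores = fn(g)
    print(name, {v: round(s, 3) for v, s in scores.items()})
```

Even on this toy graph the metrics disagree: D scores well on betweenness (it lies on every path to E) but low on degree, which is one reason different metrics can behave differently as classifier features.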

A study on automated soil moisture monitoring methods for the Korean peninsula based on Google Earth Engine (Google Earth Engine 기반의 한반도 토양수분 모니터링 자동화 기법 연구)

  • Jang, Wonjin;Chung, Jeehun;Lee, Yonggwan;Kim, Jinuk;Kim, Seongjoon
    • Journal of Korea Water Resources Association / v.57 no.9 / pp.615-626 / 2024
  • To accurately and efficiently monitor soil moisture (SM) across South Korea, this study developed an SM estimation model that integrates the cloud computing platform Google Earth Engine (GEE) with Automated Machine Learning (AutoML). Various spatial datasets based on Terra MODIS (Moderate Resolution Imaging Spectroradiometer) and the global precipitation observation satellite GPM (Global Precipitation Measurement) were used to test optimal input data combinations. The results indicated that GPM-based accumulated dry days, 5-day antecedent average precipitation, NDVI (Normalized Difference Vegetation Index), the sum of nighttime and daytime LST (Land Surface Temperature), soil properties (sand and clay content, bulk density), terrain data (elevation and slope), and seasonal classification had high feature importance. After the objective functions (coefficient of determination, R²; root mean square error, RMSE; mean absolute percentage error, MAPE) were set in AutoML for this combination of data, a comparative evaluation of machine learning techniques was conducted. Tree-based models exhibited high performance, with Random Forest performing best (R²: 0.72, RMSE: 2.70 vol%, MAPE: 0.14).
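
The three objective metrics used to rank the models can be written out directly. The sketch below computes R², RMSE, and MAPE for paired observed and predicted soil-moisture values. MAPE is returned as a fraction rather than a percentage, on the assumption that the abstract's 0.14 is on that scale (about 14%); the sample values are made up.

```python
import math
from typing import Dict, List


def regression_metrics(obs: List[float], pred: List[float]) -> Dict[str, float]:
    """R^2, RMSE (same units as obs, e.g. vol%), and MAPE as a fraction."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)         # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        # MAPE is undefined when an observed value is 0.
        "MAPE": sum(abs((o - p) / o) for o, p in zip(obs, pred)) / n,
    }


observed = [20.0, 25.0, 30.0, 35.0]   # synthetic SM, vol%
predicted = [22.0, 24.0, 27.0, 36.0]
print(regression_metrics(observed, predicted))
```

RMSE penalizes large errors in absolute units, while MAPE is scale-free but weights errors at low soil-moisture values more heavily, so reporting all three gives a more balanced comparison.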

Research on a Mobile-aware Service Model in the Internet of Things

  • An, Jian;Gui, Xiao-Lin;Yang, Jian-Wei;Zhang, Wen-Dong;Jiang, Jin-Hua
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.5 / pp.1146-1165 / 2013
  • Collaborative awareness between people with various smart multimedia devices is a new trend in the Internet of Things (IoT). Because of people's mobility, randomness, and complexity, it is difficult to achieve complete data awareness and data transmission in IoT. Research on mobile-aware service models is therefore needed. In this work, we first discuss and quantify the social relationships of mobile nodes from multiple perspectives based on a summary of social characteristics. We then define various decision factors (DFs). Next, we construct a directed and weighted community by analyzing the activity patterns of mobile nodes. Finally, a mobile-aware service routing algorithm (MSRA) is proposed to determine appropriate service nodes through a trusted chain and an optimal path tree. The simulation results indicate that the model has superior dynamic adaptability and service discovery efficiency compared with existing models. The mobile-aware service model could be used to improve data acquisition techniques and the quality of mobile-aware services in the IoT.
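
The abstract does not specify how the optimal path tree is built. One common construction, shown here purely as an assumption, is a shortest-path tree rooted at the requesting node, computed with Dijkstra's algorithm over the directed, weighted community graph. The edge weights stand in for whatever cost MSRA derives from its decision factors and trusted chain; node names and weights are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

WeightedGraph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, cost)]


def shortest_path_tree(g: WeightedGraph,
                       root: str) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Dijkstra from root (non-negative costs); returns (cost to each node, tree parent)."""
    cost: Dict[str, float] = {root: 0.0}
    parent: Dict[str, Optional[str]] = {root: None}
    heap = [(0.0, root)]
    done = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue  # stale heap entry
        done.add(v)
        for w, weight in g.get(v, []):
            nd = d + weight
            if w not in cost or nd < cost[w]:
                cost[w] = nd
                parent[w] = v
                heapq.heappush(heap, (nd, w))
    return cost, parent


def path_to(parent: Dict[str, Optional[str]], target: str) -> List[str]:
    """Follow tree parents from target back to the root."""
    path: List[str] = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Hypothetical mobile nodes; lower cost = assumed closer or more trusted link.
graph: WeightedGraph = {
    "req": [("n1", 1.0), ("n2", 4.0)],
    "n1": [("n2", 1.5), ("svc", 5.0)],
    "n2": [("svc", 1.0)],
}
cost, parent = shortest_path_tree(graph, "req")
print(cost["svc"], path_to(parent, "svc"))
```

The relayed route req, n1, n2, svc (total cost 3.5) beats both direct-looking options, which is the kind of choice a path tree over a weighted community makes explicit.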

A Target Selection Model for the Counseling Services in Long-Term Care Insurance (노인장기요양보험 이용지원 상담 대상자 선정모형 개발)

  • Han, Eun-Jeong;Kim, Dong-Geon
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1063-1073 / 2015
  • In the long-term care insurance (LTCI) system, the National Health Insurance Service (NHIS) provides counseling services for beneficiaries and their family caregivers, which help them use LTC services appropriately. The purpose of this study was to develop a Target Selection Model for the Counseling Services based on the needs of beneficiaries and their family caregivers. To develop the models, we used a data set of 2,000 beneficiaries and family caregivers in total who used long-term care services at home in March 2013 and completed questionnaires. The Target Selection Model was established by comparing various data-mining models, such as logistic regression, gradient boosting, Lasso, decision tree, ensemble, and neural network models. The Lasso model was selected as the final model because of its stability, high performance, and availability. Our results may improve the satisfaction with and efficiency of NHIS counseling services.
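
Part of the Lasso's stability and interpretability comes from its L1 penalty, which drives the coefficients of weak predictors exactly to zero. A minimal sketch of a linear Lasso fitted by cyclic coordinate descent with soft-thresholding follows. It assumes centered predictors and response and fits no intercept, and the tiny design matrix is synthetic. For a binary target such as "needs counseling", the same penalty is typically applied to logistic regression instead; this linear version only illustrates the shrinkage mechanism.

```python
from typing import List


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward 0 by gamma; values within [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 100) -> List[float]:
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(x * x for x in col) / n
            # Correlation of column j with the partial residual (feature j added back).
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, residual)) / n
            new_bj = soft_threshold(rho, lam) / scale
            delta = new_bj - beta[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                beta[j] = new_bj
    return beta


# Synthetic data: y = 2*x1 + 0.1*x2, with orthogonal, centered columns.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

With `lam=0.5` the strong predictor is shrunk from 2.0 to about 1.5 while the weak one is removed entirely, which is how the Lasso yields a compact, stable set of selection criteria.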

A Study on Propriety of Pilot Aptitude Test Using Phased Analysis of Pilot Training (비행교육과정 단계별 분석을 통한 조종적성검사 항목 타당성 연구)

  • Kim, HeeYoung;Kim, SuHwan;Moon, HoSeok
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.3 / pp.218-225 / 2016
  • Given dramatically advancing aircraft performance and the growing complexity of military operations resulting from highly developed science and technology, it is important to select personnel with ideal pilot aptitude. The opportunity cost of dropouts, and the fact that human error is the leading cause of aviation accidents, are the practical reasons why aptitude-based personnel selection matters. This study analyzes the ROKAF pilot aptitude test, which was improved in 2004, using various classification models. It discusses the significance of the selected variables along with directions for the future development of the ROKAF pilot aptitude test. The accuracy of the classification models was improved by taking into account the differing characteristics of individual test takers.

A Study on the Korean Continuous Speech Recognition using Adaptive Pruning Algorithm and PDT-SSS Algorithm (적응 프루닝 알고리즘과 PDT-SSS 알고리즘을 이용한 한국어 연속음성인식에 관한 연구)

  • 황철준;오세진;김범국;정호열;정현열
    • Journal of Korea Multimedia Society / v.4 no.6 / pp.524-533 / 2001
  • A continuous speech recognition system for practical applications requires real-time processing and high recognition accuracy. In this paper, we study acoustic models built with the PDT-SSS algorithm and language models trained by iterative learning in order to improve speech recognition accuracy. In addition, the adaptive pruning algorithm is applied to continuous speech. To verify the effectiveness of the proposed method, we carried out continuous speech recognition on a Korean air flight reservation task. Experimental results show that the adopted algorithm achieves an average accuracy of 90.9% for continuous speech recognition and 90.7% for word recognition including continuous speech. When the adaptive pruning algorithm is applied to continuous speech, recognition time is reduced by about 1.2 seconds (15%) without any loss of accuracy. These results demonstrate the effectiveness of the PDT-SSS algorithm and the adaptive pruning algorithm.
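
The abstract does not detail its adaptive pruning rule. As a generic illustration only, the sketch below applies beam pruning at one decoding frame (discard hypotheses whose log-score falls more than a beam width below the best) and then adapts the beam width so that the number of surviving hypotheses tracks a target. The target count, step factor, and scores are hypothetical, and this is not the paper's PDT-SSS or pruning algorithm.

```python
from typing import Dict, Tuple


def prune_frame(scores: Dict[str, float], beam: float, target: int,
                step: float = 0.9) -> Tuple[Dict[str, float], float]:
    """Keep hypotheses within `beam` of the best log-score, then adapt the beam.

    Returns (surviving hypotheses, beam width to use at the next frame).
    """
    best = max(scores.values())
    kept = {h: s for h, s in scores.items() if s >= best - beam}
    if len(kept) > target:
        beam *= step   # too many survivors: tighten the beam to save time
    elif len(kept) < target:
        beam /= step   # too few survivors: widen the beam to protect accuracy
    return kept, beam


frame_scores = {"h1": -10.0, "h2": -11.5, "h3": -14.0, "h4": -25.0}
kept, next_beam = prune_frame(frame_scores, beam=5.0, target=2)
print(sorted(kept), next_beam)
```

Adapting the beam per frame lets the decoder spend less time on frames where one hypothesis clearly dominates, which is the general trade-off behind reducing recognition time without losing accuracy.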
