• Title/Summary/Keyword: Tree-Based Network


Forecasting of Customer's Purchasing Intention Using Support Vector Machine (Support Vector Machine 기법을 이용한 고객의 구매의도 예측)

  • Kim, Jin-Hwa; Nam, Ki-Chan; Lee, Sang-Jong
    • Information Systems Review / v.10 no.2 / pp.137-158 / 2008
  • Rapid development of various information technologies creates new opportunities in online and offline markets. In this changing market environment, customers have diverse demands for new products and services, and their power and influence in the market grow stronger each year. Companies have therefore paid great attention to customer relationship management (CRM). In particular, personalized product recommendation systems, which recommend products and services based on a customer's private information or in-store purchasing behavior, are an important asset to most companies. CRM is one of the important business processes in which reliable information is mined from customer databases, and data mining techniques such as artificial intelligence methods are popular tools for extracting useful information and knowledge from these databases. In this research, we propose a recommendation system that predicts a customer's purchase intention: the customer's intention to purchase a specific product is predicted by applying data mining techniques to a receipt data set. The performance of the proposed method is compared with that of other data mining techniques.
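The abstract does not specify the features or kernel used. As a hedged illustration only, a linear SVM for a binary purchase/no-purchase label can be trained with Pegasos-style stochastic sub-gradient descent on the hinge loss. The two standardized features (for example, visit frequency and past category spend derived from receipts) and all function names are hypothetical assumptions, not the authors' model.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style linear SVM: stochastic sub-gradient descent on the hinge loss.

    X: list of feature vectors; y: labels in {-1, +1} (+1 = hypothetical "will purchase").
    A constant 1.0 feature is appended so the bias is learned (and, as a
    simplification, regularized) together with the weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # standard Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 term
            if margin < 1:  # hinge loss is active: move toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w[:-1], w[-1]


def predict(w, b, x):
    """Return +1 (predicted purchase) or -1 (no purchase)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical standardized receipt features: [visit frequency, past category spend]
X = [[-2.0, -1.5], [-1.5, -2.0], [-1.0, -1.0], [1.0, 1.0], [1.5, 2.0], [2.0, 1.5]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [-1, -1, -1, 1, 1, 1]
```

In practice, a library SVM with tuned kernel and regularization would replace this loop; the sketch only shows the margin-based decision rule.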

The Effects of the Biodiversity Increase after Creation of the Artificial Wetland -The Case of Ecological Pond at Seoul Technical High School- (인공습지 조성후 생물다양성 증진 효과에 관한 연구 -서울공고 생태연못을 중심으로-)

  • 김귀곤; 조동길
    • Journal of the Korean Institute of Landscape Architecture / v.27 no.3 / pp.1-17 / 1999
  • The purpose of this study is to evaluate the creation techniques of an artificial wetland, one of the biotopes developed to promote biodiversity in urban areas, and to identify improvements. Specifically, artificial wetland creation techniques were categorized into living environment and living creatures. As the living conditions for creatures, the habitat environment was reviewed with a focus on water and soil environments. Living creatures were classified into plants, insects, fish, and birds. The creation techniques were assessed through post-construction evaluation, considering the creation of habitats for living creatures. Intervention by users, changes in the living environment and living species, and the relevance of the creation techniques were reviewed. The key results of this study are as follows. (1) The water environment provides suitable living conditions for creatures through a process that eases the use of piped water. Various water depths and embankments appear to have a positive impact on aquatic life. In particular, embankments naturally covered in soil played an important role as places for the activities of aquatic insects and young fish as well as for the growth of aquatic plants. (2) Various aquatic and ground plants introduced to promote insect diversity, shallow water, and old-tree logs contributed greatly to increasing the types and number of insects. Aquatic insects were observed particularly often in areas where aquatic plants are abundant but the water is shallow. (3) A space piled with stones to provide habitats for fish was not used much. However, fish were observed using embankments built with natural stones and embankments using logs in areas where the water is deep. In addition, it was confirmed that the 1,500 released fish had propagated, using various depths and spawning sites.
(4) The techniques intended to provide habitats for and attract birds (creation of an island, log setting, and creation of man-made bird nests) were found not to be serving their roles. It is believed that the number of bird species did not increase because of the small size and isolation of the area. Based on a theoretical review, these are judged to be areas that are likely to be used when a greater variety of birds is introduced. To attract and keep more birds at the site, such spaces will need to be linked systematically in the future as part of building an eco-network, while ensuring adequate living areas. (5) Users intervened greatly in the study areas. As a result, normal plant growth was hindered and non-indigenous plants were introduced. To limit user intervention, sufficient buffer zones and environmental education programs are urgently required.


Severity-Adjusted LOS Model of AMI patients based on the Korean National Hospital Discharge in-depth Injury Survey Data (퇴원손상심층조사 자료를 기반으로 한 급성심근경색환자 재원일수의 중증도 보정 모형 개발)

  • Kim, Won-Joong; Kim, Sung-Soo; Kim, Eun-Ju; Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.10 / pp.4910-4918 / 2013
  • This study aims to design a Severity-Adjusted LOS (Length of Stay) Model to efficiently manage the LOS of AMI (Acute Myocardial Infarction) patients. We designed the model using data-mining methods (multiple regression analysis, decision trees, and neural networks) on 6,074 AMI patients with the diagnosis code I21 in the 2004-2009 Korean National Hospital Discharge In-depth Injury Survey. A decision tree model, which produced superior results, was chosen as the final model. This study found that the execution of CABG, status at discharge (alive or dead), and comorbidity index, among others, were major factors affecting the severity adjustment of LOS for AMI patients. The differences between real LOS and adjusted LOS resulted from hospital location and bed size. Efficient management of the LOS of AMI patients requires first identifying these differentiating factors and then performing various activities to address them. These factors can be identified by applying each hospital's data to this newly designed Severity-Adjusted LOS Model.
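As a hedged sketch of the severity-adjustment idea: a model fitted on patient-level severity factors gives each patient an expected LOS, and the gap between observed and expected LOS can then be averaged per hospital. The one-split regression tree below stands in for the study's full decision tree; the feature layout (a CABG flag and a comorbidity index), the hospital labels, and all function names are hypothetical.

```python
from statistics import mean


def fit_regression_stump(X, y):
    """Fit a one-split regression tree: choose the (feature, threshold) pair that
    minimizes the sum of squared errors; each leaf predicts the mean LOS of its patients.
    Requires at least one feature that is not constant across patients."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:  # drop the max so both sides are non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, ml, mr)
    _, f, t, ml, mr = best
    return lambda row: ml if row[f] <= t else mr


def los_gap_by_hospital(records, model):
    """Mean of (observed LOS - expected LOS) per hospital; positive means longer than expected."""
    gaps = {}
    for hospital, features, observed in records:
        gaps.setdefault(hospital, []).append(observed - model(features))
    return {h: mean(g) for h, g in gaps.items()}


# Hypothetical records: (hospital, [CABG performed 0/1, comorbidity index], observed LOS in days)
records = [
    ("A", [0, 1], 6), ("A", [1, 2], 14), ("A", [0, 0], 5),
    ("B", [0, 1], 8), ("B", [1, 3], 16), ("B", [1, 2], 15),
]
model = fit_regression_stump([r[1] for r in records], [r[2] for r in records])
print(los_gap_by_hospital(records, model))  # roughly {'A': -0.89, 'B': 0.89}
```

Here hospital B stays about 0.9 days longer than its case mix predicts; in the study, such residual gaps were traced to hospital location and bed size.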

The Seamless Handoff Algorithm based on Multicast Group Mechanism among RNs in a PDSN Area (PDSN 영역내의 여러 RN간 멀티캐스트 그룹 메커니즘 기반의 Seamless 핸드오프 알고리즘)

  • Shin, Dong-Jin; Kim, Su-Chang; Lim, Sun-Bae; Oh, Jae-Chun; Song, Byeong-Kwon; Jeong, Tae-Eui
    • The KIPS Transactions:PartC / v.9C no.1 / pp.97-106 / 2002
  • In the 3GPP2 standard, MIP (Mobile IP) is used, and a PDSN performs the function of the FA (Foreign Agent) to support macro mobility. When an MS roams from one PDSN area to another, the mobility supported is called macro mobility; when an MS roams from one RN area to another within a PDSN area, it is called micro mobility. Because a PDSN performs the FA function in the 3GPP2 standard, mobility can be supported, but the mechanism is designed for macro mobility rather than micro mobility, so it is weak at processing the fast, seamless handoffs that micro mobility requires. In this paper, we propose a seamless handoff algorithm based on a multicast group mechanism to support micro mobility. Depending on the moving direction and velocity of an MS, the proposed algorithm constructs a multicast group of the RNs on the MS's forecasted moving path, and delays the RNs' joining of the multicast group as long as possible to increase network efficiency. Moreover, to resolve the buffer overhead problem of existing multicast schemes, the algorithm has each RN buffer data only after the forecasted handoff time. To prove the deadlock freeness and liveness of the algorithm, we use state transition diagrams, Petri-net modeling, and its reachability tree. We then evaluate the performance by simulation.
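The abstract verifies deadlock freeness with a Petri-net reachability tree. A minimal sketch of that kind of check, assuming a bounded net: enumerate every reachable marking breadth-first and confirm that each one enables at least one transition. The three-place handoff cycle (idle, multicast group joined, buffering) is a hypothetical toy model, not the authors' net, and liveness (every transition can always eventually fire again) would need a further check on the reachability graph.

```python
from collections import deque


def enabled(marking, transition):
    """A transition is enabled if every input place holds enough tokens."""
    pre, _ = transition
    return all(marking[p] >= n for p, n in pre.items())


def fire(marking, transition):
    """Consume tokens from input places and produce tokens in output places."""
    pre, post = transition
    m = dict(marking)
    for p, n in pre.items():
        m[p] -= n
    for p, n in post.items():
        m[p] += n
    return m


def reachable_markings(initial, transitions, limit=10_000):
    """Breadth-first enumeration of reachable markings.
    The initial marking must list every place (zeros included); the net is assumed bounded."""
    key = lambda m: tuple(sorted(m.items()))
    seen = {key(initial): initial}
    queue = deque([initial])
    while queue:
        m = queue.popleft()
        for t in transitions.values():
            if enabled(m, t):
                nxt = fire(m, t)
                if key(nxt) not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state space too large; the net may be unbounded")
                    seen[key(nxt)] = nxt
                    queue.append(nxt)
    return list(seen.values())


def deadlock_free(initial, transitions):
    """True if no reachable marking is dead (every one enables some transition)."""
    return all(any(enabled(m, t) for t in transitions.values())
               for m in reachable_markings(initial, transitions))


# Hypothetical toy handoff cycle; each transition is (input places, output places).
initial = {"idle": 1, "group_joined": 0, "buffering": 0}
transitions = {
    "join_group": ({"idle": 1}, {"group_joined": 1}),            # target RN joins the multicast group
    "start_buffering": ({"group_joined": 1}, {"buffering": 1}),  # at the forecasted handoff time
    "handoff": ({"buffering": 1}, {"idle": 1}),                  # MS attaches to the new RN
}
print(deadlock_free(initial, transitions))  # True
```

Removing the `handoff` transition leaves the `buffering` marking with nothing enabled, which this check reports as a deadlock.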

Development of prediction model identifying high-risk older persons in need of long-term care (장기요양 필요 발생의 고위험 대상자 발굴을 위한 예측모형 개발)

  • Song, Mi Kyung; Park, Yeongwoo; Han, Eun-Jeong
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.457-468 / 2022
  • In an aged society, it is important to prevent older people from developing disabilities that require long-term care. The purpose of this study is to develop a prediction model to identify high-risk groups who are likely to become beneficiaries of Long-Term Care Insurance. This is a retrospective study using historical data on the study subjects from the database of the National Health Insurance Service (NHIS). The study subjects were 7,724,101 people over 65 years of age registered for medical insurance. To develop the prediction model, we used logistic regression, decision tree, random forest, and multi-layer perceptron neural network models. Random forest was finally selected as the prediction model based on model performance in internal and external validation. Random forest could predict about 90% of the older people in need of long-term care using the database alone, without any information from the long-term care eligibility assessment. The findings may be useful for evidence-based health management in prevention services and can contribute to preemptively identifying older people who need preventive services.
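The abstract does not describe the random forest configuration. A toy sketch of the core random-forest idea, under simplifying assumptions: each tree is trained on a bootstrap sample using a randomly chosen subset of features, and predictions are combined by majority vote. Here each "tree" is only a one-split stump, and the two risk-score features, the labels, and all names are hypothetical; a real model would grow deeper trees, typically with a library implementation.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Depth-1 classification tree: the (feature, threshold) split with the fewest errors."""
    overall = Counter(y).most_common(1)[0][0]
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0] if left else overall
            right_label = Counter(right).most_common(1)[0][0] if right else overall
            errors = (sum(label != left_label for label in left)
                      + sum(label != right_label for label in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagging plus random feature subsets (for a stump, per-tree and per-split sampling coincide)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)   # random subset of candidate features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


# Hypothetical risk scores per person: [functional-decline score, chronic-condition score]
X = [[1, 1], [2, 2], [3, 1], [2, 3], [7, 8], [8, 7], [9, 9], [8, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = later needed long-term care (toy labels)
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))  # 0 1
```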

Prediction Model for unfavorable Outcome in Spontaneous Intracerebral Hemorrhage Based on Machine Learning

  • Shengli Li; Jianan Zhang; Xiaoqun Hou; Yongyi Wang; Tong Li; Zhiming Xu; Feng Chen; Yong Zhou; Weimin Wang; Mingxing Liu
    • Journal of Korean Neurosurgical Society / v.67 no.1 / pp.94-102 / 2024
  • Objective : Spontaneous intracerebral hemorrhage (ICH) remains a significant cause of mortality and morbidity throughout the world. The purpose of this retrospective study is to develop multiple models for predicting ICH outcomes using machine learning (ML). Methods : We included ICH patients identified by computed tomography or magnetic resonance imaging and treated with surgery between January 2014 and October 2021. Outcomes were assessed using the modified Rankin Scale at the 6-month check-up. Four ML models, namely Support Vector Machine (SVM), Decision Tree C5.0, Artificial Neural Network, and Logistic Regression, were used to build ICH prediction models. To evaluate the reliability of the ML models, we calculated the area under the receiver operating characteristic curve (AUC), specificity, sensitivity, accuracy, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR). Results : We identified 71 patients who had favorable outcomes and 156 who had unfavorable outcomes. The SVM model achieved the best comprehensive prediction efficiency. For the SVM model, the AUC, accuracy, specificity, sensitivity, PLR, NLR, and DOR were 0.91, 0.92, 0.92, 0.93, 11.63, 0.076, and 153.03, respectively, and the importance value of time to operating room (TOR) was significantly higher than that of the other variables. Conclusion : The analysis of clinical reliability showed that the SVM model achieved the best comprehensive prediction efficiency and that the importance value of TOR was significantly higher than that of the other variables.
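The likelihood ratios and diagnostic odds ratio reported for the SVM model follow from sensitivity and specificity by standard definitions: PLR = sensitivity / (1 - specificity), NLR = (1 - sensitivity) / specificity, and DOR = PLR / NLR. The sketch below (function names are illustrative) applies them; plugging in the rounded values 0.93 and 0.92 gives a PLR of about 11.63, an NLR of about 0.076, and a DOR of about 152.8, which matches the reported 11.63, 0.076, and 153.03 up to the rounding of the inputs. It also includes a rank-based AUC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive/negative likelihood ratios and the diagnostic odds ratio."""
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return plr, nlr, plr / nlr


def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


def rank_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pairs = [(p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores]
    return sum(pairs) / len(pairs)


plr, nlr, dor = likelihood_ratios(0.93, 0.92)
print(round(plr, 2), round(nlr, 3), round(dor, 1))  # 11.63 0.076 152.8
```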

A Study on the Revitalization of Tourism Industry through Big Data Analysis (한국관광 실태조사 빅 데이터 분석을 통한 관광산업 활성화 방안 연구)

  • Lee, Jungmi; Liu, Meina; Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.149-169 / 2018
  • Korea is currently accumulating a large amount of data in public institutions under the public data open policy and "Government 3.0". In particular, a large amount of data has accumulated in the tourism field. However, academic discussions using tourism data are still limited, as are the openness of data on restaurants, hotels, and online tourism information and the use of SNS big data in tourism. As a result, tourism big data analysis is still rarely utilized. In this paper, we analyzed the factors influencing foreign tourists' satisfaction in Korea from numerical data using data mining and R programming techniques. We sought ways to revitalize the tourism industry by analyzing about 36,000 records from the "Survey on the actual situation of foreign tourists from 2013 to 2015" conducted by the Korea Culture & Tourism Research Institute. To do this, we analyzed the factors with high influence on foreign tourists' 'Satisfaction', 'Revisit intention', and 'Recommendation' variables, and further analyzed the practical influence of these factors. As the procedure of this study, we first integrated the foreign tourist survey data collected by the Korea Culture & Tourism Research Institute and stored in the tourist information system from 2013 to 2015, and eliminated unnecessary variables inconsistent with the research purpose. Some variables were modified to improve the accuracy of the analysis. We then analyzed the factors affecting the dependent variables using data-mining methods in IBM SPSS Modeler 16.0: decision trees (C5.0, CART, CHAID, QUEST), artificial neural networks, and logistic regression analysis. The seven variables with the greatest effect on each dependent variable were derived.
As a result of the data analysis, the seven major variables influencing 'overall satisfaction' were sightseeing spot attraction, food satisfaction, accommodation satisfaction, traffic satisfaction, guide service satisfaction, number of visiting places, and country; the most influential of these were food satisfaction and sightseeing spot attraction. The seven variables with the greatest influence on 'revisit intention' were country, travel motivation, activity, food satisfaction, best activity, guide service satisfaction, and sightseeing spot attraction; the most influential were food satisfaction and travel motivation for Korean style. Lastly, the seven variables with the greatest influence on 'recommendation intention' were country, sightseeing spot attraction, number of visiting places, food satisfaction, activity, tour guide service satisfaction, and cost; the most influential were country, sightseeing spot attraction, and food satisfaction. In addition, to grasp the influence of each independent variable more deeply, we used R programming to estimate the influence of the independent variables. For overall satisfaction, food satisfaction and sightseeing spot attraction had a greater effect than the other influential variables. For revisit intention, travel motivation for the purpose of the Korean Wave had a higher $\beta$ value than the other variables, so a policy that enhances tourist attractions related to the Korean Wave will be needed to lead to substantial revisits by tourists. Lastly, recommendation showed the same result as satisfaction: sightseeing spot attraction and food satisfaction had higher $\beta$ values than the other variables.
From this analysis, we found that 'food satisfaction' and 'sightseeing spot attraction' were common factors influencing all three dependent variables ('Overall satisfaction', 'Revisit intention', and 'Recommendation') and significantly affected satisfaction with travel in Korea. The purpose of this study is to examine how to stimulate foreign tourism in Korea through big data analysis. The results are expected to serve as basic data for analyzing tourism data and establishing effective tourism policy, and as material for an activation plan that can contribute to the future development of tourism in Korea.
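Among the decision trees listed, C4.5-family trees such as C5.0 typically choose splits by information gain ratio. As a hedged illustration of how such a criterion ranks candidate variables, the sketch below computes the gain ratio of each categorical variable with respect to a satisfaction label. The survey fields and four rows are invented toy data, not the Korea Culture & Tourism Research Institute survey, and this is not the authors' SPSS Modeler workflow.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, feature, target):
    """Information gain of splitting on `feature`, divided by the split information."""
    n = len(rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature]].append(r[target])
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical survey rows
rows = [
    {"food": "high", "country": "A", "satisfied": "yes"},
    {"food": "high", "country": "B", "satisfied": "yes"},
    {"food": "low", "country": "A", "satisfied": "no"},
    {"food": "low", "country": "B", "satisfied": "no"},
]
ranking = sorted(["food", "country"], key=lambda f: gain_ratio(rows, f, "satisfied"), reverse=True)
print(ranking)  # ['food', 'country']
```

Dividing by the split information penalizes variables with many categories (such as a fine-grained country field), which plain information gain would otherwise favor.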

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi; Hyun, Yoonjin; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.101-116 / 2015
  • Recently, with the development of smart devices and social media, vast amounts of information in various forms have accumulated. In particular, considerable research efforts are being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is shifting from structured to unstructured data analysis. In the field of recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also been steadily increasing. Approaches to improving the performance of recommendation systems fall into two categories: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts to improve recommendation system performance have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are very timely and necessary. In particular, because users' interests are directly connected with their needs, identifying users' interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. Accordingly, this study proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method to quantify users' interests by analyzing their internet usage patterns and to predict their repurchases based on the discovered preferences. There are two important modules in this study. The first module predicts the repurchase probability of each category by analyzing users' purchase histories. We include the first module in our research scope in order to compare the accuracy of the traditional purchase-based prediction model with that of our new model presented in the second module. This procedure extracts users' purchase histories.
The core of our methodology is the second module, which extracts users' interests by analyzing the news articles they have read. The second module constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. The module then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging the results of these processes, the second module obtains a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category. In this paper, we also provide experimental results of our performance evaluation. The data used in our experiments are outlined as follows. We acquired web transaction data of 5,000 panels from a company specializing in analyzing the ranks of internet sites. First, we extracted 15,000 URLs of news articles published from July 2012 to June 2013 from the original data and crawled the main contents of the news articles. We then selected 2,615 users who had read at least one of the extracted news articles. Among these 2,615 users, 359 target users purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase histories and news access records of these 359 internet users. The performance evaluation showed that our prediction model using both users' interests and purchase histories outperforms a prediction model using only purchase histories in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used.
Similarly, our model outperformed the traditional one in the beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used, although the improvement was very small.
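The second module's merging step can be read as a matrix product: a users-by-articles access matrix multiplied by an articles-by-topics matrix from topic modeling gives a users-by-topics interest matrix. A minimal sketch assuming that reading, with row normalization so each user's interests sum to 1; the counts, topic weights, and function names are hypothetical, and the paper's exact merging and weighting may differ.

```python
def matmul(a, b):
    """Plain-Python matrix product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def row_normalize(m):
    """Scale each row to sum to 1 (rows of zeros stay zero)."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in m]


# Hypothetical: 2 users x 3 articles (read counts)
user_article = [[1, 0, 1],
                [0, 2, 0]]
# Hypothetical: 3 articles x 2 topics (topic weights from a topic model)
article_topic = [[0.8, 0.2],
                 [0.1, 0.9],
                 [0.5, 0.5]]
user_topic = row_normalize(matmul(user_article, article_topic))
print(user_topic)  # approximately [[0.65, 0.35], [0.1, 0.9]]
```

Each row of `user_topic` can then serve as a feature vector for a per-category repurchase classifier alongside purchase-history features.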

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin; Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records of traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationships between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insights derived from these analyses can be used in accident prevention approaches. Traffic accident data mining is the activity of finding useful knowledge about such relationships that is not well known and that users may be interested in. Many studies on mining accident data have been reported over the past two decades. Most of them focused on predicting accident risk from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks. However, the prediction models derived from these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups themselves are still not easy for humans to understand, so additional analytic work is necessary. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationships between the target feature and other features.
Because rules are fairly easy for humans to understand, they can provide insight and comprehensible results. Association rule learning methods and subgroup discovery methods are representative rule-based learning methods for descriptive tasks. These two kinds of algorithms have been used in a wide range of areas, such as transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both an association rule learning method and a subgroup discovery method to discover useful patterns in a traffic accident dataset consisting of many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, an unsupervised learning method, searches for frequent itemsets in the data and translates them into rules. In contrast, the subgroup discovery method is a supervised learning method that discovers rules for user-specified concepts satisfying a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the ruleset more compact and easier to understand by removing uninteresting or redundant rules. To compare these rule-based learning algorithms, we conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features.
In the supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires considerable effort in parameter tuning.
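To make the contrast concrete: association rules are usually filtered by support and confidence, while subgroup discovery scores a rule against a fixed target with a quality measure that trades off generality against unusualness. Weighted relative accuracy (WRAcc), which is coverage times the difference between the rule's target rate and the overall target rate, is one common such measure; whether the authors used it is not stated here. The accident records, field names, and helper function below are hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Support, confidence, and lift (association-rule view) plus WRAcc
    (subgroup-discovery view) for the rule `antecedent -> consequent`."""
    n = len(records)
    covered = [r for r in records if antecedent(r)]
    hits = [r for r in covered if consequent(r)]
    base_rate = sum(1 for r in records if consequent(r)) / n
    coverage = len(covered) / n  # generality of the subgroup
    confidence = len(hits) / len(covered) if covered else 0.0
    return {
        "support": len(hits) / n,
        "confidence": confidence,
        "lift": confidence / base_rate if base_rate else 0.0,
        "wracc": coverage * (confidence - base_rate),  # generality x unusualness
    }


# Hypothetical accident records
records = ([{"time": "night", "severity": "fatal"}] * 3
           + [{"time": "night", "severity": "minor"}] * 1
           + [{"time": "day", "severity": "fatal"}] * 1
           + [{"time": "day", "severity": "minor"}] * 5)
stats = rule_stats(records,
                   antecedent=lambda r: r["time"] == "night",
                   consequent=lambda r: r["severity"] == "fatal")
print(stats)  # support 0.3, confidence 0.75, lift about 1.875, WRAcc about 0.14
```

A rule with high confidence but tiny coverage gets a small WRAcc, which is how subgroup discovery balances generality against unusualness for a user-specified target.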

Strategy for Store Management Using SOM Based on RFM (RFM 기반 SOM을 이용한 매장관리 전략 도출)

  • Jeong, Yoon Jeong; Choi, Il Young; Kim, Jae Kyeong; Choi, Ju Choel
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.93-112 / 2015
  • Following changes in consumers' consumption patterns, existing retail shops have evolved into hypermarkets or convenience stores that mostly offer groceries and daily products. It is therefore important to maintain appropriate inventory levels and product configurations to use the limited space in a retail store effectively and increase sales. Accordingly, this study proposes product configuration and inventory level strategies based on the RFM (Recency, Frequency, Monetary) model and SOM (self-organizing map) for managing retail shops effectively. The RFM model analyzes customer behavior based on past buying activities, and it can differentiate important customers in large data sets using three variables. R represents recency, which refers to the most recent purchase; customers who purchased more recently have a larger R value. F represents frequency, the number of transactions in a particular period, and M represents monetary value, the amount of money spent in a particular period. The RFM method is thus known to be a very effective model for customer segmentation. In this study, SOM cluster analysis was performed using normalized values of the RFM variables. SOM is regarded as one of the most distinguished artificial neural network models among unsupervised learning tools. It is a popular tool for clustering and visualizing high-dimensional data so that similar items are grouped spatially close to one another, and it has been successfully applied in various technical fields for finding patterns. In our research, the procedure tries to find sales patterns by analyzing product sales records in terms of their Recency, Frequency, and Monetary values. To suggest a business strategy, we then build a decision tree on the SOM results. To validate the proposed procedure, we used M-mart data collected from January 1, 2014 to December 31, 2014.
Each product was assigned R, F, and M values, and the products were grouped into 9 clusters using SOM. We also performed three tests, using weekday data, weekend data, and the whole data, to analyze changes in sales patterns. To propose a strategy for each cluster, we examined the criteria of product clustering: the clusters obtained through SOM can be explained by their characteristics in the decision trees. As a result, we can suggest an inventory management strategy for each of the 9 clusters through the proposed procedure. Products in the cluster with the highest values of all three variables (R, F, M) need a high inventory level and should be placed where they extend customers' paths through the store. In contrast, products in the cluster with the lowest values of all three variables need a low inventory level and should be placed where visibility is low. Products in the cluster with the highest R value are usually new releases and need to be placed at the front of the store. Managers should gradually decrease inventory levels for products in the cluster with the highest F value, which were purchased frequently in the past, because this cluster is assumed to have lower R and M values than the average product: these products have sold poorly recently, and their total sales will be low relative to their purchase frequency. The procedure presented in this study is expected to contribute to raising the profitability of retail stores. The paper is organized as follows. Chapter 2 briefly reviews the literature related to this study. Chapter 3 proposes the research procedure, and Chapter 4 applies the proposed procedure to actual product sales data. Finally, Chapter 5 presents the conclusions of the study and directions for further research.
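A minimal sketch of the clustering step, assuming min-max normalization of the per-product R, F, and M values and a 3x3 map (9 nodes, chosen here to match the 9 clusters reported), trained with the classic online SOM update: find the best-matching unit, then pull it and its grid neighbors toward the input with a Gaussian neighborhood whose width and learning rate shrink over time. R is treated as a score where larger means more recent, as in the abstract; the product values, learning-rate schedule, and function names are hypothetical, and the authors' SOM settings are not given.

```python
import math
import random


def min_max_normalize(rows):
    """Scale each column (R, F, M) to [0, 1]; constant columns map to 0."""
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in rows]


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training with linearly decaying learning rate and neighborhood width."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(rows) for j in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit inputs in a new random order each epoch
            frac = step / total
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 0.01
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood on the grid
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical per-product [R score, F, M] values for one year of sales
raw = [[30, 200, 5000], [28, 180, 4800], [2, 5, 100], [3, 8, 150]]
data = min_max_normalize(raw)
som = train_som(data)
clusters = [best_matching_unit(som, x) for x in data]
print(clusters)
```

The node each product maps to acts as its cluster label; a decision tree fitted on those labels, as in the study, would then describe each cluster in terms of R, F, and M thresholds.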