
Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research issue for those seeking better performance in predicting customers' responses to marketing promotions. A response model reduces marketing costs by identifying prospective customers in a very large customer database and predicting the purchase intentions of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary costs. In addition, the big data environment has accelerated the development of response models with data mining techniques such as CBR, neural networks, and support vector machines. CBR is one of the most widely used tools in business because it is known to be simple and robust when applied to response modeling. Although CBR has not performed as well as other machine learning techniques, it remains an attractive data mining technique for business applications. Thus, many studies have tried to improve CBR for business data mining with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by a marketing department, and used a genetic algorithm to optimize the value of k for k-nearest neighbors to improve the performance of the integrated model. Hong and Park (2009) found that integrating logit, neural networks, and Support Vector Machines (SVM) with CBR predicted customers' responses to marketing promotions better than each individual model (logit, neural networks, or SVM). This paper presents an approach to predicting customers' responses to marketing promotions with Case Based Reasoning. The proposed model applies a different weight to each feature. 
We fitted a logit model to a database containing promotion and purchase data for bath soap, and then used its coefficients as the feature weights in CBR. We empirically compared the proposed weighted CBR model with neural networks and a pure CBR model, and found that the weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building classification models from real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Data are imbalanced when the number of instances in one class is much smaller or larger than the number in the other classes. Classification models such as response models have difficulty learning patterns from such data because they tend to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to this problem, and sampling methods can be categorized as undersampling or oversampling. CBR, however, is not sensitive to the data distribution because, unlike other machine learning algorithms, it does not learn from the data. Because responding customers are always a small fraction of nonresponding customers in the real world, we investigated the robustness of the proposed model while varying the ratio of responding to nonresponding customers. We simulated the proposed model 100 times with different ratios of responding to nonresponding customers to validate its robustness under imbalanced data distributions. Finally, we found that the proposed CBR model outperformed the comparison models on the imbalanced data sets. 
We expect this study to improve the performance of CBR-based response models for promotion programs under the imbalanced data distributions found in the real world.
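The weighting idea above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes standardized numeric features, fits the logit model by plain gradient descent, and uses the absolute coefficients (normalized to sum to 1) as weights in a weighted Euclidean distance for k-nearest-neighbor case retrieval. The function names, the value of k, and the toy data are hypothetical.

```python
import numpy as np

def fit_logit(X, y, lr=0.5, epochs=3000):
    """Logistic regression by batch gradient descent; returns one coefficient per feature."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([np.ones((len(X), 1)), X])      # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ beta)))     # predicted response probabilities
        beta -= lr * Xb.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return beta[1:]                                # drop the intercept

def weighted_cbr_predict(X_case, y_case, X_new, weights, k=3):
    """Retrieve the k most similar past cases under a weighted Euclidean distance and vote."""
    X_case = np.asarray(X_case, dtype=float)
    y_case = np.asarray(y_case, dtype=int)
    w = np.abs(np.asarray(weights, dtype=float))
    w = w / w.sum()                                # hypothetical choice: |coef| normalised to sum to 1
    preds = []
    for q in np.atleast_2d(np.asarray(X_new, dtype=float)):
        dist = np.sqrt((w * (X_case - q) ** 2).sum(axis=1))
        nearest = np.argsort(dist, kind="stable")[:k]
        preds.append(int(y_case[nearest].mean() >= 0.5))   # majority vote (odd k avoids ties)
    return np.array(preds)

# Toy data (hypothetical): feature 0 drives the response, feature 1 is noise.
X = np.array([[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
mu, sd = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sd                                  # standardise so coefficients are comparable
w = fit_logit(Z, y)
queries = (np.array([[0.9, 0.2], [0.1, 1.8]]) - mu) / sd
pred = weighted_cbr_predict(Z, y, queries, w)
```

Standardizing first matters here: raw logit coefficients depend on each feature's measurement scale, so unscaled coefficients would not be comparable as similarity weights.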

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.101-116 / 2015
  • Recently, with the development of smart devices and social media, vast amounts of information in various forms have accumulated. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is shifting from structured to unstructured data analysis. In recommender systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also grown steadily. The performance of recommender systems can be improved in two ways: by improving algorithms and by acquiring useful, high-quality data. Traditionally, most efforts to improve recommender system performance have taken the former approach, while the latter has received relatively little attention. In this sense, efforts to use unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected with their needs, identifying users' interests through unstructured big data analysis can provide a clue for improving the performance of recommender systems. This study therefore proposes a methodology for improving recommender systems by measuring users' interests. Specifically, it quantifies users' interests by analyzing their internet usage patterns and predicts their repurchases based on the discovered preferences. The methodology has two main modules. The first module extracts users' purchase histories and predicts the repurchase probability for each category by analyzing them; we include it in the research scope to compare the accuracy of a traditional purchase-based prediction model with the new model presented in the second module. 
The core of our methodology is the second module, which extracts users' interests by analyzing the news articles they have read. First, it constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. Next, it analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging the results of these two steps, it obtains a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model that predicts the repurchase probability for each category. We also present the results of our performance evaluation. The experimental data were as follows. We acquired web transaction data for 5,000 panel members from a company that specializes in ranking internet sites. First, we extracted 15,000 URLs of news articles published from July 2012 to June 2013 from the original data and crawled the main text of each article. We then selected the 2,615 users who had read at least one of the extracted articles. Of these 2,615 users, 359 had purchased at least one item from our target shopping mall, 'G'. In the experiments, we analyzed the purchase histories and news access records of these 359 internet users. The evaluation showed that our prediction model, which uses both users' interests and purchase histories, outperforms a model that uses only purchase histories in terms of misclassification rate. Specifically, when artificial neural network models were used, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories. 
Similarly, when decision tree models were used, our model outperformed the traditional one in the beauty, computer, digital, fashion, food, and furniture categories, although the improvement was very small.
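The matrix-merging step in the second module can be illustrated with a short Python sketch. It assumes the topic model (for example, LDA) has already produced an article-by-topic matrix of topic proportions and that the user-by-article matrix holds read counts; "merging" is shown as a matrix product followed by row normalization, which is one plausible reading rather than the authors' confirmed procedure. All names and numbers are hypothetical.

```python
import numpy as np

def user_topic_matrix(user_article, article_topic):
    """Merge a user-by-article access matrix with an article-by-topic matrix into user-by-topic interests."""
    ua = np.asarray(user_article, dtype=float)     # rows: users, cols: articles (e.g. read counts)
    at = np.asarray(article_topic, dtype=float)    # rows: articles, cols: topics (e.g. LDA proportions)
    ut = ua @ at                                   # accumulate topic weight over the articles each user read
    totals = ut.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                      # users who read nothing keep an all-zero row
    return ut / totals                             # each row: one user's topic-interest profile

user_article = [[1, 1, 0],    # user 0 read articles 0 and 1
                [0, 0, 2],    # user 1 read article 2 twice
                [0, 0, 0]]    # user 2 read nothing
article_topic = [[0.9, 0.1],
                 [0.5, 0.5],
                 [0.2, 0.8]]
interests = user_topic_matrix(user_article, article_topic)
```

Each resulting row could then serve as extra input features, alongside purchase history, for the per-category repurchase classifiers (neural networks or decision trees in the study).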

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk;Kim, Ji-Hun;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.125-141 / 2012
  • Malicious attacks on and hacks of networked systems are increasing dramatically these days, and their patterns are changing rapidly. Consequently, handling these attacks appropriately has become more important, and there is considerable interest in and demand for effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems that detect, identify, and respond appropriately to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behavior. Although they perform very well under normal conditions, they cannot handle new or unknown attack patterns. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can respond proactively to unknown threats. Researchers have long adopted and tested various artificial intelligence techniques, such as artificial neural networks, decision trees, and support vector machines, to detect network intrusions. However, most of them have applied these techniques individually, even though combining them may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. The model combines the predictions of four binary classification models that may complement one another: logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). Genetic algorithms (GA) are used to find the optimal combining weights. The model is built in two steps. In the first step, the optimal integrated model, the one with the lowest prediction error (i.e., misclassification rate), is generated. 
In the second step, the model searches for the classification threshold for declaring an intrusion that minimizes the total misclassification cost. Calculating the total misclassification cost of an intrusion detection system requires understanding its asymmetric error costs. There are generally two types of error in intrusion detection. The first is the False-Positive Error (FPE), in which normal activity is judged to be an intrusion; such a wrong judgment may result in unnecessary remediation. The second is the False-Negative Error (FNE), which mainly misjudges a malicious program as normal. FNE is more critical than FPE, so the total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world network intrusion detection dataset collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and randomly sampled 10,000 of them. We also compared our model with the single techniques to confirm its superiority. LOGIT and DT were run in PASW Statistics v18.0 and ANN in Neuroshell R4.0; for SVM, we used LIBSVM v2.90, a freeware library for training SVM classifiers. The empirical results showed that the proposed GA-based model outperformed all the comparison models in detecting network intrusions, both in accuracy and in total misclassification cost. We therefore expect this study to contribute to building cost-effective intelligent intrusion detection systems.
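The two-step procedure can be sketched in Python under explicit assumptions: each of the four classifiers outputs an intrusion probability, the ensemble score is a weighted average of those probabilities, a simple elitist genetic algorithm (truncation selection, arithmetic crossover, Gaussian mutation) searches for the weights that minimize the misclassification rate, and a grid search over cut-offs then minimizes total cost. The cost ratio (an FNE costing five times an FPE), the GA settings, the synthetic scores, and all function names are hypothetical; the paper's actual GA design is not described in the abstract.

```python
import numpy as np

def ensemble_score(P, w):
    """Weighted average of per-model intrusion probabilities; P has shape (n_samples, n_models)."""
    w = np.abs(w) / np.abs(w).sum()
    return P @ w

def error_rate(P, y, w, threshold=0.5):
    """Misclassification rate of the combined model at a fixed cut-off."""
    return np.mean((ensemble_score(P, w) >= threshold).astype(int) != y)

def ga_weights(P, y, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Step 1: search combining weights with a simple elitist genetic algorithm."""
    rng = np.random.default_rng(seed)
    m = P.shape[1]
    pop = rng.random((pop_size, m))
    pop[:m] = np.eye(m)                                   # include each single model as a candidate
    n_elite = pop_size // 2
    for _ in range(generations):
        fitness = np.array([error_rate(P, y, w) for w in pop])
        elite = pop[np.argsort(fitness, kind="stable")[:n_elite]]   # keep the best half unchanged
        pairs = elite[rng.integers(0, n_elite, size=(pop_size - n_elite, 2))]
        alpha = rng.random((pop_size - n_elite, 1))
        children = alpha * pairs[:, 0] + (1 - alpha) * pairs[:, 1]  # arithmetic crossover
        children += rng.normal(0.0, sigma, children.shape)          # Gaussian mutation
        pop = np.vstack([elite, np.clip(children, 1e-6, None)])     # keep weights positive
    fitness = np.array([error_rate(P, y, w) for w in pop])
    best = pop[np.argmin(fitness)]
    return best / best.sum()

def best_threshold(scores, y, c_fn=5.0, c_fp=1.0):
    """Step 2: choose the cut-off that minimises total misclassification cost."""
    best_t, best_cost = 0.5, np.inf
    for t in np.linspace(0.0, 1.0, 101):
        pred = scores >= t
        fn = np.sum(~pred & (y == 1))       # intrusions judged normal (FNE)
        fp = np.sum(pred & (y == 0))        # normal activity flagged as intrusion (FPE)
        cost = c_fn * fn + c_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

# Hypothetical usage with synthetic scores from four classifiers:
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 200)
P = np.clip(0.2 + 0.6 * y[:, None] + rng.normal(0.0, 0.25, (200, 4)), 0.0, 1.0)
w = ga_weights(P, y)
t, cost = best_threshold(ensemble_score(P, w), y)
```

Seeding the initial population with the four single-model weight vectors and carrying the best half forward unchanged guarantees that the combined model's training error is never worse than that of the best single classifier.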

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.39-54 / 2013
  • The recent explosive growth of electronic commerce offers customers many advantageous purchase opportunities. In this situation, customers who lack sufficient knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide them with recommendation lists, and they are thus known as one of the most popular one-to-one marketing tools in online shopping stores. However, recommender systems that do not properly reflect users' preferences disappoint users and waste their time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to improve recommendation performance by reflecting users' preferences precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions, of which 3,167 remained after records with null values were deleted. We transform the categorical variables into dummy variables and exclude outliers. The proposed model consists of two steps. The first step predicts which customers have a high likelihood of purchasing products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks, run in SAS E-Miner, to predict the customers with a high likelihood of purchasing products in each product group. For logistic regression and decision trees, we partition the data into modeling and validation sets; for the artificial neural network, we partition it into training, test, and validation sets. The same validation dataset is used in all experiments. 
We then combine the results of the individual predictors using multi-model ensemble techniques such as bagging and bumping. Bagging, short for "Bootstrap Aggregation," combines the outputs of several machine learning models to improve the performance and stability of prediction or classification; it is a special form of averaging. Bumping, short for "Bootstrap Umbrella of Model Parameters," considers only the model with the lowest error. The results show that bumping outperforms bagging and the other predictors except in the "Poster" product group, where the artificial neural network model performs best. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extracted thirty-one association rules based on lift, support, and confidence, setting the minimum support (transaction frequency) to 5%, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%. We also exclude rules with a lift below 1 and, after removing duplicates, obtain fifteen association rules. Of these fifteen rules, eleven link products within the "Office Supplies" product group, one links the "Office Supplies" and "Fashion" groups, and the remaining three link the "Office Supplies" and "Home Decoration" groups. Finally, the proposed recommender system provides recommendation lists to the appropriate customers. We test the usability of the proposed system with a prototype and real-world transaction and profile data; to this end, we built the prototype using ASP, JavaScript, and Microsoft Access. 
In addition, we surveyed users' satisfaction with the product lists recommended by the proposed system and with randomly selected product lists. The 173 survey participants were users of MSN Messenger, Daum Café, and P2P services. User satisfaction was measured on a five-point Likert scale, and a paired-sample t-test was performed on the survey results. The results show that the proposed model outperforms the random selection model at the 1% significance level, meaning that users were significantly more satisfied with the recommended product lists. The results also show that the proposed system may be useful in real-world online shopping stores.
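The rule-filtering criteria above (support, confidence, lift, and a maximum itemset size) can be illustrated with a small brute-force Python sketch. It enumerates itemsets exhaustively instead of using an efficient algorithm such as Apriori, so it only suits toy data; the default thresholds mirror the values reported above, while the function name and sample baskets are hypothetical. Support is the share of transactions containing an itemset, confidence is support(X ∪ Y) / support(X), and lift is confidence / support(Y).

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.05, min_confidence=0.10, max_items=4, min_lift=1.0):
    """Brute-force association rules X -> Y filtered by support, confidence, and lift."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # share of transactions that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    items = sorted(set().union(*baskets))
    rules = []
    for size in range(2, max_items + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo))
            if s < min_support:
                continue
            for r in range(1, size):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    rhs = frozenset(combo) - lhs
                    confidence = s / support(lhs)
                    lift = confidence / support(rhs)
                    if confidence >= min_confidence and lift >= min_lift:   # drop rules with lift below 1
                        rules.append((sorted(lhs), sorted(rhs), s, confidence, lift))
    return rules

# Hypothetical baskets from a small gift shop
transactions = [{"pen", "notebook"}, {"pen", "notebook"}, {"pen"}, {"poster"},
                {"notebook", "pen", "frame"}]
for lhs, rhs, s, c, l in mine_rules(transactions):
    print(lhs, "->", rhs, f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Duplicate rules (the same items split differently) would still need a separate de-duplication pass, as the study describes.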

A Study on the Right Direction of Green Standard for Energy and Environmental Design (G-SEED) from the Perspective of Landscape Architecture (조경관점의 녹색건축 인증기준에 대한 방향 정립)

  • Cha, Uk Jin;Nam, Jung Chil;Yang, Geon Seok
    • Journal of the Korean Institute of Landscape Architecture / v.44 no.4 / pp.45-56 / 2016
  • This study analyzes the evaluation criteria of the current G-SEED (Green Standard for Energy and Environmental Design) and the 78 buildings certified under G-SEED during the three years from November 2012 to November 2015. Based on the results, four issues are derived and proposed below. Issue 1: The perceived share of landscape architecture in buildings is greater than ever, and landscaping delivers reliable reductions in carbon dioxide. Therefore, for the eight building types, G-SEED must be expanded to include landscape architecture items as mandatory evaluation items. Issue 2: Because landscape architecture requires outstanding professional expertise, having other fields appraise it is undesirable and prevents precise evaluation of the landscaping area. G-SEED should therefore not only ensure landscaping expertise for correct evaluation but also let the landscape field participate in assessing other areas. Issue 3: Many previous studies have shown that landscape planting techniques are highly effective in saving energy and lowering building temperatures. Landscape planting techniques should therefore be one of the evaluation items in the energy sector. Issue 4: Tree management should also be newly included as an evaluation item for maintenance related to landscape architecture. G-SEED, enacted and enforced under the Green Building Creation Support Act in 2013, is certainly an effective system for reducing carbon dioxide in buildings. The Act is by nature a special act that takes precedence over the Construction Law and must be observed in all building construction. Under this legal framework, various research efforts and products are helping create new jobs in the construction field. However, it is well known that the landscape architecture field has shown less interest in the Act than the construction field has. 
In conclusion, the landscape industry needs to conduct continuous research on G-SEED and pay enough attention to the Act to develop related products and expand its work area.

A Scalable Multipoint-to-Multipoint Routing Protocol in Ad-Hoc Networks (애드-혹 네트워크에서의 확장성 있는 다중점 대 다중점 라우팅 프로토콜)

  • 강현정;이미정
    • Journal of KIISE:Information Networking / v.30 no.3 / pp.329-342 / 2003
  • Most existing multicast routing protocols for ad-hoc networks do not consider their efficiency when a multicast group has a large number of sources, resulting in either large overhead or a poor data delivery ratio in that case. In this paper, we propose a multicast routing protocol for ad-hoc networks that specifically considers scalability in terms of the number of sources in a multicast group. The proposed protocol designates a set of sources as core sources. Each core source is the root of a tree that reaches all the destinations of the multicast group. The union of these trees constitutes the data delivery mesh, and each non-core source finds the nearest core source and delegates its data delivery to it. For efficient operation, the protocol needs an appropriate number of core sources: too many incur excessive control and data packet overhead, whereas too few result in a vulnerable and overloaded data delivery mesh. The data delivery mesh is optimally reconfigured through periodic flooding of control messages from the core sources, while a persistent local mesh recovery mechanism maintains the mesh's connectivity. Simulation results show that, when a multicast group has multiple sources, the proposed protocol achieves efficient multicast communication with a high data delivery ratio and low communication overhead compared with other existing multicast routing protocols.
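The core-source idea can be illustrated with a simplified, centralized Python sketch: a shortest-hop (BFS) tree is built from each core source, the paths from each tree to the receivers are unioned into a mesh, and each non-core source delegates to the core source with the fewest hops. This is only an assumption-laden illustration of the structure; the actual protocol is distributed, builds its trees through periodic control-message flooding, and includes local mesh recovery, none of which is modeled here. The graph, node IDs, and function names are hypothetical.

```python
from collections import deque

def bfs_tree(graph, root):
    """Shortest-hop tree rooted at `root`: parent pointers and hop counts for reachable nodes."""
    parent, hops = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(graph[u]):
            if v not in hops:
                hops[v] = hops[u] + 1
                parent[v] = u
                queue.append(v)
    return parent, hops

def build_mesh(graph, core_sources, receivers, sources):
    """Union the core-rooted trees (pruned to receivers) into a mesh; map non-core sources to a core."""
    trees = {c: bfs_tree(graph, c) for c in core_sources}
    mesh = set()
    for parent, _ in trees.values():
        for r in receivers:
            node = r
            while parent.get(node) is not None:        # walk from the receiver back to the core
                mesh.add(frozenset((node, parent[node])))
                node = parent[node]
    delegate = {}
    for s in sources:
        if s not in core_sources:
            # nearest core source by hop count (ties broken by core ID)
            delegate[s] = min(core_sources,
                              key=lambda c: (trees[c][1].get(s, float("inf")), c))
    return mesh, delegate

# Hypothetical topology: a chain 1-2-3-4-5 with node 6 attached to node 3.
graph = {1: [2], 2: [1, 3], 3: [2, 4, 6], 4: [3, 5], 5: [4], 6: [3]}
mesh, delegate = build_mesh(graph, core_sources=[1, 5], receivers=[3, 6], sources=[1, 2, 4, 5])
```

With two core sources, the mesh offers two paths into the receivers' neighborhood, which is the redundancy the abstract trades off against control overhead when choosing the number of cores.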

A Study on Operation Strategy by Multi-variate Regression of Daegu Arboretum Visitor's Satisfaction (대구수목원 이용객 만족모델을 통한 운영 방안 연구)

  • Kang, Kee-Rae
    • Journal of Korean Society of Forest Science / v.101 no.1 / pp.36-45 / 2012
  • The environmental and plant education that arboretums offer people today not only helps foster a better natural environment in urban areas but also provides visitors with a pleasant place for refreshment and more. In this study, the author observed the usage behavior of arboretum visitors, built a satisfaction model, and investigated the facilities and programs visitors expect the arboretum to offer, in order to propose service improvements. In the multiple regression analysis, the variables influencing satisfaction, in descending order of effect size, were the flow of walking routes, the educational effect on the environment, cleanliness of the facilities, the cost of visiting, natural beauty, diversity of trees, accessibility, friendliness of staff, and the expansion and supplementation of facilities in the arboretum. Regarding visitor attributes, residents living near the arboretum visited most frequently (more than five times), mostly to take walks. This indicates that visiting the arboretum is part of everyday life, so new programs, walking paths, and circulation routes need to be developed for visitors. Asked about the facilities and operating programs they wanted at Daegu Arboretum, visitors answered that cultural events, beautiful natural scenery, and refreshment and convenience facilities were the most critical issues. Managing withered trees and bare land is also an urgent issue. 
Accordingly, operation and management strategies based on visitor behavior and the satisfaction model are needed to address diverse events and festivals involving local residents, an ombudsman program, environmental programs for students and teachers in the region, neglected bare land and replacement of withered trees, and improvement and expansion of the cafeteria, as well as benchmarking of facilities other than arboretums in other regions. These items could be handled adequately by Daegu Arboretum without additional external resources. Visitor satisfaction begins with small things, and small differences determine overall satisfaction; therefore, a software approach rather than a hardware one is needed.

Genetic Diversity and Phylogenetic Relationship in Korean Strains of Lentinus lepideus Based on PCR Polymorphism (PCR 다형성 분석에 의한 한국산 잣버섯의 유전적 다양성 및 유연관계)

  • Lee, Jae-Seong;Cho, Hae-Jin;Yoon, Ki-Nam;Alam, Nuhu;Lee, Kyung-Lim;Shim, Mi-Ja;Lee, Min-Woong;Lee, Yun-Hae;Jang, Myoung-Jun;Ju, Young-Chul;Cheong, Jong-Chun;Shin, Pyung-Gyun;Yoo, Young-Bok;Lee, U-Youn;Lee, Tae-Soo
    • The Korean Journal of Mycology / v.38 no.2 / pp.105-111 / 2010
  • Lentinus lepideus, known as the train wrecker fungus, has been used for nutritional and medicinal purposes. A commercial cultivation technique and a new cultivar of the mushroom have recently been developed. To investigate genetic diversity and phylogenetic relationships for identifying the mushroom's strains and cultivars, one commercial strain and 13 other strains of L. lepideus from different geographical regions of Korea were analyzed using the ITS regions of rDNA and RAPD of genomic DNA. Three strains of Lentinus edodes were also included in the analysis. The sizes of the ITS1 and ITS2 regions of rDNA varied among strains from 173 to 179 bp and from 203 to 205 bp, respectively. The ITS1 sequence was more variable than the ITS2 sequence, while the 5.8S sequences were identical (156 bp). A phylogenetic tree based on the ITS sequences classified the selected strains into four clusters, while the three L. edodes strains formed a separate cluster. Ten of the 20 arbitrary primers used in RAPD-PCR efficiently amplified the genomic DNA. The number of amplified DNA bands varied with primer and strain, and the polymorphic DNA fragments ranged from 0.2 to 2.6 kb. The results showed that the Korean strains of Lentinus lepideus are closely related phylogenetically but have low genetic diversity.
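RAPD results are typically scored as a presence/absence matrix of bands, from which pairwise similarities between strains are computed before clustering. The abstract does not state which coefficient was used; the Python sketch below assumes the Dice (Nei-Li) coefficient, 2 × shared bands / (bands in strain A + bands in strain B), and uses a hypothetical band matrix and function name.

```python
import numpy as np

def dice_similarity(bands):
    """Pairwise Dice (Nei-Li) similarity from a 0/1 band matrix (rows: strains, cols: band positions)."""
    B = np.asarray(bands, dtype=float)
    shared = B @ B.T                                   # bands present in both strains
    counts = B.sum(axis=1)
    denom = counts[:, None] + counts[None, :]
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, 2.0 * shared / denom, 1.0)   # two band-less profiles count as identical

# Hypothetical RAPD profiles for three strains scored at four band positions
bands = [[1, 1, 0, 1],
         [1, 1, 0, 0],
         [0, 0, 1, 1]]
S = dice_similarity(bands)
dissimilarity = 1.0 - S        # could then be passed to a clustering method such as UPGMA
```

High pairwise similarities across all strains would correspond to the low genetic diversity the study reports.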

Comparative Molecular Phylogenetic Relationships in Different Strains of Pleurotus spp. (느타리속 버섯 계통의 분자생물학적 유연관계의 비교연구)

  • Cho, Hae-Jin;Lee, Jae-Seong;Yoon, Ki-Nam;Alam, Nuhu;Lee, Kyung-Lim;Shim, Mi-Ja;Lee, Min-Woong;Cheong, Jong-Chun;Shin, Pyung-Gyun;Yoo, Young-Bok;Lee, U-Youn;Lee, Tae-Soo
    • The Korean Journal of Mycology / v.38 no.2 / pp.112-119 / 2010
  • Pleurotus spp. have long been used for edible and medicinal purposes in Asian countries. The fruiting bodies of Pleurotus ostreatus, Pleurotus citrinopileatus, and Pleurotus salmoneostramineus contain many substances beneficial to human health. It is therefore necessary to study the genetic diversity of the Pleurotus cultivars commercially cultivated in Korea. Eleven strains of Pleurotus spp. were collected from different geographical regions of Southeast Asia, and the ITS regions of their rDNA and RAPD of their genomic DNA were analyzed. The sizes of the ITS1 and ITS2 regions of rDNA varied among strains from 167 to 254 bp and from 156 to 213 bp, respectively. The ITS1 sequence was more variable than the ITS2 sequence, and the 5.8S sequences were identical. A phylogenetic tree based on the ITS sequences classified the selected strains into four clusters. The eleven strains were also analyzed by RAPD with 20 arbitrary primers, ten of which efficiently amplified the genomic DNA. The number of amplified bands varied with primer and strain, and the polymorphic fragments ranged from 0.1 to 2.0 kb. The results revealed low genetic diversity among the selected strains of P. ostreatus, P. citrinopileatus, and P. salmoneostramineus.
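A distance-based tree from aligned ITS sequences can be sketched as follows. The abstract does not name the tree-building method, so this Python sketch assumes uncorrected p-distances (the share of differing aligned sites, skipping gaps) and UPGMA average-linkage clustering, a common but simple choice. The sequences and function names are hypothetical, the sequences must already be aligned and share at least one gap-free site, and the output is a nested tuple giving only the tree's topology.

```python
def p_distance(a, b):
    """Share of differing sites between two aligned sequences, skipping gap positions."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)

def upgma(labels, dist):
    """UPGMA (average-linkage) clustering on a square distance matrix; returns nested tuples."""
    clusters = [(label, 1) for label in labels]        # (subtree, number of leaves)
    D = [list(row) for row in dist]
    while len(clusters) > 1:
        n = len(clusters)
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda pair: D[pair[0]][pair[1]])   # closest pair of clusters
        (ti, ni), (tj, nj) = clusters[i], clusters[j]
        keep = [k for k in range(n) if k not in (i, j)]
        # size-weighted average distance from the merged cluster to every remaining cluster
        new_row = [(D[i][k] * ni + D[j][k] * nj) / (ni + nj) for k in keep]
        D = [[D[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        D.append(new_row + [0.0])
        clusters = [clusters[k] for k in keep] + [((ti, tj), ni + nj)]
    return clusters[0][0]

# Hypothetical aligned ITS fragments
seqs = {"A": "AAAAAAAAAA", "B": "AAAAAAAAAT", "C": "TTTTTAAAAA", "D": "TTTTTAAAAT"}
names = list(seqs)
dist = [[p_distance(seqs[a], seqs[b]) for b in names] for a in names]
tree = upgma(names, dist)
```

Published analyses usually use a substitution model (for example, Kimura 2-parameter) and neighbor-joining or likelihood methods; UPGMA is used here only because it is short and easy to follow.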

Studies on the Woody Vegetation in the Edge of Natural River for Ecological Restoration in Korea (하천의 생태적 복원을 위한 자연하천변의 목본성 식물군락에 대한 연구)

  • Bang, Je-Yong;Hu, Un-Bok;Kim, Hyea-Ju;You, Young-Han
    • Journal of Wetlands Research
    • Journal of Wetlands Research / v.17 no.2 / pp.124-129 / 2015
  • To obtain basic ecological data for river restoration, a vegetation survey was conducted along natural rivers and analyzed with synecological methods such as ordination and cluster analysis. Twenty-nine plant community units were identified, and the major dominant communities were the Quercus mongolica, Pinus densiflora, Populus davidiana, Q. variabilis, and Prunus sargentii communities. River vegetation was classified into a ravine and gorge forest type and a riverine softwood forest type. Ravine and gorge forest, located on steep slopes at high elevations, was dominated by hardwoods, and riverine softwood forest by softwoods (Salix spp.). Because naturalness was an important criterion for selecting the rivers, many of the selected rivers are located in upper and middle reaches rather than lower reaches, where human intervention is greater. The plant communities consisted of hardwood forest (44 plots, 92%) and softwood forest (4 plots, 8%). PCA with data from all layers showed five community groups: the Q. mongolica group, the Prunus sargentii group, the Pinus densiflora group, the Prunus sargentii - Pinus densiflora group, and a group of the remaining communities. PCA with tree-layer data showed three groups: the Q. mongolica group, the Prunus sargentii group, and a group of the remaining communities. Cluster analysis produced community groups similar to the PCA ordination, but, unlike in the PCA result, the Magnolia sieboldii and Prunus sargentii communities were distinguished as separate groups. These results indicate that riparian plant communities can be divided into hardwood and softwood forests by statistical techniques. Species such as Quercus mongolica, Pinus densiflora, Populus davidiana, Quercus variabilis, and Prunus sargentii are appropriate for planting in the levee zone and at high water levels, and Salix spp. are appropriate for planting at the waterfront and at low water levels. 
Herb species for planting on the floodplain are recommended from among those that co-occur with these woody species.
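The PCA ordination can be illustrated with a short NumPy sketch. It assumes a plots-by-species abundance matrix (for example, cover or importance values) and performs PCA by singular value decomposition of the column-centred data; real vegetation studies often transform or standardize the data first, and the species columns, values, and function name here are hypothetical.

```python
import numpy as np

def pca_ordination(X, n_axes=2):
    """PCA of a plots-by-species matrix via SVD of the column-centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                            # centre each species column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_axes] * s[:n_axes]                # plot coordinates on the leading axes
    explained = s ** 2 / np.sum(s ** 2)                # share of total variance per axis
    return scores, explained[:n_axes], Vt[:n_axes]     # Vt rows: species loadings

# Hypothetical cover values; columns: Quercus mongolica, Pinus densiflora, Salix spp.
X = [[10, 1, 0],
     [9, 2, 0],
     [1, 0, 8],
     [0, 1, 9]]
scores, explained, loadings = pca_ordination(X)
```

In this toy matrix, the oak-dominated plots and the Salix-dominated plots fall on opposite sides of the first axis, the kind of hardwood/softwood separation the abstract reports; the sign of each axis is arbitrary, so only relative positions are meaningful.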