• Title/Summary/Keyword: Binary classification

Search Result 464, Processing Time 0.019 seconds

The Prediction of Purchase Amount of Customers Using Support Vector Regression with Separated Learning Method (Support Vector Regression에서 분리학습을 이용한 고객의 구매액 예측모형)

  • Hong, Tae-Ho;Kim, Eun-Mi
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.213-225
    • /
    • 2010
  • Data mining has empowered the managers who are charge of the tasks in their company to present personalized and differentiated marketing programs to their customers with the rapid growth of information technology. Most studies on customer' response have focused on predicting whether they would respond or not for their marketing promotion as marketing managers have been eager to identify who would respond to their marketing promotion. So many studies utilizing data mining have tried to resolve the binary decision problems such as bankruptcy prediction, network intrusion detection, and fraud detection in credit card usages. The prediction of customer's response has been studied with similar methods mentioned above because the prediction of customer's response is a kind of dichotomous decision problem. In addition, a number of competitive data mining techniques such as neural networks, SVM(support vector machine), decision trees, logit, and genetic algorithms have been applied to the prediction of customer's response for marketing promotion. The marketing managers also have tried to classify their customers with quantitative measures such as recency, frequency, and monetary acquired from their transaction database. The measures mean that their customers came to purchase in recent or old days, how frequent in a period, and how much they spent once. Using segmented customers we proposed an approach that could enable to differentiate customers in the same rating among the segmented customers. Our approach employed support vector regression to forecast the purchase amount of customers for each customer rating. Our study used the sample that included 41,924 customers extracted from DMEF04 Data Set, who purchased at least once in the last two years. We classified customers from first rating to fifth rating based on the purchase amount after giving a marketing promotion. Here, we divided customers into first rating who has a large amount of purchase and fifth rating who are non-respondents for the promotion. Our proposed model forecasted the purchase amount of the customers in the same rating and the marketing managers could make a differentiated and personalized marketing program for each customer even though they were belong to the same rating. In addition, we proposed more efficient learning method by separating the learning samples. We employed two learning methods to compare the performance of proposed learning method with general learning method for SVRs. LMW (Learning Method using Whole data for purchasing customers) is a general learning method for forecasting the purchase amount of customers. And we proposed a method, LMS (Learning Method using Separated data for classification purchasing customers), that makes four different SVR models for each class of customers. To evaluate the performance of models, we calculated MAE (Mean Absolute Error) and MAPE (Mean Absolute Percent Error) for each model to predict the purchase amount of customers. In LMW, the overall performance was 0.670 MAPE and the best performance showed 0.327 MAPE. Generally, the performances of the proposed LMS model were analyzed as more superior compared to the performance of the LMW model. In LMS, we found that the best performance was 0.275 MAPE. The performance of LMS was higher than LMW in each class of customers. After comparing the performance of our proposed method LMS to LMW, our proposed model had more significant performance for forecasting the purchase amount of customers in each class. In addition, our approach will be useful for marketing managers when they need to customers for their promotion. Even if customers were belonging to same class, marketing managers could offer customers a differentiated and personalized marketing promotion.

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.99-112
    • /
    • 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It is a method for finding a highly accurateclassifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention from machine learning and artificial intelligence fields because of its remarkable performance improvement and flexible integration with the traditional learning algorithms such as decision tree (DT), neural networks (NN), and SVM, etc. In those researches, all of DT ensemble studies have demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown remarkable performance as shown in DT ensembles. Recently, several works have reported that the performance of ensemble can be degraded where multiple classifiers of an ensemble are highly correlated with, and thereby result in multicollinearity problem, which leads to performance degradation of the ensemble. They have also proposed the differentiated learning strategies to cope with performance degradation problem. Hansen and Salamon (1990) insisted that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble should contain diverse classifiers. Breiman (1996) explored that ensemble learning can increase the performance of unstable learning algorithms, but does not show remarkable performance improvement on stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to the change of the training data, and thus small changes in the training data can yield large changes in the generated classifiers. Therefore, ensemble with unstable learning algorithms can guarantee some diversity among the classifiers. To the contrary, stable learning algorithms such as NN and SVM generate similar classifiers in spite of small changes of the training data, and thus the correlation among the resulting classifiers is very high. This high correlation results in multicollinearity problem, which leads to performance degradation of the ensemble. Kim,s work (2009) showedthe performance comparison in bankruptcy prediction on Korea firms using tradition prediction algorithms such as NN, DT, and SVM. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT. Meanwhile, with respect to their ensemble learning, DT ensemble shows the more improved performance than NN and SVM ensemble. Further analysis with variance inflation factor (VIF) analysis empirically proves that performance degradation of ensemble is due to multicollinearity problem. It also proposes that optimization of ensemble is needed to cope with such a problem. This paper proposes a hybrid system for coverage optimization of NN ensemble (CO-NN) in order to improve the performance of NN ensemble. Coverage optimization is a technique of choosing a sub-ensemble from an original ensemble to guarantee the diversity of classifiers in coverage optimization process. CO-NN uses GA which has been widely used for various optimization problems to deal with the coverage optimization problem. The GA chromosomes for the coverage optimization are encoded into binary strings, each bit of which indicates individual classifier. The fitness function is defined as maximization of error reduction and a constraint of variance inflation factor (VIF), which is one of the generally used methods to measure multicollinearity, is added to insure the diversity of classifiers by removing high correlation among the classifiers. We use Microsoft Excel and the GAs software package called Evolver. Experiments on company failure prediction have shown that CO-NN is effectively applied in the stable performance enhancement of NNensembles through the choice of classifiers by considering the correlations of the ensemble. The classifiers which have the potential multicollinearity problem are removed by the coverage optimization process of CO-NN and thereby CO-NN has shown higher performance than a single NN classifier and NN ensemble at 1% significance level, and DT ensemble at 5% significance level. However, there remain further research issues. First, decision optimization process to find optimal combination function should be considered in further research. Secondly, various learning strategies to deal with data noise should be introduced in more advanced further researches in the future.

Performance Evaluation of Monitoring System for Sargassum horneri Using GOCI-II: Focusing on the Results of Removing False Detection in the Yellow Sea and East China Sea (GOCI-II 기반 괭생이모자반 모니터링 시스템 성능 평가: 황해 및 동중국해 해역 오탐지 제거 결과를 중심으로)

  • Han-bit Lee;Ju-Eun Kim;Moon-Seon Kim;Dong-Su Kim;Seung-Hwan Min;Tae-Ho Kim
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.6_2
    • /
    • pp.1615-1633
    • /
    • 2023
  • Sargassum horneri is one of the floating algae in the sea, which breeds in large quantities in the Yellow Sea and East China Sea and then flows into the coast of Republic of Korea, causing various problems such as destroying the environment and damaging fish farms. In order to effectively prevent damage and preserve the coastal environment, the development of Sargassum horneri detection algorithms using satellite-based remote sensing technology has been actively developed. However, incorrect detection information causes an increase in the moving distance of ships collecting Sargassum horneri and confusion in the response of related local governments or institutions,so it is very important to minimize false detections when producing Sargassum horneri spatial information. This study applied technology to automatically remove false detection results using the GOCI-II-based Sargassum horneri detection algorithm of the National Ocean Satellite Center (NOSC) of the Korea Hydrographic and Oceanography Agency (KHOA). Based on the results of analyzing the causes of major false detection results, it includes a process of removing linear and sporadic false detections and green algae that occurs in large quantities along the coast of China in spring and summer by considering them as false detections. The technology to automatically remove false detection was applied to the dates when Sargassum horneri occurred from February 24 to June 25, 2022. Visual assessment results were generated using mid-resolution satellite images, qualitative and quantitative evaluations were performed. Linear false detection results were completely removed, and most of the sporadic and green algae false detection results that affected the distribution were removed. Even after the automatic false detection removal process, it was possible to confirm the distribution area of Sargassum horneri compared to the visual assessment results, and the accuracy and precision calculated using the binary classification model averaged 97.73% and 95.4%, respectively. Recall value was very low at 29.03%, which is presumed to be due to the effect of Sargassum horneri movement due to the observation time discrepancy between GOCI-II and mid-resolution satellite images, differences in spatial resolution, location deviation by orthocorrection, and cloud masking. The results of this study's removal of false detections of Sargassum horneri can determine the spatial distribution status in near real-time, but there are limitations in accurately estimating biomass. Therefore, continuous research on upgrading the Sargassum horneri monitoring system must be conducted to use it as data for establishing future Sargassum horneri response plans.

Factors Affecting International Transfer Pricing of Multinational Enterprises in Korea (외국인투자기업의 국제이전가격 결정에 영향을 미치는 환경 및 기업요인)

  • Jun, Tae-Young;Byun, Yong-Hwan
    • Korean small business review
    • /
    • v.31 no.2
    • /
    • pp.85-102
    • /
    • 2009
  • With the continued globalization of world markets, transfer pricing has become one of the dominant sources of controversy in international taxation. Transfer pricing is the process by which a multinational corporation calculates a price for goods and services that are transferred to affiliated entities. Consider a Korean electronic enterprise that buys supplies from its own subsidiary located in China. How much the Korean parent company pays its subsidiary will determine how much profit the Chinese unit reports in local taxes. If the parent company pays above normal market prices, it may appear to have a poor profit, even if the group as a whole shows a respectable profit margin. In this way, transfer prices impact the taxable income reported in each country in which the multinational enterprise operates. It's importance lies in that around 60% of international trade involves transactions between two related parts of multinationals, according to the OECD. Multinational enterprises (hereafter MEs) exert much effort into utilizing organizational advantages to make global investments. MEs wish to minimize their tax burden. So MEs spend a fortune on economists and accountants to justify transfer prices that suit their tax needs. On the contrary, local governments are not prepared to cope with MEs' powerful financial instruments. Tax authorities in each country wish to ensure that the tax base of any ME is divided fairly. Thus, both tax authorities and MEs have a vested interest in the way in which a transfer price is determined, and this is why MEs' international transfer prices are at the center of disputes concerned with taxation. Transfer pricing issues and practices are sometimes difficult to control for regulators because the tax administration does not have enough staffs with the knowledge and resources necessary to understand them. The authors examine transfer pricing practices to provide relevant resources useful in designing tax incentives and regulation schemes for policy makers. This study focuses on identifying the relevant business and environmental factors that could influence the international transfer pricing of MEs. In this perspective, we empirically investigate how the management perception of related variables influences their choice of international transfer pricing methods. We believe that this research is particularly useful in the design of tax policy. Because it can concentrate on a few selected factors in consideration of the limited budget of the tax administration with assistance of this research. Data is composed of questionnaire responses from foreign firms in Korea with investment balances exceeding one million dollars in the end of 2004. We mailed questionnaires to 861 managers in charge of the accounting departments of each company, resulting in 121 valid responses. Seventy six percent of the sample firms are classified as small and medium sized enterprises with assets below 100 billion Korean won. Reviewing transfer pricing methods, cost-based transfer pricing is most popular showing that 60 firms have adopted it. The market-based method is used by 31 firms, and 13 firms have reported the resale-pricing method. Regarding the nationalities of foreign investors, the Japanese and the Americans constitute most of the sample. Logistic regressions have been performed for statistical analysis. The dependent variable is binary in that whether the method of international transfer pricing is a market-based method or a cost-based method. This type of binary classification is founded on the belief that the market-based method is evaluated as the relatively objective way of pricing compared with the cost-based methods. Cost-based pricing is assumed to give mangers flexibility in transfer pricing decisions. Therefore, local regulatory agencies are thought to prefer market-based pricing over cost-based pricing. Independent variables are composed of eight factors such as corporate tax rate, tariffs, relations with local tax authorities, tax audit, equity ratios of local investors, volume of internal trade, sales volume, and product life cycle. The first four variables are included in the model because taxation lies in the center of transfer pricing disputes. So identifying the impact of these variables in Korean business environments is much needed. Equity ratio is included to represent the interest of local partners. Volume of internal trade was sometimes employed in previous research to check the pricing behavior of managers, so we have followed these footsteps in this paper. Product life cycle is used as a surrogate of competition in local markets. Control variables are firm size and nationality of foreign investors. Firm size is controlled using dummy variables in that whether or not the specific firm is small and medium sized. This is because some researchers report that big firms show different behaviors compared with small and medium sized firms in transfer pricing. The other control variable is also expressed in dummy variable showing if the entrepreneur is the American or not. That's because some prior studies conclude that the American management style is different in that they limit branch manger's freedom of decision. Reviewing the statistical results, we have found that managers prefer the cost-based method over the market-based method as the importance of corporate taxes and tariffs increase. This result means that managers need flexibility to lessen the tax burden when they feel taxes are important. They also prefer the cost-based method as the product life cycle matures, which means that they support subsidiaries in local market competition using cost-based transfer pricing. On the contrary, as the relationship with local tax authorities becomes more important, managers prefer the market-based method. That is because market-based pricing is a better way to maintain good relations with the tax officials. Other variables like tax audit, volume of internal transactions, sales volume, and local equity ratio have shown only insignificant influence. Additionally, we have replaced two tax variables(corporate taxes and tariffs) with the data showing top marginal tax rate and mean tariff rates of each country, and have performed another regression to find if we could get different results compared with the former one. As a consequence, we have found something different on the part of mean tariffs, that shows only an insignificant influence on the dependent variable. We guess that each company in the sample pays tariffs with a specific rate applied only for one's own company, which could be located far from mean tariff rates. Therefore we have concluded we need a more detailed data that shows the tariffs of each company if we want to check the role of this variable. Considering that the present paper has heavily relied on questionnaires, an effort to build a reliable data base is needed for enhancing the research reliability.