• Title/Summary/Keyword: Binary Systems

Search Result 1,167, Processing Time 0.027 seconds

A Study on the Relationship between Social Networks and Retirement Satisfaction of Old Retirees (고령은퇴자의 사회적 관계망과 은퇴만족도 관계 연구)

  • Chung, Soondool;Moon, Jinyoung;Kim, Sungwon
    • 한국노년학
    • /
    • v.30 no.4
    • /
    • pp.1145-1161
    • /
    • 2010
  • The objectives of this study were to examine how social networks of old retirees impact on their retirement satisfaction, and through this, to suggest ways of improving their retirement satisfaction. Data used in this study were from 2006 KLoSA(Korean Longitudinal Study of Ageing), which were collected from 1,009 elderly people aged 65 and over who resided metropolis and smaller medium cities and answered regarding their retirement satisfaction. Data were analyzed by Binary Logistic Regression method. As a result, the frequency of contact with children, the number of participation in their social activities, and the satisfaction of relationship with children were the significant variables to predict retirement satisfaction. In addition, other variables such as gender, subjective health status, type of retirement, and duration of past retirement have been found as significant variables to explain retirement satisfaction. Implications for designing effective retirement plan and service systems have been discussed.

Managing Duplicate Memberships of Websites : An Approach of Social Network Analysis (웹사이트 중복회원 관리 : 소셜 네트워크 분석 접근)

  • Kang, Eun-Young;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.153-169
    • /
    • 2011
  • Today using Internet environment is considered absolutely essential for establishing corporate marketing strategy. Companies have promoted their products and services through various ways of on-line marketing activities such as providing gifts and points to customers in exchange for participating in events, which is based on customers' membership data. Since companies can use these membership data to enhance their marketing efforts through various data analysis, appropriate website membership management may play an important role in increasing the effectiveness of on-line marketing campaign. Despite the growing interests in proper membership management, however, there have been difficulties in identifying inappropriate members who can weaken on-line marketing effectiveness. In on-line environment, customers tend to not reveal themselves clearly compared to off-line market. Customers who have malicious intent are able to create duplicate IDs by using others' names illegally or faking login information during joining membership. Since the duplicate members are likely to intercept gifts and points that should be sent to appropriate customers who deserve them, this can result in ineffective marketing efforts. Considering that the number of website members and its related marketing costs are significantly increasing, it is necessary for companies to find efficient ways to screen and exclude unfavorable troublemakers who are duplicate members. With this motivation, this study proposes an approach for managing duplicate membership based on the social network analysis and verifies its effectiveness using membership data gathered from real websites. A social network is a social structure made up of actors called nodes, which are tied by one or more specific types of interdependency. Social networks represent the relationship between the nodes and show the direction and strength of the relationship. Various analytical techniques have been proposed based on the social relationships, such as centrality analysis, structural holes analysis, structural equivalents analysis, and so on. Component analysis, one of the social network analysis techniques, deals with the sub-networks that form meaningful information in the group connection. We propose a method for managing duplicate memberships using component analysis. The procedure is as follows. First step is to identify membership attributes that will be used for analyzing relationship patterns among memberships. Membership attributes include ID, telephone number, address, posting time, IP address, and so on. Second step is to compose social matrices based on the identified membership attributes and aggregate the values of each social matrix into a combined social matrix. The combined social matrix represents how strong pairs of nodes are connected together. When a pair of nodes is strongly connected, we expect that those nodes are likely to be duplicate memberships. The combined social matrix is transformed into a binary matrix with '0' or '1' of cell values using a relationship criterion that determines whether the membership is duplicate or not. Third step is to conduct a component analysis for the combined social matrix in order to identify component nodes and isolated nodes. Fourth, identify the number of real memberships and calculate the reliability of website membership based on the component analysis results. The proposed procedure was applied to three real websites operated by a pharmaceutical company. The empirical results showed that the proposed method was superior to the traditional database approach using simple address comparison. In conclusion, this study is expected to shed some light on how social network analysis can enhance a reliable on-line marketing performance by efficiently and effectively identifying duplicate memberships of websites.

Corporate Bond Rating Using Various Multiclass Support Vector Machines (다양한 다분류 SVM을 적용한 기업채권평가)

  • Ahn, Hyun-Chul;Kim, Kyoung-Jae
    • Asia pacific journal of information systems
    • /
    • v.19 no.2
    • /
    • pp.157-178
    • /
    • 2009
  • Corporate credit rating is a very important factor in the market for corporate debt. Information concerning corporate operations is often disseminated to market participants through the changes in credit ratings that are published by professional rating agencies, such as Standard and Poor's (S&P) and Moody's Investor Service. Since these agencies generally require a large fee for the service, and the periodically provided ratings sometimes do not reflect the default risk of the company at the time, it may be advantageous for bond-market participants to be able to classify credit ratings before the agencies actually publish them. As a result, it is very important for companies (especially, financial companies) to develop a proper model of credit rating. From a technical perspective, the credit rating constitutes a typical, multiclass, classification problem because rating agencies generally have ten or more categories of ratings. For example, S&P's ratings range from AAA for the highest-quality bonds to D for the lowest-quality bonds. The professional rating agencies emphasize the importance of analysts' subjective judgments in the determination of credit ratings. However, in practice, a mathematical model that uses the financial variables of companies plays an important role in determining credit ratings, since it is convenient to apply and cost efficient. These financial variables include the ratios that represent a company's leverage status, liquidity status, and profitability status. Several statistical and artificial intelligence (AI) techniques have been applied as tools for predicting credit ratings. Among them, artificial neural networks are most prevalent in the area of finance because of their broad applicability to many business problems and their preeminent ability to adapt. However, artificial neural networks also have many defects, including the difficulty in determining the values of the control parameters and the number of processing elements in the layer as well as the risk of over-fitting. Of late, because of their robustness and high accuracy, support vector machines (SVMs) have become popular as a solution for problems with generating accurate prediction. An SVM's solution may be globally optimal because SVMs seek to minimize structural risk. On the other hand, artificial neural network models may tend to find locally optimal solutions because they seek to minimize empirical risk. In addition, no parameters need to be tuned in SVMs, barring the upper bound for non-separable cases in linear SVMs. Since SVMs were originally devised for binary classification, however they are not intrinsically geared for multiclass classifications as in credit ratings. Thus, researchers have tried to extend the original SVM to multiclass classification. Hitherto, a variety of techniques to extend standard SVMs to multiclass SVMs (MSVMs) has been proposed in the literature Only a few types of MSVM are, however, tested using prior studies that apply MSVMs to credit ratings studies. In this study, we examined six different techniques of MSVMs: (1) One-Against-One, (2) One-Against-AIL (3) DAGSVM, (4) ECOC, (5) Method of Weston and Watkins, and (6) Method of Crammer and Singer. In addition, we examined the prediction accuracy of some modified version of conventional MSVM techniques. To find the most appropriate technique of MSVMs for corporate bond rating, we applied all the techniques of MSVMs to a real-world case of credit rating in Korea. The best application is in corporate bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. For our study the research data were collected from National Information and Credit Evaluation, Inc., a major bond-rating company in Korea. The data set is comprised of the bond-ratings for the year 2002 and various financial variables for 1,295 companies from the manufacturing industry in Korea. We compared the results of these techniques with one another, and with those of traditional methods for credit ratings, such as multiple discriminant analysis (MDA), multinomial logistic regression (MLOGIT), and artificial neural networks (ANNs). As a result, we found that DAGSVM with an ordered list was the best approach for the prediction of bond rating. In addition, we found that the modified version of ECOC approach can yield higher prediction accuracy for the cases showing clear patterns.

Analysis of Image Processing Characteristics in Computed Radiography System by Virtual Digital Test Pattern Method (Virtual Digital Test Pattern Method를 이용한 CR 시스템의 영상처리 특성 분석)

  • Choi, In-Seok;Kim, Jung-Min;Oh, Hye-Kyong;Kim, You-Hyun;Lee, Ki-Sung;Jeong, Hoi-Woun;Choi, Seok-Yoon
    • Journal of radiological science and technology
    • /
    • v.33 no.2
    • /
    • pp.97-107
    • /
    • 2010
  • The objectives of this study is to figure out the unknown image processing methods of commercial CR system. We have implemented the processing curve of each Look up table(LUT) in REGIUS 150 CR system by using virtual digital test pattern method. The characteristic of Dry Imager was measured also. First of all, we have generated the virtual digital test pattern file with binary file editor. This file was used as an input data of CR system (REGIUS 150 CR system, KONICA MINOLTA). The DICOM files which were automatically generated output files by the CR system, were used to figure out the processing curves of each LUT modes (THX, ST, STM, LUM, BONE, LIN). The gradation curves of Dry Imager were also measured to figure out the characteristics of hard copy image. According to the results of each parameters, we identified the characteristics of image processing parameter in CR system. The processing curves which were measured by this proposed method showed the characteristics of CR system. And we found the linearity of Dry Imager in the middle area of processing curves. With these results, we found that the relationships between the curves and each parameters. The G value is related to the slope and the S value is related to the shift in x-axis of processing curves. In conclusion, the image processing method of the each commercial CR systems are different, and they are concealed. This proposed method which uses virtual digital test pattern can measure the characteristics of parameters for the image processing patterns in the CR system. We expect that the proposed method is useful to analogize the image processing means not only for this CR system, but also for the other commercial CR systems.

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.221-241
    • /
    • 2018
  • Deep learning is getting attention recently. The deep learning technique which had been applied in competitions of the International Conference on Image Recognition Technology(ILSVR) and AlphaGo is Convolution Neural Network(CNN). CNN is characterized in that the input image is divided into small sections to recognize the partial features and combine them to recognize as a whole. Deep learning technologies are expected to bring a lot of changes in our lives, but until now, its applications have been limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still an early research stage. If their performance is proved, they can be applied to traditional business problems such as future marketing response prediction, fraud transaction detection, bankruptcy prediction, and so on. So, it is a very meaningful experiment to diagnose the possibility of solving business problems using deep learning technologies based on the case of online shopping companies which have big data, are relatively easy to identify customer behavior and has high utilization values. Especially, in online shopping companies, the competition environment is rapidly changing and becoming more intense. Therefore, analysis of customer behavior for maximizing profit is becoming more and more important for online shopping companies. In this study, we propose 'CNN model of Heterogeneous Information Integration' using CNN as a way to improve the predictive power of customer behavior in online shopping enterprises. In order to propose a model that optimizes the performance, which is a model that learns from the convolution neural network of the multi-layer perceptron structure by combining structured and unstructured information, this model uses 'heterogeneous information integration', 'unstructured information vector conversion', 'multi-layer perceptron design', and evaluate the performance of each architecture, and confirm the proposed model based on the results. In addition, the target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, high discount shopper. In order to verify the usefulness of the proposed model, we conducted experiments using actual data of domestic specific online shopping company. This experiment uses actual transactions, customers, and VOC data of specific online shopping company in Korea. Data extraction criteria are defined for 47,947 customers who registered at least one VOC in January 2011 (1 month). The customer profiles of these customers, as well as a total of 19 months of trading data from September 2010 to March 2012, and VOCs posted for a month are used. The experiment of this study is divided into two stages. In the first step, we evaluate three architectures that affect the performance of the proposed model and select optimal parameters. We evaluate the performance with the proposed model. Experimental results show that the proposed model, which combines both structured and unstructured information, is superior compared to NBC(Naïve Bayes classification), SVM(Support vector machine), and ANN(Artificial neural network). Therefore, it is significant that the use of unstructured information contributes to predict customer behavior, and that CNN can be applied to solve business problems as well as image recognition and natural language processing problems. It can be confirmed through experiments that CNN is more effective in understanding and interpreting the meaning of context in text VOC data. And it is significant that the empirical research based on the actual data of the e-commerce company can extract very meaningful information from the VOC data written in the text format directly by the customer in the prediction of the customer behavior. Finally, through various experiments, it is possible to say that the proposed model provides useful information for the future research related to the parameter selection and its performance.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring financial risk of companies and for determining the investment returns of investors. As a result, it has been a popular research topic for researchers to predict companies' credit ratings by applying statistical and machine learning techniques. The statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have been traditionally used in bond rating. However, one major drawback is that it should be based on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variablesand the predictor variables. Those strict assumptions of traditional statistics have limited their application to the real world. Machine learning techniques also used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). Especially, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that can maximize the margin between two categories. SVM is simple enough to be analyzed mathematical, and leads to high performance in practical applications. SVM implements the structuralrisk minimization principle and searches to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum and thus, overfitting is unlikely to occur with SVM. In addition, SVM does not require too many data sample for training since it builds prediction models by only using some representative sample near the boundaries called support vectors. A number of experimental researches have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. First, SVM is originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification such as One-Against-One, One-Against-All have been proposed, but they do not improve the performance in multi-class classification problem as much as SVM for binary-class classification. Second, approximation algorithms (e.g. decomposition methods, sequential minimal optimization algorithm) could be used for effective multi-class computation to reduce computation time, but it could deteriorate classification performance. Third, the difficulty in multi-class prediction problems is in data imbalance problem that can occur when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built due to skewed boundary and thus the reduction in the classification accuracy of such a classifier. SVM ensemble learning is one of machine learning methods to cope with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing weight on the misclassified observations through iterations. The observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted. Thus Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve multiclass prediction problem. Since MGM-Boost introduces the notion of geometric mean into AdaBoost, it can perform learning process considering the geometric mean-based accuracy and errors of multiclass. This study applies MGM-Boost to the real-world bond rating case for Korean companies to examine the feasibility of MGM-Boost. 10-fold cross validations for threetimes with different random seeds are performed in order to ensure that the comparison among three different classifiers does not happen by chance. For each of 10-fold cross validation, the entire data set is first partitioned into tenequal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained the results for classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows the higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%)in terms of geometric mean-based prediction accuracy. T-test is used to examine whether the performance of each classifiers for 30 folds is significantly different. The results indicate that performance of MGM-Boost is significantly different from AdaBoost and SVM classifiers at 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-classproblems such as bond rating.

Effects of Customers' Relationship Networks on Organizational Performance: Focusing on Facebook Fan Page (고객 간 관계 네트워크가 조직성과에 미치는 영향: 페이스북 기업 팬페이지를 중심으로)

  • Jeon, Su-Hyeon;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.57-79
    • /
    • 2016
  • It is a rising trend that the number of users using one of the social media channels, the Social Network Service, so called the SNS, is getting increased. As per to this social trend, more companies have interest in this networking platform and start to invest their funds in it. It has received much attention as a tool spreading and expanding the message that a company wants to deliver to its customers and has been recognized as an important channel in terms of the relationship marketing with them. The environment of media that is radically changing these days makes possible for companies to approach their customers in various ways. Particularly, the social network service, which has been developed rapidly, provides the environment that customers can freely talk about products. For companies, it also works as a channel that gives customized information to customers. To succeed in the online environment, companies need to not only build the relationship between companies and customers but focus on the relationship between customers as well. In response to the online environment with the continuous development of technology, companies have tirelessly made the novel marketing strategy. Especially, as the one-to-one marketing to customers become available, it is more important for companies to maintain the relationship marketing with their customers. Among many SNS, Facebook, which many companies use as a communication channel, provides a fan page service for each company that supports its business. Facebook fan page is the platform that the event, information and announcement can be shared with customers using texts, videos, and pictures. Companies open their own fan pages in order to inform their companies and businesses. Such page functions as the websites of companies and has a characteristic of their brand communities such as blogs as well. As Facebook has become the major communication medium with customers, companies recognize its importance as the effective marketing channel, but they still need to investigate their business performances by using Facebook. Although there are infinite potentials in Facebook fan page that even has a function as a community between users, which other platforms do not, it is incomplete to regard companies' Facebook fan pages as communities and analyze them. In this study, it explores the relationship among customers through the network of the Facebook fan page users. The previous studies on a company's Facebook fan page were focused on finding out the effective operational direction by analyzing the use state of the company. However, in this study, it draws out the structural variable of the network, which customer committment can be measured by applying the social network analysis methodology and investigates the influence of the structural characteristics of network on the business performance of companies in an empirical way. Through each company's Facebook fan page, the network of users who engaged in the communication with each company is exploited and it is the one-mode undirected binary network that respectively regards users and the relationship of them in terms of their marketing activities as the node and link. In this network, it draws out the structural variable of network that can explain the customer commitment, who pressed "like," made comments and shared the Facebook marketing message, of each company by calculating density, global clustering coefficient, mean geodesic distance, diameter. By exploiting companies' historical performance such as net income and Tobin's Q indicator as the result variables, this study investigates influence on companies' business performances. For this purpose, it collects the network data on the subjects of 54 companies among KOSPI-listed companies, which have posted more than 100 articles on their Facebook fan pages during the data collection period. Then it draws out the network indicator of each company. The indicator related to companies' performances is calculated, based on the posted value on DART website of the Financial Supervisory Service. From the academic perspective, this study suggests a new approach through the social network analysis methodology to researchers who attempt to study the business-purpose utilization of the social media channel. From the practical perspective, this study proposes the more substantive marketing performance measurements to companies performing marketing activities through the social media and it is expected that it will bring a foundation of establishing smart business strategies by using the network indicators.

Steel Plate Faults Diagnosis with S-MTS (S-MTS를 이용한 강판의 표면 결함 진단)

  • Kim, Joon-Young;Cha, Jae-Min;Shin, Junguk;Yeom, Choongsub
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.47-67
    • /
    • 2017
  • Steel plate faults is one of important factors to affect the quality and price of the steel plates. So far many steelmakers generally have used visual inspection method that could be based on an inspector's intuition or experience. Specifically, the inspector checks the steel plate faults by looking the surface of the steel plates. However, the accuracy of this method is critically low that it can cause errors above 30% in judgment. Therefore, accurate steel plate faults diagnosis system has been continuously required in the industry. In order to meet the needs, this study proposed a new steel plate faults diagnosis system using Simultaneous MTS (S-MTS), which is an advanced Mahalanobis Taguchi System (MTS) algorithm, to classify various surface defects of the steel plates. MTS has generally been used to solve binary classification problems in various fields, but MTS was not used for multiclass classification due to its low accuracy. The reason is that only one mahalanobis space is established in the MTS. In contrast, S-MTS is suitable for multi-class classification. That is, S-MTS establishes individual mahalanobis space for each class. 'Simultaneous' implies comparing mahalanobis distances at the same time. The proposed steel plate faults diagnosis system was developed in four main stages. In the first stage, after various reference groups and related variables are defined, data of the steel plate faults is collected and used to establish the individual mahalanobis space per the reference groups and construct the full measurement scale. In the second stage, the mahalanobis distances of test groups is calculated based on the established mahalanobis spaces of the reference groups. Then, appropriateness of the spaces is verified by examining the separability of the mahalanobis diatances. In the third stage, orthogonal arrays and Signal-to-Noise (SN) ratio of dynamic type are applied for variable optimization. Also, Overall SN ratio gain is derived from the SN ratio and SN ratio gain. If the derived overall SN ratio gain is negative, it means that the variable should be removed. However, the variable with the positive gain may be considered as worth keeping. Finally, in the fourth stage, the measurement scale that is composed of selected useful variables is reconstructed. Next, an experimental test should be implemented to verify the ability of multi-class classification and thus the accuracy of the classification is acquired. If the accuracy is acceptable, this diagnosis system can be used for future applications. Also, this study compared the accuracy of the proposed steel plate faults diagnosis system with that of other popular classification algorithms including Decision Tree, Multi Perception Neural Network (MLPNN), Logistic Regression (LR), Support Vector Machine (SVM), Tree Bagger Random Forest, Grid Search (GS), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The steel plates faults dataset used in the study is taken from the University of California at Irvine (UCI) machine learning repository. As a result, the proposed steel plate faults diagnosis system based on S-MTS shows 90.79% of classification accuracy. The accuracy of the proposed diagnosis system is 6-27% higher than MLPNN, LR, GS, GA and PSO. Based on the fact that the accuracy of commercial systems is only about 75-80%, it means that the proposed system has enough classification performance to be applied in the industry. In addition, the proposed system can reduce the number of measurement sensors that are installed in the fields because of variable optimization process. These results show that the proposed system not only can have a good ability on the steel plate faults diagnosis but also reduce operation and maintenance cost. For our future work, it will be applied in the fields to validate actual effectiveness of the proposed system and plan to improve the accuracy based on the results.

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.1
    • /
    • pp.23-46
    • /
    • 2021
  • Collaborative filtering, which is often used in personalization recommendations, is recognized as a very useful technique to find similar customers and recommend products to them based on their purchase history. However, the traditional collaborative filtering technique has raised the question of having difficulty calculating the similarity for new customers or products due to the method of calculating similaritiesbased on direct connections and common features among customers. For this reason, a hybrid technique was designed to use content-based filtering techniques together. On the one hand, efforts have been made to solve these problems by applying the structural characteristics of social networks. This applies a method of indirectly calculating similarities through their similar customers placed between them. This means creating a customer's network based on purchasing data and calculating the similarity between the two based on the features of the network that indirectly connects the two customers within this network. Such similarity can be used as a measure to predict whether the target customer accepts recommendations. The centrality metrics of networks can be utilized for the calculation of these similarities. Different centrality metrics have important implications in that they may have different effects on recommended performance. In this study, furthermore, the effect of these centrality metrics on the performance of recommendation may vary depending on recommender algorithms. In addition, recommendation techniques using network analysis can be expected to contribute to increasing recommendation performance even if they apply not only to new customers or products but also to entire customers or products. By considering a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of user acceptance of recommendation is solved as a prediction of whether a new link will be created between them. As the classification models fit the purpose of solving the binary problem of whether the link is engaged or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) are selected in the research. The data for performance evaluation used order data collected from an online shopping mall over four years and two months. Among them, the previous three years and eight months constitute social networks composed of and the experiment was conducted by organizing the data collected into the social network. The next four months' records were used to train and evaluate recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics are different for each algorithm at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranking moderate across overall models while betweenness centrality always ranking higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model. It ranks first in logistic regression, artificial neural network, and decision tree withnumerically high performance. However, it only records very low rankings in support vector machine and K-neighborhood with low-performance levels. As the experiment results reveal, in a classification model, network centrality metrics over a subnetwork that connects the two nodes can effectively predict the connectivity between two nodes in a social network. Furthermore, each metric has a different performance depending on the classification model type. This result implies that choosing appropriate metrics for each algorithm can lead to achieving higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. It would be possible to consider the introduction of proximity centrality to obtain higher performance for certain models.

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.121-139
    • /
    • 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is greatly important for financial institutions. Lots of researchers have dealt with the topic associated with bankruptcy prediction in the past three decades. The current research attempts to use ensemble models for improving the performance of bankruptcy prediction. Ensemble classification is to combine individually trained classifiers in order to gain more accurate prediction than individual models. Ensemble techniques are shown to be very useful for improving the generalization ability of the classifier. Bagging is the most commonly used methods for constructing ensemble classifiers. In bagging, the different training data subsets are randomly drawn with replacement from the original training dataset. Base classifiers are trained on the different bootstrap samples. Instance selection is to select critical instances while deleting and removing irrelevant and harmful instances from the original set. Instance selection and bagging are quite well known in data mining. However, few studies have dealt with the integration of instance selection and bagging. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) for improving the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problems. GA searches by maintaining a population of solutions from which better solutions are created rather than making incremental changes to a single solution to the problem. The initial solution population is generated randomly and evolves into the next generation by genetic operators such as selection, crossover and mutation. The solutions coded by strings are evaluated by the fitness function. The proposed model consists of two phases: GA based Instance Selection and Instance based Bagging. In the first phase, GA is used to select optimal instance subset that is used as input data of bagging model. In this study, the chromosome is encoded as a form of binary string for the instance subset. In this phase, the population size was set to 100 while maximum number of generations was set to 150. We set the crossover rate and mutation rate to 0.7 and 0.1 respectively. We used the prediction accuracy of model as the fitness function of GA. SVM model is trained on training data set using the selected instance subset. The prediction accuracy of SVM model over test data set is used as fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as input data of bagging model. We used SVM model as base classifier for bagging ensemble. The majority voting scheme was used as a combining method in this study. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data used in this study contains 1832 externally non-audited firms which filed for bankruptcy (916 cases) and non-bankruptcy (916 cases). Financial ratios categorized as stability, profitability, growth, activity and cash flow were investigated through literature review and basic statistical methods and we selected 8 financial ratios as the final input variables. We separated the whole data into three subsets as training, test and validation data set. In this study, we compared the proposed model with several comparative models including the simple individual SVM model, the simple bagging model and the instance selection based SVM model. The McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.