• Title/Summary/Keyword: Three Machines

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

  • Park, Ji-Young; Hong, Tae-Ho
    • Asia Pacific Journal of Information Systems, v.19 no.2, pp.139-155, 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of such companies' competitive advantages over their rivals. Venture companies have tended to deliver high returns for investors, generally by making the best use of information technology, and for this reason many of them are keen on attracting avid investors' attention. Investors generally make their investment decisions by carefully examining the evaluation criteria of the alternatives. To them, the credit rating information provided by international rating agencies such as Standard and Poor's, Moody's, and Fitch is a crucial source on such pivotal concerns as a company's stability, growth, and risk status. However, this type of information is generated only for companies issuing corporate bonds, not for venture companies. Therefore, this study proposes a method for evaluating venture businesses by presenting our recent empirical results using financial data of Korean venture companies listed on KOSDAQ in the Korea Exchange. In addition, this paper used a multi-class SVM for the prediction of the DEA-based efficiency rating for venture businesses derived from our proposed method. Our approach sheds light on ways to locate efficient companies generating a high level of profits. Above all, in determining effective ways to evaluate a venture firm's efficiency, it is important to understand the major contributing factors of such efficiency. Therefore, this paper is built on the following two ideas for classifying which companies are the more efficient venture companies: i) making a DEA-based multi-class rating for the sample companies, and ii) developing a multi-class SVM-based efficiency prediction model for classifying all companies. First, Data Envelopment Analysis (DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units (DMUs) using a linear programming based model. It is non-parametric because it requires no assumption on the shape or parameters of the underlying production function. DEA has already been widely applied for evaluating the relative efficiency of DMUs. Recently, a number of DEA-based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies, and DEA has also been applied to corporate credit ratings. In this study we utilized DEA for sorting venture companies by efficiency-based ratings. The Support Vector Machine (SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings of IT venture companies according to the results of the DEA. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is grounded in statistical learning theory and has thus far shown good performance, especially in its generalization capacity in classification tasks, resulting in numerous applications in many areas of business. SVM is basically an algorithm that finds the maximum-margin hyperplane, that is, the hyperplane with the maximum separation between classes; the support vectors are the observations closest to this hyperplane. If the classes cannot be separated linearly, a kernel function can be used: in the case of nonlinear class boundaries, the original input space is mapped into a high-dimensional dot-product feature space. Many studies have applied SVM to the prediction of bankruptcy, the forecasting of financial time series, and the problem of estimating credit ratings. In this study we employed SVM to develop a data mining-based efficiency prediction model, using the Gaussian radial basis function as the kernel of the SVM. For multi-class SVM, we adopted the one-against-one binary classification approach and two all-together methods, proposed by Weston and Watkins (1999) and Crammer and Singer (2000), respectively. In this research, we used corporate information on 154 companies listed on the KOSDAQ market in the Korea Exchange. We obtained the companies' financial information for 2005 from KIS (Korea Information Service, Inc.). Using this data, we made a multi-class rating with DEA efficiency and built a data mining-based multi-class prediction model. Among the three multi-classification methods, the hit ratio of the Weston and Watkins method was the best on the test data set. In multi-classification problems such as efficiency ratings of venture businesses, it is very useful for investors to know the class within a one-class error when it is difficult to determine the accurate class in the actual market. So we also present accuracy within one-class errors, for which the Weston and Watkins method showed 85.7% accuracy on our test samples. We conclude that the DEA-based multi-class approach for venture businesses generates more information than a binary classification of efficiency level. We believe this model can help investors in decision making, as it provides a reliable tool for evaluating venture companies in the financial domain. For future research, we perceive the need to enhance such areas as the variable selection process, the parameter selection of the kernel function, generalization, and the sample size for each class.
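To make the two-stage design concrete, here is a minimal sketch: DEA efficiency scores (input-oriented CCR model) computed by linear programming, binned into rating classes, then fed to a one-against-one multi-class SVM with a Gaussian RBF kernel. The data, the choice of input/output variables, and the quartile rating thresholds are illustrative assumptions, not the paper's; scikit-learn's SVC (which uses one-against-one internally) stands in for the specific multi-class formulations the paper compares.

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

def dea_ccr_efficiency(X, Y):
    """Input-oriented CCR efficiency (multiplier form) for each DMU.
    X: (n, m) input matrix, Y: (n, s) output matrix."""
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for o in range(n):
        # maximize u.y_o  s.t.  v.x_o = 1,  u.y_j - v.x_j <= 0,  u, v >= 0
        c = np.concatenate([-Y[o], np.zeros(m)])           # variables: [u, v]
        A_ub = np.hstack([Y, -X])                          # u.y_j - v.x_j <= 0
        b_ub = np.zeros(n)
        A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)  # v.x_o = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (s + m))
        scores.append(-res.fun)                            # efficiency = u.y_o
    return np.array(scores)

rng = np.random.default_rng(0)
X_in = rng.uniform(1, 10, size=(154, 3))   # hypothetical inputs (e.g. assets, staff, costs)
Y_out = rng.uniform(1, 10, size=(154, 2))  # hypothetical outputs (e.g. sales, profit)
eff = dea_ccr_efficiency(X_in, Y_out)

# Bin efficiency scores into ordinal rating classes (quartiles, for illustration).
ratings = np.digitize(eff, np.quantile(eff, [0.25, 0.5, 0.75]))

# One-against-one multi-class SVM with a Gaussian RBF kernel.
svm = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovo")
svm.fit(X_in, ratings)
print(svm.predict(X_in[:5]))
```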

Trend and Further Research of Rice Quality Evaluation (쌀의 품질평가 현황과 금후 연구방향)

  • Son, Jong-Rok; Kim, Jae-Hyun; Lee, Jung-Il; Youn, Young-Hwan; Kim, Jae-Kyu; Hwang, Hung-Goo; Moon, Hun-Pal
    • Korean Journal of Crop Science, v.47, pp.33-54, 2002
  • Rice quality is much dependent on pre- and post-harvest management. Many parameters influence rice or cooked rice quality, such as cultivar, climate, soil, harvest time, drying, milling, storage, safety, nutritive value, taste, marketing, eating and cooking conditions, and each nation's food culture. Thus, rice evaluation cannot be carried out with only a few parameters. Physicochemical evaluation of rice deals with amylose content, gelatinizing properties, and their relation to taste. The amylose content of good rice in Korea is defined as 17 to 20%. Other parameters considered are the ratio of protein body-1 to total protein in relation to taste, and the oleic/linoleic acid ratio in relation to storage safety. Rice with a higher Mg/K ratio is considered high quality; the optimum value is above 1.5 to 1.6. It has been reported that the contents and proportions of oligosaccharides, glutamic acid, and its derivatives are highly correlated with the taste of rice. The major aromatic compounds in rice are known to be hexanal, acetone, pentanal, butanal, octanal, and heptanal. Recently, it was found that muco-polysaccharides are solubilized during cooking and coat the surface of cooked rice, contributing to its consistency and collecting free amino acids and vitamins. Thus, these parameters might also be regarded as important items for the quality and taste evaluation of rice. The taste-related components of rice are not distributed evenly through the grain: in the inner kernel, starch is the main component, while nitrogen and mineral compounds are localized in the outer kernel, and the taste-related components are contained in the outer 86 to 91% of the kernel. For safety, which is an important evaluation item of rice quality, residual tolerance limits for each agricultural chemical must be adopted in Korea. During drying, rice quality can decline because of high drying temperatures, overdrying, or rapid drying, which result in cracked grains or discolored kernels. Intrinsic enzymes react partially during rice storage; because of these enzymes, starch, lipids, and proteins can be slowly degraded, resulting in a decline in appearance quality, the occurrence of an aging aroma, and increased hardness of cooked rice. The milling conditions concerned with quality are paddy quality, milling method, and milling machines. To produce high-quality rice, head rice must consist of kernels at least three-fourths of normal size, and broken, damaged, colored, and immature kernels must be eliminated. In addition to milling equipment, a color sorter and a length grader must be installed for the production of such rice. Head rice was examined for 45 brand rices circulating in Korea, Japan, America, Australia, and China; the head rice rate of brand rice in Korea was approximately 57.4%, versus 80-86% in the foreign countries. In order to develop a rice quality evaluation system, evaluation techniques must be further developed: more detailed measurement of quality, the search for taste-related components, creation and grade classification of quality evaluation factors at each post-harvest management stage, evaluation of rice as a food material as well as for cooking, and development of simple evaluation methods and an equation for palatability. On the policy side, the following must be pursued: price differentiation by rice cultivar and grade on the basis of the quality evaluation method, establishment of head rice branding, and introduction of low-temperature distribution.

Comparison and evaluation of treatment plans using Abdominal compression and Continuous Positive Air Pressure for lung cancer SABR (폐암의 SABR(Stereotactic Ablative Radiotherapy)시 복부압박(Abdominal compression)과 CPAP(Continuous Positive Air Pressure)를 이용한 치료계획의 비교 및 평가)

  • Kim, Dae Ho; Son, Sang Jun; Mun, Jun Ki; Park, Jang Pil; Lee, Je Hee
    • The Journal of Korean Society for Radiation Therapy, v.33, pp.35-46, 2021
  • Purpose: By comparing and analyzing treatment plans using abdominal compression and Continuous Positive Air Pressure (CPAP) during SABR of lung cancer, we aim to contribute to improving the effectiveness of radiotherapy. Materials & Methods: For two lung SABR patients (patients A and B), we developed SABR plans using an abdominal compression device (the Body Pro-Lok, BPL) and CPAP, and analyzed the plans in terms of homogeneity, conformity, and the parameters proposed in RTOG 0813. Furthermore, for each phase, the X-, Y-, and Z-axis movements centered on the PTV were analyzed in all 4D CTs, and the volume and average dose of the PTV and OARs were obtained and compared. Four cone beam computed tomography (CBCT) scans were used to measure the distances from the center of the PTV to the intrathoracic contact points in three of the directions 0°, 90°, 180°, and 270°, and the differences from the average distance value in each direction were compared. Results: Both treatment plans, with BPL and with CPAP, followed the RTOG recommendations, and there was no significant difference in homogeneity or conformity. The X-, Y-, and Z-axis movements centered on the PTV were 0.49 cm, 0.37 cm, and 1.66 cm with BPL and 0.16 cm, 0.12 cm, and 0.19 cm with CPAP in patient A, and 0.22 cm, 0.18 cm, and 1.03 cm with BPL and 0.14 cm, 0.11 cm, and 0.4 cm with CPAP in patient B. In patient A, with CPAP compared to BPL, the ITV decreased by 46.27%, the left lung volume increased by 41.94%, and the average heart dose decreased by 52.81%. In patient B, the volume increased by 106.89% in the left lung and 87.32% in the right lung, and the average stomach dose decreased by 44.30%. For patient A, the maximum difference between the straight-line distance and the mean distance in each direction was 0.05 cm in the a-direction, 0.05 cm in the b-direction, and 0.41 cm in the c-direction; for patient B, the differences were 0.19 cm in the d-direction, 0.49 cm in the e-direction, and 0.06 cm in the f-direction. Conclusion: We confirmed that the increased lung volume achieved with CPAP can reduce OAR doses near the target more effectively than BPL, and that CPAP also restricts respiratory tumor movement more effectively. It is considered that the effectiveness of radiation therapy can be improved by applying CPAP to various treatment sites and by combining CPAP with other treatment machines.

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il; Kim, Jong Woo
    • Journal of Intelligence and Information Systems, v.26 no.1, pp.47-73, 2020
  • KTX rolling stock is a system consisting of many machines, electrical devices, and components, and its maintenance requires considerable expertise and experience on the part of maintenance workers. In the event of a rolling stock failure, the maintainer's knowledge and experience make a difference in the time taken and the quality of the work needed to solve the problem, so the resulting availability of the vehicle varies. Although problem solving is generally based on fault manuals, experienced and skilled professionals can diagnose faults and take action quickly by applying personal know-how. Since this knowledge exists in a tacit form, it is difficult to pass on completely to a successor, and previous studies have developed case-based rolling stock expert systems to turn it into a data-driven resource. Nonetheless, research on the KTX rolling stock most commonly used on the main line, and on systems that extract the meaning of text and search for similar cases, is still lacking. Therefore, this study proposes an intelligence support system that provides an action guide for emerging failures by using the know-how of rolling stock maintenance experts as problem-solving cases. For this purpose, a case base was constructed by collecting the rolling stock failure data generated from 2015 to 2017, and an integrated dictionary was built separately from the case base to include essential terminology and failure codes, in consideration of the specialized nature of the railway rolling stock sector. Based on the constructed case base, a new failure is matched against past cases and the top three most similar failure cases are retrieved, so that the actual actions taken in those cases can be proposed as a diagnostic guide. To compensate for the limitations of keyword-matching case retrieval in earlier case-based rolling stock expert system studies, various dimensionality reduction techniques were applied to calculate similarity in a way that takes the semantic relationships among failure descriptions into account, and their usefulness was verified through experiments. Similar cases were retrieved by applying three such algorithms, Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec, to extract the characteristics of each failure and then measuring the cosine distance between the resulting vectors. Precision, recall, and the F-measure were used to assess the performance of the proposed actions. For comparison, two baselines were included: an algorithm that randomly retrieves failure cases with identical failure codes, and an algorithm that applies cosine similarity directly to word vectors; analysis of variance confirmed that the performance differences among the five algorithms were statistically significant. In addition, optimal techniques for practical application were derived by examining how performance varies with the number of dimensions used for dimensionality reduction. The analysis showed that direct cosine similarity performed better than the dimensionality reduction using Non-negative Matrix Factorization (NMF) and Latent Semantic Analysis (LSA), and that the algorithm using Doc2Vec performed best. Furthermore, for the dimensionality reduction techniques, performance improved as the number of dimensions increased up to an appropriate level. Through this study, we confirmed the usefulness of effective methods for extracting the characteristics of data and converting unstructured data when applying case-based reasoning in the specialized field of KTX rolling stock, where most attributes are text. Text mining is being studied for use in many areas, but studies using text data are still lacking in environments with many specialized terms and limited access to data, such as the one addressed here. In this regard, it is significant that this study is the first to present an intelligent diagnostic system that suggests actions by retrieving cases with text mining techniques that extract the characteristics of a failure, complementing keyword-based case search. It is expected to provide implications as a basic study for developing diagnostic systems that can be used immediately on site.
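As a rough illustration of the retrieval step (not the authors' implementation), the sketch below vectorizes failure descriptions with TF-IDF, reduces dimensionality with LSA (truncated SVD), and returns the three most similar past cases by cosine similarity. The toy corpus, the dimension count, and the function name are assumptions; NMF or Doc2Vec could be swapped in at the reduction step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for failure descriptions in the case base.
case_base = [
    "traction motor overheating alarm during acceleration",
    "pantograph contact loss, catenary voltage drop",
    "brake cylinder pressure low on trailer bogie",
    "traction converter trip, overcurrent fault code",
]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(case_base)

# LSA: project TF-IDF vectors into a low-dimensional semantic space.
lsa = TruncatedSVD(n_components=3, random_state=0)
case_vecs = lsa.fit_transform(tfidf)

def top3_similar(new_failure_text):
    """Return indices of the 3 most similar past cases by cosine similarity."""
    q = lsa.transform(vec.transform([new_failure_text]))
    sims = cosine_similarity(q, case_vecs)[0]
    return sims.argsort()[::-1][:3]

print(top3_similar("overcurrent trip in traction converter"))
```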

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems, v.18 no.2, pp.29-45, 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. The statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions, including linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising method for classification and regression analysis. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and searches to minimize an upper bound on the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many data samples for training, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance. First, SVM was originally proposed for solving binary classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class classification as SVM does on binary classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce the computation time of multi-class problems, but they can deteriorate classification performance. Third, a major difficulty in multi-class prediction is the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, reducing classification accuracy. SVM ensemble learning is one machine learning approach to coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms, and AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on the misclassified observations through iterations: observations incorrectly predicted by previous classifiers are chosen more often than those correctly predicted. Boosting thus attempts to produce new classifiers that better predict the examples on which the current ensemble performs poorly, and in this way it can reinforce the training of misclassified observations from the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, it can carry out the learning process considering geometric mean-based accuracy and errors across classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine sets; that is, the cross-validated folds are tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performances of the classifiers over the 30 folds differ significantly; the results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
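The geometric mean criterion at the core of MGM-Boost can be illustrated with a small sketch. The MGM-Boost weight update itself is the paper's contribution and is not reproduced here; instead, a standard AdaBoost on synthetic imbalanced multi-class data stands in, and the geometric mean of per-class recalls is computed as the evaluation measure. All data and parameters are made up.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data with a skewed class distribution (70/20/10).
X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
per_class_recall = recall_score(y_te, clf.predict(X_te), average=None)

# Geometric mean of per-class recalls: unlike the arithmetic mean, it
# collapses toward zero if any class (e.g. the minority) is ignored.
gmean = np.prod(per_class_recall) ** (1.0 / len(per_class_recall))
print(per_class_recall, gmean)
```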

Evaluation of HalcyonTM Fast kV CBCT effectiveness in radiation therapy in cervical cancer patients of childbearing age who performed ovarian transposition (난소전위술을 시행한 가임기 여성의 자궁경부암 방사선치료 시 난소선량 감소를 위한 HalcyonTM Fast kV CBCT의 유용성 평가 : Phantom study)

  • Lee Sung Jae; Shin Chung Hun; Choi So Young; Lee Dong Hyeong; Yoo Soon Mi; Song Heung Gwon; Yoon In Ha
    • The Journal of Korean Society for Radiation Therapy, v.34, pp.73-82, 2022
  • Purpose: The purpose of this study is to evaluate the effectiveness of the HalcyonTM Fast kV CBCT in reducing the absorbed dose to the ovaries, and its image quality, for cervical cancer patients of childbearing age who have undergone ovarian transposition. Materials & Methods: Contouring of the cervix and ovaries required for measurement was performed on computed tomography images of a human phantom (Alderson Rando Phantom, USA), and three optically stimulated luminescence dosimeters (OSLDs) were attached to each selected organ cross-section. To measure the absorbed dose to the cervix and ovaries in the TruebeamTM Pelvis mode (hereinafter TP), the HalcyonTM Pelvis mode (hereinafter HP), and the HalcyonTM Pelvis Fast mode (hereinafter HPF), images were taken with a scan range of 17.5 cm and again with the scan range reduced to 12.5 cm. The cumulative dose over 10 scans was summed and scaled to 23 fractions, the number of cervical cancer treatment sessions, for comparison. In addition, uniformity, low contrast visibility (LCV), spatial resolution, and geometric distortion were compared and analyzed using a Catphan 504 phantom to compare CBCT image quality between the machines. Each factor was measured three times, and the average value was obtained by analyzing the results with the Doselab program (Mobius Medical Systems, LP, version 6.8). Results: In the OSLD measurements of the absorbed dose from CBCT, TP and HP showed no significant difference under the same conditions. The greatest reduction was obtained with HPF versus TP: at the 17.5 cm scan range, HPF reduced the absorbed dose by 39.8% in the cervix and 19.8% in the ovaries compared to TP, and when the scan range was reduced to 12.5 cm, the absorbed dose was reduced by 34.2% in the cervix and 50.5% in the ovaries. In addition, the image quality in these experiments complied with the equipment manufacturer's standards: geometric distortion within 1 mm (SBRT standard), uniformity (HU) and LCV within 2.0%, and spatial resolution above 3 lp/mm. Conclusion: According to these results, HalcyonTM offers a wider choice of imaging conditions than TruebeamTM when treating women of childbearing age who have undergone ovarian transposition, for whom reducing the radiation dose from CBCT during radiation therapy is important. We therefore recommend the HalcyonTM Fast kV CBCT, which maintains image quality even at low mAs. It is also considered that the additional low-dose exposure can be reduced on other treatment machines by controlling the imaging range for patients who have undergone ovarian transposition.

Robo-Advisor Algorithm with Intelligent View Model (지능형 전망모형을 결합한 로보어드바이저 알고리즘)

  • Kim, Sunwoong
    • Journal of Intelligence and Information Systems, v.25 no.2, pp.39-55, 2019
  • Recently, banks and large financial institutions have introduced many Robo-Advisor products. A Robo-Advisor is a robot that produces an optimal asset allocation portfolio for investors using financial engineering algorithms, without any human intervention. Since its first introduction on Wall Street in 2008, the market has grown to 60 billion dollars and is expected to expand to 2,000 billion dollars by 2020. Since Robo-Advisor algorithms suggest asset allocations to investors, mathematical or statistical asset allocation strategies are applied. The mean-variance optimization model developed by Markowitz is the typical asset allocation model: a simple but quite intuitive portfolio strategy in which assets are allocated so as to minimize portfolio risk while maximizing expected portfolio return using optimization techniques. Despite its theoretical background, both academics and practitioners find that the standard mean-variance optimization portfolio is very sensitive to the expected returns calculated from past price data, and corner solutions allocated to only a few assets are often found. The Black-Litterman optimization model overcomes these problems by choosing a neutral Capital Asset Pricing Model equilibrium point: implied equilibrium returns for each asset are derived from the equilibrium market portfolio through reverse optimization. The Black-Litterman model then uses a Bayesian approach to combine subjective views on the price forecasts of one or more assets with the implied equilibrium returns, resulting in new estimates of risk and expected returns, which can in turn produce an optimal portfolio through the well-known Markowitz mean-variance optimization algorithm. If the investor does not have any views on the asset classes, the Black-Litterman optimization model produces the market portfolio. But what if the subjective views are incorrect? Surveys of the performance of stocks recommended by securities analysts show very poor results, so incorrect views combined with implied equilibrium returns may produce very poor portfolio output for Black-Litterman model users. This paper suggests an objective investor views model based on Support Vector Machines (SVM), which have shown good performance in stock price forecasting. SVM is a discriminative classifier defined by a separating hyperplane; linear, radial basis, and polynomial kernel functions are used to learn the hyperplanes. The input variables for the SVM are the returns, standard deviations, Stochastics %K, and price parity degree for each asset class. The SVM outputs expected stock price movements and their probabilities, which are used as input variables in the intelligent views model. The stock price movements are categorized into three phases: down, neutral, and up. The expected stock returns form the P matrix, and their probability results are used in the Q matrix. The implied equilibrium returns vector is combined with the intelligent views matrix, yielding the Black-Litterman optimal portfolio. For comparison, the Markowitz mean-variance optimization model and the risk parity model are used, and the value-weighted and equal-weighted market portfolios serve as benchmark indexes. We collected the 8 KOSPI 200 sector indexes from January 2008 to December 2018, comprising 132 monthly index values; the training period is 2008 to 2015 and the testing period is 2016 to 2018. Our suggested intelligent views model, combined with the implied equilibrium returns, produced the optimal Black-Litterman portfolio. Out of sample, this portfolio outperformed the well-known Markowitz mean-variance optimization portfolio, the risk parity portfolio, and the market portfolio: the total return of the Black-Litterman portfolio over the three-year period was 6.4%, the highest value; its maximum drawdown, -20.8%, was also the lowest value; and its Sharpe ratio, which measures return per unit of risk, was the highest at 0.17. Overall, our suggested views model shows the possibility of replacing subjective analysts' views with an objective views model for practitioners applying Robo-Advisor asset allocation algorithms in real trading.
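For readers unfamiliar with the Black-Litterman combination step described above, here is a minimal numerical sketch. The covariance matrix, the risk-aversion and tau values, and the single view (standing in for an SVM "up" forecast with its probability-derived uncertainty) are all made-up illustrations, not the paper's data.

```python
import numpy as np

delta, tau = 2.5, 0.05                               # risk aversion, scaling factor
Sigma = np.array([[0.040, 0.010], [0.010, 0.030]])   # asset covariance (assumed)
w_mkt = np.array([0.6, 0.4])                         # market-cap weights (assumed)

# Reverse optimization: implied equilibrium returns.
pi = delta * Sigma @ w_mkt

P = np.array([[1.0, -1.0]])    # one view: asset 1 outperforms asset 2 ...
Q = np.array([0.02])           # ... by 2% (e.g. an SVM "up" signal)
Omega = np.array([[0.0005]])   # view uncertainty (e.g. from SVM probability)

# Black-Litterman posterior expected returns.
inv = np.linalg.inv
tauSigma_inv = inv(tau * Sigma)
post_mean = inv(tauSigma_inv + P.T @ inv(Omega) @ P) @ (
    tauSigma_inv @ pi + P.T @ inv(Omega) @ Q)

# Unconstrained mean-variance weights from the posterior returns.
w_bl = inv(delta * Sigma) @ post_mean
print(pi, post_mean, w_bl)
```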

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems, v.27 no.1, pp.23-46, 2021
  • Collaborative filtering, which is often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, the traditional collaborative filtering technique has difficulty calculating similarity for new customers or products, because it calculates similarities based on direct connections and common features among customers. For this reason, hybrid techniques have been designed that additionally use content-based filtering. Separately, efforts have been made to solve these problems by applying the structural characteristics of social networks, using a method that calculates the similarity between two customers indirectly, through the similar customers placed between them. This means creating a customer network based on purchase data and calculating the similarity between two customers from the features of the network paths that indirectly connect them within this network. Such similarity can be used as a measure to predict whether the target customer will accept a recommendation, and the centrality metrics of networks can be utilized in calculating it. Different centrality metrics are important in that they may have different effects on recommendation performance, and, as this study shows, this effect may also vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance when applied not only to new customers or products but also to the entire set of customers and products. By considering a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of user acceptance of a recommendation is solved as a prediction of whether a new link will be created between them. Since classification models fit the binary problem of whether a link is created or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for the research. The performance evaluation used order data collected from an online shopping mall over four years and two months; the first three years and eight months of data were organized into the social network, and the next four months' records were used to train and evaluate the recommender models. Experiments applying the centrality metrics to each model show that the recommendation acceptance rates of the centrality metrics differ across algorithms at a meaningful level. This work analyzed four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except the support vector machine. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model: it ranks first in logistic regression, artificial neural network, and decision tree with numerically high performance, but records very low rankings and low performance levels in the support vector machine and KNN models. As the experimental results reveal, network centrality metrics over the subnetwork connecting two nodes can, in a classification model, effectively predict the connectivity between those two nodes in a social network. Furthermore, each metric performs differently depending on the classification model type, which implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model, and closeness centrality can be considered to obtain higher performance for certain models.
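A small sketch of the core idea, on a hypothetical stand-in graph rather than the shopping-mall data: compute the four centrality metrics with networkx and use node-pair combinations of them as features for one of the studied classifiers, here logistic regression, to predict link formation. The graph, feature construction, and negative sampling are illustrative assumptions.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

G = nx.karate_club_graph()   # stand-in for the customer-item purchase network

# The four centrality metrics analyzed in the study.
cent = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G),
}

def pair_features(u, v):
    # One feature per centrality metric: combined score of the two endpoints.
    return [cent[m][u] + cent[m][v] for m in cent]

# Positive examples: existing edges; negatives: sampled non-edges.
pos = list(G.edges())[:30]
neg = list(nx.non_edges(G))[:30]
X = np.array([pair_features(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))

# Link prediction as binary classification over centrality features.
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```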