• Title/Summary/Keyword: order prediction


Construction of Web-Based Database for Anisakis Research (고래회충 연구를 위한 웹기반 데이터베이스 구축)

  • Lee, Yong-Seok;Baek, Moon-Ki;Jo, Yong-Hun;Kang, Se-Won;Lee, Jae-Bong;Han, Yeon-Soo;Cha, Hee-Jae;Yu, Hak-Sun;Ock, Mee-Sun
    • Journal of Life Science / v.20 no.3 / pp.411-415 / 2010
  • Anisakis simplex is a parasitic nematode with a complex life cycle involving crustaceans, fish, squid, and whales. When people eat raw or under-processed fish, it causes anisakidosis and can also induce serious allergic reactions in humans. However, no web-based database on A. simplex at the DNA or protein level has been reported so far. In this context, we constructed a web-based database for Anisakis research. To build it, we proceeded as follows. First, sequences of the order Ascaridida were downloaded, converted into multi-FASTA format, and stored as a database for stand-alone BLAST. Second, all of the nucleotide and EST sequences were clustered and assembled, and the EST sequences were translated into amino acid sequences for Nuclear Localization Signal prediction. In addition, we added vector, E. coli, and repeat sequences to the database so that potential contamination can be detected. The web-based database offers several advantages. When running BLAST searches, only hits against nucleotide sequences directly related to the order Ascaridida are retrieved. It is also very convenient for detecting contamination when constructing cDNA or genomic libraries from Anisakis. Furthermore, BLAST results on Anisakis sequence information can be accessed quickly. Taken together, the web-based database on A. simplex will be valuable for developing species-specific PCR markers and for studying SNPs in future A. simplex-related research.
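The workflow described above (formatting sequences as multi-FASTA, then building a local database for stand-alone BLAST) can be sketched with the NCBI BLAST+ command-line tools. The file names (ascaridida.fasta, clone.fasta, ascaridida_db) below are hypothetical placeholders, and the snippet assumes BLAST+ is installed; it is a minimal sketch, not the authors' pipeline.

```python
import subprocess

# Hypothetical input: Ascaridida nucleotide sequences in multi-FASTA format.
FASTA = "ascaridida.fasta"
DB = "ascaridida_db"

# Build a local nucleotide BLAST database (NCBI BLAST+ makeblastdb).
subprocess.run(
    ["makeblastdb", "-in", FASTA, "-dbtype", "nucl", "-out", DB],
    check=True,
)

# Query a candidate clone against the local database in tabular format;
# contamination screening would use a database that also contains the
# vector, E. coli, and repeat sequences, as the abstract describes.
result = subprocess.run(
    ["blastn", "-query", "clone.fasta", "-db", DB, "-outfmt", "6"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```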

A STUDY ON THE MEASUREMENT OF THE IMPLANT STABILITY USING RESONANCE FREQUENCY ANALYSIS (공진 주파수 분석법에 의한 임플랜트의 안정성 측정에 관한 연구)

  • Park Cheol;Lim Ju-Hwan;Cho In-Ho;Lim Heon-Song
    • The Journal of Korean Academy of Prosthodontics / v.41 no.2 / pp.182-206 / 2003
  • Statement of problem: Successful osseointegration of endosseous threaded implants depends on many factors, including the surface characteristics and gross geometry of the implants, the quality and quantity of bone where they are placed, and the magnitude and direction of stress under functional occlusion. Clinical quantitative measurement of primary stability at placement and in the functional state may therefore help predict possible clinical symptoms and guide the revision of implant geometry, type, and surface characteristics according to each patient's conditions, ultimately increasing the success rate of implants. Purpose: Available non-invasive techniques for the clinical measurement of implant stability and osseointegration include percussion, radiography, the Periotest®, the Dental Fine Tester®, and so on. There is, however, relatively little research on standardizing the quantitative measurement of implant stability and osseointegration, owing to the varied clinical procedures of individual operators. Therefore, to develop a non-invasive experimental method for measuring implant stability quantitatively, a resonance frequency analyzer that measures the natural frequency of a given structure was developed in this study. Material & method: To test the stability of the resonance frequency analyzer developed in this study, the following methods and materials were used. 1) In-vitro study: implants were placed both in epoxy resin, whose physical properties are similar to the stiffness of human bone, and in fresh cow rib bone specimens, and their resonance frequency values were measured and analyzed. To test the reliability of the data gathered with the resonance frequency analyzer, a comparative analysis against the Periotest® was conducted. 2) In-vivo study: implants were inserted into the tibiae of 10 New Zealand rabbits, and their resonance frequency values, with abutments connected, were measured immediately after insertion and then every 4 weeks for 16 weeks of healing. Results: Implants of the same length placed in Hot Melt showed repeatable resonance frequency values. As the abutment length increased, the resonance frequency value changed significantly (p<0.01). As the transducer thickness increased through 0.5, 1.0, and 2.0 mm, the resonance frequency value increased significantly (p<0.05). For implants placed in PL-2 and epoxy resin with different exposure degrees, the resonance frequency value increased as the exposure degree of the implant and the abutment length decreased. In the comparative experiment based on physical properties, the resonance frequency value increased significantly with transducer thickness (p<0.01), and it increased significantly as the stiffness of the substance holding the implant increased and the effective implant length decreased (p<0.05). In the experiment with cow rib bone specimens, increasing the abutment length produced a significant difference between the results of the resonance frequency analyzer and the Periotest®, whereas there was no statistically significant difference between the resonance frequency values and the Periotest® values with respect to the direction of measurement (p<0.05). The in-vivo experiment showed repeatable patterns of resonance frequency. As time elapsed, the resonance frequency value increased significantly, except at the 4th and 8th weeks (p<0.05). Conclusion: The resonance frequency analyzer was developed as an attempt to standardize the quantitative measurement of implant stability and osseointegration and to complement the reliability of data from other non-invasive measuring devices. Further research is needed to improve the efficiency of its clinical application, and further investigation is warranted on standardized quantitative analysis of implant stability.
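Resonance frequency analysis of this kind reduces to exciting the implant-transducer system and locating the peak in its frequency response. The sketch below shows only that signal-processing step, with synthetic data standing in for a measured vibration signal; the 7 kHz resonance, damping, and sampling rate are illustrative assumptions, not values from the study.

```python
import numpy as np

# Synthetic stand-in for a measured vibration signal: a damped oscillation
# at an assumed 7 kHz resonance plus noise (illustrative values only).
fs = 100_000                       # sampling rate in Hz (assumed)
t = np.arange(0, 0.05, 1 / fs)     # 50 ms record
signal = np.exp(-200 * t) * np.sin(2 * np.pi * 7_000 * t)
signal += 0.05 * np.random.default_rng(0).standard_normal(t.size)

# Estimate the resonance frequency as the peak of the magnitude spectrum.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
resonance = freqs[np.argmax(spectrum)]
print(f"Estimated resonance frequency: {resonance:.0f} Hz")
```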

Change Prediction of Future Forestland Area by Transition of Land Use Types in South Korea (로지스틱 회귀모형을 이용한 우리나라 산지면적의 공간변화 예측에 관한 연구)

  • KWAK, Doo-Ahn;PARK, So-Hee
    • Journal of the Korean Association of Geographic Information Studies / v.24 no.4 / pp.99-112 / 2021
  • This study was performed to predict the spatial change of future forestland area in South Korea at the regional level, in support of forest-related plans established by local governments. Land use was classified into three types: forestland, agricultural land, and urban and other lands. A logistic regression model was developed using the transitional interaction between each land use type and topographical factors, land use restriction factors, socioeconomic indices, and development infrastructure. In this model, the probability of change from a target land use type to the other types was estimated using a raster dataset (30 m × 30 m) for each variable. Using a priority-order map based on the probability of land use change, the total annual amount of land use change was allocated to cells in descending order of transition potential for the spatial analysis. The results showed that slope degree and the slope standard value set by local governments were the main factors affecting the probability of change from forestland to urban and other land. Forestland was also more likely to change to urban and other land where the slope was gentler, the slope criterion permitted for development was lower, and land price and population density were higher. Consequently, forestland area was predicted to decrease by 2027 due to conversion from forestland to urban and other land, especially in metropolitan and major cities, and to increase between 2028 and 2050 in most local provincial cities, except Seoul, Gyeonggi-do, and Jeju Island, due to regional decline accompanying population shrinkage. Local governments are therefore required to set adequate forestland use criteria for balanced development and for reasonable use and conservation, and to establish regional forest strategies and policies that take future land use change trends into account.
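The core of this method is a per-cell logistic regression that turns covariates (slope, land price, population density, and so on) into a transition probability, after which cells are converted in descending order of that probability until an annual change budget is exhausted. A minimal sketch follows; the covariates, labels, and annual budget are synthetic assumptions, not the study's raster data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_cells = 10_000

# Synthetic per-cell covariates (illustrative stand-ins for the paper's
# topographic and socioeconomic raster layers).
X = np.column_stack([
    rng.uniform(0, 45, n_cells),     # slope (degrees)
    rng.lognormal(10, 1, n_cells),   # land price
    rng.lognormal(5, 1, n_cells),    # population density
])
# Synthetic labels: 1 if a cell changed from forestland to urban/other;
# here gentler slopes change more often, echoing the paper's finding.
y = (rng.random(n_cells) < 1 / (1 + np.exp(0.15 * X[:, 0] - 1))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
p_change = model.predict_proba(X)[:, 1]

# Allocate an assumed annual change budget to the cells with the highest
# transition potential, as in the priority-order map step.
annual_budget = 200                  # cells converted per year (assumed)
converted = np.argsort(p_change)[::-1][:annual_budget]
print(f"{converted.size} cells flagged for forestland-to-urban conversion")
```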

Factor Analysis Affecting on Chartering Decision-making in the Dry Bulk Shipping Market (부정기 건화물선 시장에서 용선 의사결정에 영향을 미치는 요인 분석)

  • Lee, Choong-Ho;Park, Keun-Sik
    • Journal of Korea Port Economic Association / v.40 no.1 / pp.151-163 / 2024
  • This study sought to confirm the impact of analytical methods and behavioral-economics factors on decision-making when chartering in the dry bulk shipping market. The study of the chartering decision-making model began with the question of why shipping companies, facing the same market situation, do not make rational decisions based on analytical methods such as freight prediction and alternative-selection processes. To understand the chartering decision-making model, it is necessary to study the impact of behavioral-economics concepts such as heuristics, loss aversion, and herding behavior on chartering decisions. Through AHP analysis, the importance of the factors relied upon in chartering decision-making was measured. The top-level factors ranked in the following order: market factors, heuristics, internal factors, herding behavior, and loss aversion. Among the detailed factors, the spot freight index and empirical intuition were confirmed as the most important factors relied on when making decisions; empirical intuition proved more important than internal analysis, which is an analytical method. This study is meaningful in that it academically investigated and demonstrated the bounded rationality of humans, who cannot be fully rational and sometimes rely on experience or psychological tendencies, by applying it to the chartering decision-making model in the dry bulk shipping market. It also suggests that in this market, which is uncertain and carries a high risk of loss from decision-making, the experience and insight of decision makers have a very important impact on the performance and business profits of shipping companies' operating divisions. Even though chartering is a decision-making field that requires judgment and intuition based on heuristics, decision makers need to be aware of this decision-making model in order to avoid repeating the mistake of deciding contrary to the market situation. There is also a need to research internal analytical methods and procedures that can complement heuristics such as empirical intuition.
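AHP derives factor weights from a pairwise comparison matrix, typically as the principal eigenvector normalized to sum to one, with a consistency ratio as a sanity check. The sketch below uses an assumed 3×3 comparison matrix; the judgments are illustrative, not the paper's survey data.

```python
import numpy as np

# Assumed pairwise comparison matrix for three factors on Saaty's 1-9
# scale (illustrative judgments, not the study's data).
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# Priority weights: principal eigenvector, normalized to sum to 1.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = eigvecs[:, k].real
weights /= weights.sum()

# Consistency ratio (RI = 0.58 is Saaty's random index for n = 3).
n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)
cr = ci / 0.58
print("weights:", np.round(weights, 3), "CR:", round(cr, 3))
```

A CR below about 0.1 is conventionally taken to mean the judgments are acceptably consistent.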

Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.29-44 / 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as product reviews accumulate, it takes considerable time and effort for consumers to check the massive number of reviews individually, and carelessly written reviews actually inconvenience consumers. Many online vendors therefore provide mechanisms to identify the reviews that customers perceive as most helpful (Cao et al., 2011; Mudambi and Schuff, 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review and use this feedback to rank and re-order them. However, many reviews have only a few feedback votes or none at all, which makes it hard to identify their helpfulness; it also takes time for feedback to accumulate, so newly written reviews do not have enough of it. For example, only 20% of the reviews in the Amazon Review Dataset (McAuley and Leskovec, 2013) have more than five feedback votes (Yan et al., 2014). The purpose of this study is to analyze the factors affecting the helpfulness of online product reviews and to derive a forecasting model that selectively provides the reviews likely to be helpful to consumers. To this end, we extracted the various linguistic, psychological, and perceptual elements included in product reviews using text-mining techniques and identified the determinants among these elements that affect review helpfulness. In particular, considering that review characteristics and helpfulness determinants can differ between apparel products (experiential goods) and electronic products (search goods), the review characteristics were compared within each product group and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. To understand a review text, we first extracted linguistic and psychological characteristics, such as word count and the levels of emotional tone and analytical thinking embedded in the text, using the widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). We then explored the descriptive statistics of the review texts for each category and statistically compared their differences using t-tests. Lastly, we performed regression analysis using the data mining software RapidMiner to identify the determinant factors. Comparing the review characteristics of the two product groups showed that reviewers used more words and longer sentences when writing reviews for electronic products. In terms of content, electronic product reviews included more analytic words, carried more clout, related more to cognitive processes (CogProc), and included more words expressing negative emotions (NegEmo) than apparel product reviews. The apparel product reviews, on the other hand, included more personal, authentic, and positive emotions (PosEmo) and more perceptual processes (Percept) than the electronic product reviews. Next, we analyzed the determinants of review helpfulness for the two product groups. In both groups, reviews perceived as useful carried high product ratings from the reviewers and contained a larger total word count, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with many comparative expressions, a low expertise index, and concise content with fewer words per sentence were perceived as useful. Electronic product reviews that were analytical, with a high expertise index and many authentic expressions, cognitive processes, and positive emotions (PosEmo), were perceived as useful. These findings are expected to help consumers effectively identify useful product reviews in the future.
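The study's pipeline (extract text features, then regress helpfulness on them) can be approximated with open tools. The sketch below substitutes crude hand-rolled features for LIWC's proprietary categories and scikit-learn for RapidMiner; the reviews, helpfulness ratios, and word list are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy review data standing in for the Amazon.com corpus (assumed):
# (review text, helpfulness ratio).
reviews = [
    ("Fits well and the fabric feels great, highly recommend.", 0.9),
    ("Terrible battery, died after two days. Avoid.", 0.7),
    ("Sound quality is crisp and setup was easy.", 0.8),
    ("ok", 0.1),
]
NEG_WORDS = {"terrible", "avoid", "died", "bad"}  # crude NegEmo proxy (assumed)

def features(text: str) -> list[float]:
    words = text.lower().split()
    return [
        len(words),                                      # word count
        sum(w.strip(".,") in NEG_WORDS for w in words),  # negative-emotion words
        len(words) / max(text.count(".") + 1, 1),        # words per sentence
    ]

X = np.array([features(t) for t, _ in reviews])
y = np.array([h for _, h in reviews])                    # helpfulness ratio

model = LinearRegression().fit(X, y)
print("coefficients:", np.round(model.coef_, 3))
```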

Decomposition Characteristics of Fungicides (Benomyl) Using a Design of Experiment (DOE) in an E-beam Process and Acute Toxicity Assessment (전자빔 공정에서 실험계획법을 이용한 살균제 Benomyl의 제거특성 및 독성평가)

  • Yu, Seung-Ho;Cho, Il-Hyoung;Chang, Soon-Woong;Lee, Si-Jin;Chun, Suk-Young;Kim, Han-Lae
    • Journal of Korean Society of Environmental Engineers / v.30 no.9 / pp.955-960 / 2008
  • We investigated the decomposition and mineralization characteristics of benomyl in an E-beam process using a design of experiment (DOE) based on a general factorial design. The main factors, benomyl concentration ($X_1$) and E-beam irradiation dose ($X_2$), each with 5 levels, were set up to estimate the prediction model and the optimization conditions. First, benomyl was almost completely degraded in all treatment combinations except trials 17 and 18, and the difference in benomyl decomposition among the 3 blocks was not significant (p > 0.05, one-way ANOVA). However, benomyl mineralization was 46% (block 1), 36.7% (block 2), and 22% (block 3), a significant difference between blocks (p < 0.05). The linear regression equations of benomyl mineralization in each block were estimated as follows: block 1, $Y_1 = 0.024X_1 + 34.1$ ($R^2 = 0.929$); block 2, $Y_2 = 0.026X_2 + 23.1$ ($R^2 = 0.976$); block 3, $Y_3 = 0.034X_3 + 6.2$ ($R^2 = 0.98$). The normality of benomyl mineralization, checked with the Anderson-Darling test, was satisfied under all treatment conditions (p > 0.05). The prediction model and optimization point obtained from canonical analysis, used to determine the optimal operating conditions, were $Y = 39.96 - 9.36X_1 + 0.03X_2 - 10.67X_1^2 - 0.001X_2^2 + 0.011X_1X_2$ ($R^2$ = 96.3%, adjusted $R^2$ = 94.8%), with a predicted mineralization of 57.3% at 0.55 mg/L and 950 Gy. A Microtox test using V. fischeri showed that toxicity, expressed as inhibition (%), was reduced almost completely after E-beam irradiation, whereas inhibition for 0.5 mg/L, 1 mg/L, and 1.5 mg/L was 10.25%, 20.14%, and 26.2%, respectively, in the initial reactions without E-beam illumination.
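The quadratic prediction model above is a standard second-order response-surface fit: a linear regression on the first-order, squared, and interaction terms of the two factors. A minimal sketch on synthetic data follows; the design levels and response values are assumptions, not the study's measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

# Assumed factorial design: benomyl concentration (mg/L) x dose (Gy).
conc = np.repeat([0.5, 1.0, 1.5], 5)
dose = np.tile([200, 400, 600, 800, 1000], 3)
X = np.column_stack([conc, dose])
# Synthetic mineralization response, for illustration only.
y = 30 + 20 * conc - 8 * conc**2 + 0.03 * dose + rng.normal(0, 1, 15)

# Second-order response surface: x1, x2, x1^2, x1*x2, x2^2.
poly = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(poly.fit_transform(X), y)

# Predict the response at a candidate operating point.
point = poly.transform([[0.55, 950]])
print(f"Predicted mineralization at 0.55 mg/L, 950 Gy: "
      f"{model.predict(point)[0]:.1f}%")
```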

A study on the Degradation and By-products Formation of NDMA by the Photolysis with UV: Setup of Reaction Models and Assessment of Decomposition Characteristics by the Statistical Design of Experiment (DOE) based on the Box-Behnken Technique (UV 공정을 이용한 N-Nitrosodimethylamine (NDMA) 광분해 및 부산물 생성에 관한 연구: 박스-벤켄법 실험계획법을 이용한 통계학적 분해특성평가 및 반응모델 수립)

  • Chang, Soon-Woong;Lee, Si-Jin;Cho, Il-Hyoung
    • Journal of Korean Society of Environmental Engineers / v.32 no.1 / pp.33-46 / 2010
  • We investigated the decomposition characteristics and by-products of N-nitrosodimethylamine (NDMA) in a UV process using a design of experiment (DOE) based on the Box-Behnken design. The main factors (variables) were UV intensity ($X_1$, range: 1.5~4.5 mW/cm²), NDMA concentration ($X_2$, range: 100~300 uM), and pH ($X_3$, range: 3~9), each with 3 levels, and four responses were defined, $Y_1$ (% of NDMA removal), $Y_2$ (dimethylamine (DMA) formation, uM), $Y_3$ (dimethylformamide (DMF) formation, uM), and $Y_4$ (NO$_2$-N formation, uM), in order to estimate the prediction models and the optimization conditions. The prediction models and optimization points obtained from canonical analysis were: $Y_1$ [% of NDMA removal] = $117 + 21X_1 - 0.3X_2 - 17.2X_3 + 2.43X_1^2 + 0.001X_2^2 + 3.2X_3^2 - 0.08X_1X_2 - 1.6X_1X_3 - 0.05X_2X_3$ ($R^2$ = 96%, adjusted $R^2$ = 88%), with 99.3% removal at $X_1$ = 4.5 mW/cm², $X_2$ = 190 uM, $X_3$ = 3.2; $Y_2$ [DMA conc.] = $-101 + 18.5X_1 + 0.4X_2 + 21X_3 - 3.3X_1^2 - 0.01X_2^2 - 1.5X_3^2 - 0.01X_1X_2 + 0.07X_1X_3 - 0.01X_2X_3$ ($R^2$ = 99.4%, adjusted $R^2$ = 95.7%), with 35.2 uM at $X_1$ = 3 mW/cm², $X_2$ = 220 uM, $X_3$ = 6.3; $Y_3$ [DMF conc.] = $-6.2 + 0.2X_1 + 0.02X_2 + 2X_3 - 0.26X_1^2 - 0.01X_2^2 - 0.2X_3^2 - 0.004X_1X_2 + 0.1X_1X_3 - 0.02X_2X_3$ ($R^2$ = 98%, adjusted $R^2$ = 94.4%), with 3.7 uM at $X_1$ = 4.5 mW/cm², $X_2$ = 290 uM, $X_3$ = 6.2; and $Y_4$ [NO$_2$-N conc.] = $-25 + 12.2X_1 + 0.15X_2 + 7.8X_3 + 1.1X_1^2 + 0.001X_2^2 - 0.34X_3^2 + 0.01X_1X_2 + 0.08X_1X_3 - 3.4X_2X_3$ ($R^2$ = 98.5%, adjusted $R^2$ = 95.7%), with 74.5 uM at $X_1$ = 4.5 mW/cm², $X_2$ = 220 uM, $X_3$ = 3.1. This study demonstrated that response surface methodology and the Box-Behnken statistical experiment design can provide statistically reliable results for the decomposition and by-products of NDMA by UV photolysis and for the determination of optimum conditions. Predictions obtained from the response functions were in good agreement with the experimental results, indicating the reliability of the methodology used.
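For three factors, the Box-Behnken design places runs at the midpoints of the cube's edges (every ±1 combination of two factors with the third held at its center level) plus center points. It can be constructed directly, without a DOE library, as sketched below using the factor ranges quoted in the abstract; the choice of 3 center points is an assumption, not the paper's stated run count.

```python
import itertools
import numpy as np

# Coded Box-Behnken design for 3 factors: edge midpoints + center points.
runs = []
for i, j in itertools.combinations(range(3), 2):
    for a, b in itertools.product((-1, 1), repeat=2):
        row = [0, 0, 0]
        row[i], row[j] = a, b
        runs.append(row)
runs += [[0, 0, 0]] * 3                     # 3 center points (common choice)
coded = np.array(runs, dtype=float)         # 15 runs in total

# Scale coded levels to the factor ranges from the abstract:
# UV intensity 1.5-4.5 mW/cm^2, NDMA 100-300 uM, pH 3-9.
lows = np.array([1.5, 100.0, 3.0])
highs = np.array([4.5, 300.0, 9.0])
actual = lows + (coded + 1) / 2 * (highs - lows)
print(actual)
```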

The Classification System and Information Service for Establishing a National Collaborative R&D Strategy in Infectious Diseases: Focusing on the Classification Model for Overseas Coronavirus R&D Projects (국가 감염병 공동R&D전략 수립을 위한 분류체계 및 정보서비스에 대한 연구: 해외 코로나바이러스 R&D과제의 분류모델을 중심으로)

  • Lee, Doyeon;Lee, Jae-Seong;Jun, Seung-pyo;Kim, Keun-Hwan
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.127-147 / 2020
  • The world is suffering numerous human and economic losses due to the novel coronavirus infection (COVID-19). The Korean government established a strategy to overcome the national infectious disease crisis through research and development. It is difficult to find distinctive features and changes in a specific R&D field when using the existing technical classification or the standard science and technology classification. Recently, a few studies have established classification systems that provide information about infectious disease research investment areas in Korea through comparative analysis of government-funded research projects. However, these studies did not provide the information necessary for establishing cooperative research strategies among countries on infectious diseases, which is required as an execution plan to achieve the goals of national health security and fostering new growth industries. Studying information services based on a classification system and classification model is therefore essential for establishing a national collaborative R&D strategy. A seven-category classification system - Diagnosis_biomarker, Drug_discovery, Epidemiology, Evaluation_validation, Mechanism_signaling pathway, Prediction, and Vaccine_therapeutic antibody - was derived by reviewing South Korea's nationally funded research projects related to infectious diseases. A classification model was trained by combining Scopus data with a bidirectional RNN model, and the final model's classification performance was robust, with an accuracy of over 90%. For the empirical study, the infectious disease classification system was applied to coronavirus-related R&D projects of major countries, drawn from STAR Metrics (National Institutes of Health) and NSF (National Science Foundation) in the United States (US), CORDIS (Community Research & Development Information Service) in the European Union (EU), and KAKEN (Database of Grants-in-Aid for Scientific Research) in Japan. The R&D trends for infectious diseases (coronavirus) in major countries are mostly concentrated in the Prediction category, which covers predicting success in clinical trials at the new drug development stage or predicting toxicity that causes side effects. Intriguingly, for all of these nations, the portion of national investment in Vaccine_therapeutic antibody, the research area aimed at developing vaccines and treatments, was also very small (5.1%), which indirectly explains the slow development of vaccines and treatments. Comparative analysis of coronavirus-related research investment by country showed that the US and Japan invest relatively evenly across all infectious disease research areas, while Europe invests relatively heavily in specific areas such as Diagnosis_biomarker. Moreover, the classification system provided information on major coronavirus-related research organizations in these countries, thereby supporting the establishment of international collaborative R&D projects.
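The abstract names a bidirectional RNN trained on Scopus data but gives no architecture details, so the sketch below is only a plausible shape for such a seven-class text classifier (Keras, with an LSTM layer); the vocabulary size, sequence length, and layer widths are assumptions, not the authors' model.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 7       # the seven infectious-disease R&D categories
VOCAB_SIZE = 20_000   # assumed vocabulary size
MAX_LEN = 200         # assumed max abstract length in tokens

# Bidirectional RNN classifier over tokenized project abstracts.
model = tf.keras.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.build((None, MAX_LEN))
model.summary()
# model.fit(X_train, y_train, validation_split=0.1, epochs=5)  # with real data
```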

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.25-38 / 2019
  • Selecting high-quality information that meets the interests and needs of users from the overflowing mass of content is becoming ever more important as information keeps accumulating. Amid this flood, attempts are being made to better reflect the user's intention in search results, rather than treating an information request as a simple string. Large IT companies such as Google and Microsoft also focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance, in particular, is a field where text data analysis is expected to be useful and promising, because new information is constantly generated and the earlier the information arrives, the more valuable it is. Automatic knowledge extraction can be effective in such areas, where the information flow is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is hard to build corpora from different fields with the same algorithm, and it is difficult to extract high-quality triples. Second, producing labeled text data manually becomes harder as the extent and scope of knowledge increase and patterns are constantly updated. Third, performance evaluation is difficult because of the characteristics of unsupervised learning. Finally, defining the problem of automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome these limits and improve the semantic performance of searching stock-related information, this study attempts to extract knowledge entities using a neural tensor network and to evaluate the results. Unlike previous work, the purpose of this study is to extract knowledge entities related to individual stock items. Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and enhance the model's effectiveness. From these processes, this study offers three contributions. First, it presents a practical and simple automatic knowledge extraction method. Second, it demonstrates the possibility of performance evaluation through a simple problem definition. Finally, it increases the expressiveness of the knowledge by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. For the empirical study confirming the usefulness of the presented model, experts' reports on 30 individual stocks, the top 30 items by publication frequency from May 30, 2017 to May 21, 2018, are used. The total number of reports is 5,600; 3,074 reports (about 55% of the total) are designated as the training set and the remaining 45% as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using a neural tensor network, one score function per stock is trained.
Thus, when a new entity from the testing set appears, its score can be calculated by feeding it into every score function, and the stock whose function yields the highest score is predicted as the item related to that entity. To evaluate the presented model, we confirm its predictive power and check whether the score functions are well constructed by calculating the hit ratio over all reports in the testing set. In the empirical study, the presented model achieves a 69.3% hit ratio on the testing set of 2,526 reports; this hit ratio is meaningfully high despite several constraints on the research. Looking at the model's prediction performance by stock, only 3 stocks - LG ELECTRONICS, KiaMtr, and Mando - perform far below average, which may be due to interference from other similar items and the generation of new knowledge. In this paper, we propose a methodology to find the key entities, or combinations of them, that are needed to search for information matching the user's investment intention. Graph data is generated using only the named entity recognition tool and applied to the neural tensor network, without a learning corpus or word vectors for the field. The empirical test confirms the effectiveness of the presented model as described above. However, some limits remain to be addressed; most notably, the markedly poor performance on a few stocks shows the need for further research. Finally, through the empirical study, we confirmed that the learning method presented here can be used to semantically match new text information with the related stocks.
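A neural tensor network scores a relation with a bilinear tensor term plus a standard linear term, score = uᵀ tanh(e₁ᵀ W[1:k] e₂ + V[e₁; e₂] + b), in the style of Socher et al. The PyTorch sketch below adapts this to one score function per stock by pairing the entity vector with a learned stock embedding; the dimensions, slice count, and that adaptation are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class NTNScorer(nn.Module):
    """Neural tensor network score function for one stock (a sketch;
    dimensions and the learned stock embedding are assumptions)."""

    def __init__(self, dim: int = 100, k: int = 4):
        super().__init__()
        self.W = nn.Parameter(torch.randn(k, dim, dim) * 0.01)  # bilinear tensor
        self.V = nn.Linear(2 * dim, k)                          # linear term
        self.u = nn.Linear(k, 1, bias=False)                    # output weights
        self.stock = nn.Parameter(torch.randn(dim) * 0.01)      # stock embedding

    def forward(self, entity: torch.Tensor) -> torch.Tensor:
        # Bilinear term: entity^T W[i] stock for each of the k slices.
        bilinear = torch.stack([entity @ self.W[i] @ self.stock
                                for i in range(self.W.shape[0])])
        linear = self.V(torch.cat([entity, self.stock]))
        return self.u(torch.tanh(bilinear + linear))

# One-hot entity vector (top-100 entities per stock, as in the abstract).
scorer = NTNScorer(dim=100)
entity = torch.zeros(100)
entity[7] = 1.0
print(scorer(entity).item())  # higher score = stronger stock-entity relation
```

At prediction time, a new entity would be fed through every stock's scorer and assigned to the stock with the highest score, mirroring the procedure described above.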

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
  • Collaborative filtering, which is often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, the traditional collaborative filtering technique has difficulty calculating similarity for new customers or products, because similarities are computed from direct connections and common features among customers. For this reason, hybrid techniques have been designed that also use content-based filtering. In parallel, efforts have been made to solve these problems by applying the structural characteristics of social networks. This approach calculates similarities indirectly, through similar customers positioned between two customers on a network: a customer network is created from purchasing data, and the similarity between two customers is computed from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer will accept a recommendation, and the centrality metrics of networks can be utilized for this calculation. Different centrality metrics matter here because they may affect recommendation performance differently, and in this study the effect of each centrality metric on recommendation performance may further vary with the recommender algorithm. In addition, recommendation techniques based on network analysis can be expected to increase recommendation performance not only for new customers or products but for entire customer and product sets. By treating a customer's purchase of an item as a link between the customer and the item on the network, predicting user acceptance of a recommendation becomes predicting whether a new link will be created between them. As classification models fit this binary link-formation problem, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for the research. The data for performance evaluation were order records collected from an online shopping mall over four years and two months. The first three years and eight months of records were organized into the social network, and the next four months' records were used to train and evaluate the recommender models. Experiments applying the centrality metrics to each model show that the recommendation acceptance rates of the centrality metrics differ across algorithms at a meaningful level. This work analyzed the four most commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except the support vector machine. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across the models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct performance differences across models: it ranks first in logistic regression, artificial neural network, and decision tree with numerically high performance, but records very low rankings in the support vector machine and k-nearest neighbors models, with low performance levels. As the experimental results reveal, in a classification model, network centrality metrics over the subnetwork connecting two nodes can effectively predict the connectivity between those nodes in a social network. Furthermore, each metric performs differently depending on the classification model type, implying that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model, and introducing closeness centrality could be considered to obtain higher performance in certain models.
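The four centrality metrics compared above are available directly in NetworkX. A minimal sketch on a toy customer-item purchase network follows; the nodes and edges are invented for illustration, not drawn from the study's shopping-mall data.

```python
import networkx as nx

# Toy purchase network: customers c1-c4, items i1-i3 (assumed edges).
G = nx.Graph([("c1", "i1"), ("c1", "i2"), ("c2", "i1"),
              ("c3", "i2"), ("c3", "i3"), ("c4", "i3")])

# The four centrality metrics compared in the study.
metrics = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}

# Such per-node scores can serve as features for a link-prediction
# classifier deciding whether a new customer-item edge will form.
for name, values in metrics.items():
    print(name, {n: round(v, 2) for n, v in values.items()})
```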