• Title/Summary/Keyword: common neighbors


Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung; Im, Il
    • Journal of Intelligence and Information Systems, v.20 no.2, pp.137-148, 2014
  • Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the main reason to shop online is to reduce the effort of searching for information and making purchases, and recommender systems are a key technology for serving these needs. Many past studies of recommender systems have focused on developing and improving recommendation algorithms, among which collaborative filtering (CF) is known to be the most successful. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users; for new users who have no evaluations or preference information, CF therefore cannot produce recommendations (cold-start problem). As the numbers of products and customers increase, the user-item data grow rapidly while most of the data cells remain empty, and such sparse data make computing recommendations extremely hard (sparsity problem). Since CF assumes that there are groups of users sharing common preferences or tastes, it becomes inaccurate when there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from them. Therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two subsets, gray sheep and the others, based on the users' degree centrality.
Different similarity measures and recommendation methods are then applied to these two subsets. In more detail, the algorithm is as follows. Step 1: Convert the initial data, a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate the nodes whose degree centrality is lower than a preset threshold; the threshold is determined by simulation so that the accuracy of CF on the remaining dataset is maximized. Step 3: Apply an ordinary CF algorithm to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them, so a 'popular item' method is used instead. The F measures of the two datasets are weighted by their numbers of nodes and summed to give the final performance metric. To test the performance improvement of this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data from the GroupLens research team: 100,000 ratings by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' method and cosine similarity. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used.
    Past studies aiming to improve CF performance typically used information other than users' evaluations, such as demographic data, and some studies applied SNA techniques as a new similarity metric. This study is novel in that it uses SNA to separate the dataset, and it shows that the performance of CF can be improved without any additional information when SNA techniques are used as proposed. The study has several theoretical and practical implications. It empirically shows that the characteristics of a dataset can affect the performance of CF recommender systems, which helps researchers understand the factors affecting CF performance, and it opens the door to future studies that apply SNA to CF to analyze dataset characteristics. In practice, it provides guidelines for improving the performance of CF recommender systems with a simple modification.
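Steps 1 to 4 and the weighted F measure can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the rule that links two users in the one-mode network (at least `min_common` co-rated items), thresholding on raw degree (equivalent to thresholding normalized degree centrality up to a constant factor), the similarity-weighted scoring of neighbors' items, and all function names and toy ratings are hypothetical. Cosine similarity, Best-N neighbors, and the popular-item fallback follow the abstract; in the study the threshold is tuned by simulation rather than fixed.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data in the form {user: {item: rating}}.
example_ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "d": 3},
    "u3": {"a": 5, "c": 2, "d": 4},
    "u4": {"e": 5},  # shares no item with anyone else
}


def project_to_user_graph(ratings, min_common=1):
    """Step 1: two-mode (user-item) -> one-mode (user-user) network.
    Assumption: users are linked if they co-rated at least `min_common` items."""
    users = list(ratings)
    adj = {u: set() for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if len(ratings[u].keys() & ratings[v].keys()) >= min_common:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def split_gray_sheep(adj, threshold):
    """Step 2: users whose degree is below `threshold` are treated as gray sheep."""
    gray = {u for u, nbrs in adj.items() if len(nbrs) < threshold}
    return gray, set(adj) - gray


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    return dot / (sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values())))


def recommend_cf(ratings, user, pool, n_neighbors=2, top_k=2):
    """Step 3: user-based CF with cosine similarity and the best N neighbors in `pool`."""
    neighbors = sorted(
        ((cosine(ratings[user], ratings[v]), v) for v in pool if v != user), reverse=True
    )[:n_neighbors]
    scores = Counter()
    for sim, v in neighbors:
        for item, r in ratings[v].items():
            if item not in ratings[user]:
                scores[item] += sim * r  # similarity-weighted rating
    return [item for item, _ in scores.most_common(top_k)]


def recommend_popular(ratings, user, top_k=2):
    """Step 4: gray sheep get the most frequently rated items they have not rated."""
    popularity = Counter(item for items in ratings.values() for item in items)
    return [item for item, _ in popularity.most_common() if item not in ratings[user]][:top_k]


def weighted_f(f_regular, n_regular, f_gray, n_gray):
    """Weighted average of the two subsets' F measures, weights = numbers of users."""
    return (f_regular * n_regular + f_gray * n_gray) / (n_regular + n_gray)


adj = project_to_user_graph(example_ratings)
gray, regular = split_gray_sheep(adj, threshold=1)
for u in sorted(regular):
    print(u, "CF:", recommend_cf(example_ratings, u, regular))
for u in sorted(gray):
    print(u, "popular:", recommend_popular(example_ratings, u))
```

In the toy data, `u4` shares no item with anyone, so it has degree 0, lands in the gray-sheep subset, and receives popular items, while `u1` to `u3` are served by ordinary CF restricted to the non-gray users.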

    Examination of Urban Gardening as an Everydayness in Urban Residential Area, Haebangchon (도심주거지에 나타나는 일상문화로서의 도시정원가꾸기에 대한 고찰 - 용산구 용산동2가 해방촌을 중심으로 -)

    • Sim, Joo-Young; Zoh, Kyung-Jin
      • Journal of the Korean Institute of Landscape Architecture, v.43 no.2, pp.1-12, 2015
    • This study explores urban gardening and garden culture in residential areas as an aspect of everyday life that was overlooked during modern urbanization, and investigates the meaning and value of urban gardening from the perspective of urban formation and growth in Haebangchon, a spontaneously formed urban residential area. The results show that urban gardening, as an expression of contemporary culture, offers a new clue to improving the urban physical environment and to changing residents' lives and community networks. Haebangchon is one of the few remaining spontaneous settlements in Seoul and was created as a temporary unlicensed shantytown in the 1940s. It became a representative neighborhood for ordinary people in downtown Seoul through revitalization in the 1960s and local reform through self-sustaining redevelopment projects from the 1970s through the 1990s. The area still retains traces of the 1950s-60s, the 1970s-80s, and the present, and has a high proportion of long-term residents. Within this context, the site was divided into three sections, and the research was conducted through observation and field investigation to determine the characteristics of urban gardening as an everyday practice. Urban gardening and garden culture in Haebangchon can be described as a unique local culture that has accumulated in the crevices of the area's physical conditions and way of life. These places express residents' desire to seek out nature and to garden in a densely populated area, and they serve as grounds for actively participating in care and cultivation. As an accumulation of residents' everyday lives, they form a unique, indigenous local landscape. Urban gardens in detached houses retain the original function of the dwelling and its garden, or 'madang', and take on public-space characteristics through shared use while remaining semi-private.
In addition, urban gardens in the narrow streets, including small kitchen gardens and flowerpots, provide pleasure as pieces of nature blooming in narrow alleys and function as public gardens where residents interact with neighbors by sharing produce. This paper proposes a way of redefining the relationship between private and public areas in the outdoor spaces that are cut off from one another in the modern city.

    A Study on Analysis and Problems of Deliberation to Change the Present Condition Around Cultural Properties - Focusing on the Cultural Properties of Standard Establishment for Change the Present Condition - (문화재 주변 현상변경허가 신청안 분석 및 문제점에 관한 연구 - 현상변경허용기준 수립 문화재를 중심으로 -)

    • Kim, Dong-Chan; Lim, Jin-Kang
      • Journal of the Korean Institute of Traditional Landscape Architecture, v.30 no.1, pp.22-30, 2012
    • This study analyzes the deliberation results for applications to change the present condition around Gyeonggi-do designated cultural properties for which acceptable standards for such changes have been established, together with their disposition. Its purpose is to minimize complaints from residents living near cultural properties and to maintain consistency in decisions on changes to the present condition. The results are as follows. First, of the 82 applications, those concerning tangible cultural properties and monuments accounted for the majority (56% and 50%). Individual applicants accounted for 84% of applications, construction for 94% of application purposes, and houses were the most common facility type at 40%. Approvals and reconsiderations were most numerous for monuments, whereas rejections were most frequent for tangible cultural properties. By application item, building height was the most frequent. Second, approval was the most frequent deliberation result at 48.7%, followed by rejection at 36.5% and reconsideration at 14.6%. The main reasons for approval were a minimal impact on the landscape surrounding the cultural property and the presence of many existing buildings around the application site. The main reason for rejection was that the proposed building was large in scale compared with the area surrounding the cultural property. The most frequent reason for reconsideration was the need for a site survey. Third, applications decided at their first deliberation accounted for 47.5% of the 82 cases, about half. Approvals were common in both cases, and the more times an item was placed on the agenda, the more often it was rejected or sent for reconsideration. Fourth, by application area, applications in areas subject to cultural property committee review and in the conservation area were the most common, at 33.7% and 20.2% respectively. Applications spanning several areas were approved more often than they were rejected or sent for reconsideration.
Applications in the conservation area were mostly approved. Rejections were most prevalent for applications in areas subject to cultural property committee review, while reconsiderations were similar in the two areas. Fifth, the scale of construction affected the results: as building size increased, decisions shifted toward rejection, and increases in building area when plans were modified strongly influenced rejection decisions.

    A Study on Improvement of Collaborative Filtering Based on Implicit User Feedback Using RFM Multidimensional Analysis (RFM 다차원 분석 기법을 활용한 암시적 사용자 피드백 기반 협업 필터링 개선 연구)

    • Lee, Jae-Seong; Kim, Jaeyoung; Kang, Byeongwook
      • Journal of Intelligence and Information Systems, v.25 no.1, pp.139-161, 2019
    • Shopping through e-commerce has become a common lifestyle today, and knowing where and how to make sensible purchases of good-quality products has become important for customers. This change in purchasing psychology tends to make purchasing decisions difficult amid vast amounts of information. Recommendation systems help here by analyzing customers' purchasing behavior, reducing the cost of information retrieval and improving satisfaction. Amazon and Netflix are well-known examples of sales marketing driven by recommendation systems. In Amazon's case, 60% of recommendations are made on the basis of purchased goods, and a 35% increase in sales was achieved. Netflix, for its part, found that 75% of the movies watched were chosen through its recommendation service. This personalization technique is considered one of the key strategies for one-to-one marketing, useful in online markets where there are no salespeople. The main techniques used in recommendation systems today are collaborative filtering and content-based filtering; hybrid techniques that combine them, as well as association rules, are also used in various fields. Of these, collaborative filtering is the most popular today. Collaborative filtering recommends products preferred by neighbors with similar preferences or purchasing behavior, based on the assumption that users who showed similar tendencies in purchasing or evaluating products in the past will show similar tendencies toward other products. However, most existing systems make recommendations only within a single product category, such as books or movies.
This is because such systems estimate a customer's satisfaction with a new, never-purchased item from the customer's purchase ratings of similar items in the transaction data. In addition, the reliability of the purchase ratings used by recommendation systems is causing serious problems. In particular, a 'compensated review' is a customer purchase rating intentionally manipulated through company intervention. Amazon has cracked down on such compensated reviews since 2016, working to reduce false information and increase credibility. A survey showed that products with compensated reviews had higher average ratings than those without, and that compensated reviews were about 12 times less likely to give the lowest rating and about 4 times less likely to contain critical opinions. Customer purchase ratings are thus full of noise. This problem directly affects the performance of recommendation systems, which in most e-commerce transactions aim to maximize profit by attracting highly satisfied customers. To address these problems, this study proposes new indicators, derived with the RFM multidimensional analysis technique, that can objectively replace customers' purchase ratings. RFM (recency, frequency, monetary) analysis is the most widely used analytical method in customer relationship management (CRM) marketing, and is a data analysis method for selecting customers who are likely to purchase goods. When the proposed index was validated against actual purchase history data, its accuracy was about 55%. Because this result was obtained when recommending a total of 4,386 different products that had never been purchased before, it indicates relatively high accuracy and practical value.
This study also suggests the possibility of a general recommendation system applicable to various kinds of offline product data. If additional data are acquired in the future, the accuracy of the proposed recommendation system can be improved.
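The abstract does not say how the RFM index is constructed, so the following is only a minimal sketch of one plausible construction, not the authors' method: recency, frequency, and monetary value are computed per (customer, item) pair, each is mapped to a 1-5 score by rank, and the three scores are averaged into an implicit rating that could stand in for an explicit purchase rating in collaborative filtering. The per-pair granularity, the rank-based scoring, the equal weighting, and all names and data are assumptions.

```python
from datetime import date


def score_by_rank(values, higher_is_better, bins=5):
    """Map each key to a score in 1..bins by how many other keys it beats.
    Tied values get the same score; O(n^2), which is fine for a sketch."""
    keys = list(values)
    n = len(keys)
    if n == 1:
        return {keys[0]: bins}
    scores = {}
    for k in keys:
        if higher_is_better:
            beaten = sum(1 for o in keys if values[o] < values[k])
        else:
            beaten = sum(1 for o in keys if values[o] > values[k])
        scores[k] = 1 + round(beaten * (bins - 1) / (n - 1))
    return scores


def rfm_implicit_ratings(transactions, today, bins=5):
    """transactions: iterable of (customer, item, purchase_date, amount).
    Returns {(customer, item): implicit rating in [1, bins]} (hypothetical index)."""
    stats = {}
    for customer, item, day, amount in transactions:
        last, freq, money = stats.get((customer, item), (day, 0, 0.0))
        stats[(customer, item)] = (max(last, day), freq + 1, money + amount)
    recency = {k: (today - last).days for k, (last, _, _) in stats.items()}
    frequency = {k: f for k, (_, f, _) in stats.items()}
    monetary = {k: m for k, (_, _, m) in stats.items()}
    r = score_by_rank(recency, higher_is_better=False, bins=bins)  # fewer days is better
    f = score_by_rank(frequency, higher_is_better=True, bins=bins)
    m = score_by_rank(monetary, higher_is_better=True, bins=bins)
    # Equal weighting of R, F and M is an assumption.
    return {k: (r[k] + f[k] + m[k]) / 3 for k in stats}


# Hypothetical purchase log.
example_transactions = [
    ("c1", "x", date(2019, 1, 1), 10.0),
    ("c1", "x", date(2019, 3, 1), 20.0),
    ("c1", "y", date(2018, 6, 1), 5.0),
    ("c2", "x", date(2019, 2, 15), 100.0),
]
example_today = date(2019, 3, 31)
print(rfm_implicit_ratings(example_transactions, example_today))
```

The resulting dictionary has the same shape as an explicit rating matrix, so it could be fed to a standard user-based or item-based CF routine without relying on possibly manipulated review scores.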

    The Practice of 'Liberated-ness': An Education Model for Protestant Spiritual Practice (개신교 '자유케 됨'의 영성에 기초한 기독교 영성교육 모형: '자유케 됨'의 실천)

    • Hwang, In-Hae
      • Journal of Christian Education in Korea, v.68, pp.375-415, 2021
    • Although interest in Christian spirituality education has increased recently, the practice of spirituality education in the Korean Church has been fragmented in content and method, lacking a clear educational purpose grounded in the Protestant tradition. This calls for creative research, using a rigorous academic methodology, to identify the content and methods best suited to realizing the educational purpose of the Protestant tradition. This study proposes such a creative model for spirituality education, with an educational purpose based on the core ethos of Protestant spirituality and integrating Christianity's long tradition of spiritual practices. First, I survey the teachings on 'the life of faith' of the main leaders of the Protestant church, including Martin Luther, John Calvin, and John Wesley. Through this survey, I show that 'liberated-ness' is the common purpose of these leaders and that the core practices for that purpose are 'the means of grace,' which has a different meaning from that in the Roman Catholic tradition. I construct the meaning of 'liberated-ness' dynamically: it begins with the 'liberating will' of God and is followed by the 'self-giving will' of the believer as the response to the 'grace' of the 'liberating will.' The contact point of these two 'wills' is what I call 'the living membrane of faith.' As a creative synthesis of the above, I propose a model of 'the practice of liberated-ness' for education in spiritual practice. Its purpose is for learners to become people who continuously experience ever-increasing 'liberated-ness' through ongoing personal 'encounters' with God, and who become ever more faithful in carrying out practices for the 'liberated-ness' of their neighbors.
The relationship between teacher and learner is that of the personal 'encounter' put forth by Sherrill, and it also incorporates elements of 'co-authorship' as conceptualized by Kim. I transform and rename major practices of spiritual discipline according to a principle of 'liberated-ness' based on the Protestant tradition, and these form the main content of the model. They include 'lectio divina of encounter,' 'prayer facing the Lord,' 'service in liberation,' 'reflection of liberated-ness,' and 'mutual spiritual direction.' The teaching and learning process draws on Dykstra's methods of coaching and mentoring. The key environment is a 'sacramental community' as defined by Moore. Evaluation can be performed only by the learners themselves. The significance of this model is that it creatively inherits and carries forward the tradition of Christian spiritual discipline from the early church onward by transforming it through a Protestant spirituality of 'liberated-ness.'

    One-probe P300 based concealed information test with machine learning (기계학습을 이용한 단일 관련자극 P300기반 숨김정보검사)

    • Hyuk Kim; Hyun-Taek Kim
      • Korean Journal of Cognitive Science, v.35 no.1, pp.49-95, 2024
    • Polygraph examination, statement validity analysis, and the P300-based concealed information test are the three major examination tools used to determine a person's truthfulness and credibility in criminal procedure. Although the polygraph examination is the most common in criminal procedure, it has little admissibility as evidence because of its weak scientific basis. In the 1990s, to address this weakness of the polygraph, Farwell and Donchin proposed the P300-based concealed information test. The P300-based concealed information test has two strengths. First, it is easy to conduct together with a polygraph examination. Second, it has a substantial scientific basis. Nevertheless, the P300-based concealed information test is rarely used because of the number of probe stimuli it requires. A probe stimulus contains concealed information relevant to the crime or other situation under investigation. The traditional P300-based concealed information test protocol requires three or more probe stimuli, but these are hard to obtain because most crime-relevant information becomes public during an investigation. In addition, the test uses the oddball paradigm, which creates an imbalance between the numbers of probe and irrelevant stimuli. This imbalance may cause systematic underestimation of the P300 amplitude of irrelevant stimuli. To overcome these two limitations, a one-probe P300-based concealed information test protocol was explored with various machine learning algorithms. The study recommends the following parameters for the modified one-probe protocol.
For female and male face stimuli, the recommended settings are a stimulus duration of 400 ms, 60 stimulus repetitions, the peak-to-peak method for analyzing P300 amplitude, a guilty-condition cut-off of 90%, and an innocent-condition cut-off of 30%. For two-syllable word stimuli, the recommended settings are a stimulus duration of 300 ms, 60 repetitions, the peak-to-peak method, a guilty-condition cut-off of 90%, and an innocent-condition cut-off of 30%. The study also confirmed that logistic regression (LR), linear discriminant analysis (LDA), and K-nearest neighbors (KNN) algorithms are suitable methods for analyzing P300 amplitude. The one-probe P300-based concealed information test protocol with machine learning can help increase the use of P300-based concealed information testing and, together with the polygraph examination, support the determination of a person's truthfulness and credibility in criminal procedure.
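The peak-to-peak analysis and the 90% / 30% cut-offs can be illustrated with the sketch below. It is a minimal sketch, not the study's protocol: it assumes a bootstrapped amplitude-difference procedure, in which probe and irrelevant trials are resampled and the fraction of resamples where the probe's peak-to-peak amplitude exceeds the irrelevant one is compared with the cut-offs; reading the abstract's cut-offs this way is an assumption. The sample-index windows, the synthetic waveforms, and all function names are hypothetical, and the machine learning classifiers (LR, LDA, KNN) are not shown.

```python
import random


def mean_waveform(trials):
    """Average equal-length single-trial epochs sample by sample."""
    return [sum(samples) / len(trials) for samples in zip(*trials)]


def peak_to_peak(wave, p300_window, trough_span):
    """Largest positive value in `p300_window` (start, end sample indices) minus the
    most negative value in the `trough_span` samples starting at that peak."""
    start, end = p300_window
    peak = max(range(start, end), key=lambda i: wave[i])
    trough = min(wave[peak:peak + trough_span])
    return wave[peak] - trough


def bootstrap_probe_fraction(probe, irrelevant, p300_window, trough_span,
                             iterations=1000, seed=0):
    """Fraction of bootstrap resamples in which the averaged probe response has a
    larger peak-to-peak amplitude than the averaged irrelevant response."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(iterations):
        p = mean_waveform(rng.choices(probe, k=len(probe)))
        i = mean_waveform(rng.choices(irrelevant, k=len(irrelevant)))
        if peak_to_peak(p, p300_window, trough_span) > peak_to_peak(i, p300_window, trough_span):
            wins += 1
    return wins / iterations


def classify(fraction, guilty_cut=0.90, innocent_cut=0.30):
    """Apply the 90% / 30% cut-offs (interpreted as bootstrap proportions: an assumption)."""
    if fraction >= guilty_cut:
        return "guilty"
    if fraction <= innocent_cut:
        return "innocent"
    return "indeterminate"


# Synthetic, noise-free epochs: the single probe evokes a large positive bump.
base_probe = [0, 0, 0, 5, 8, 5, 0, -3, -1, 0]
base_irrelevant = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
probe_trials = [[s * x for x in base_probe] for s in (0.8, 1.0, 1.2)]
irrelevant_trials = [[s * x for x in base_irrelevant] for s in (0.5, 1.0, 1.5)]
fraction = bootstrap_probe_fraction(probe_trials, irrelevant_trials, (2, 7), 5)
print(fraction, classify(fraction))
```

Results between the two cut-offs are reported as indeterminate, which is one way the gap between a 90% guilty threshold and a 30% innocent threshold can be used.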

    A Grounded theory Approach on the Experience of Sexual Abuse Victims (성폭력 피해여성의 경험에 관한 연구)

    • Kim, Kyung-Hee; Nam, Sun-Young; Chee, Soon-Ju; Kwon, Hye-Jin; Chung, Yeon-Kang
      • Journal of the Korean Society of School Health, v.9 no.1, pp.77-98, 1996
    • This study was designed to develop a theoretical framework for the experience of sexual abuse from the perspective of grounded theory, in order to provide more practical and efficient nursing interventions for female victims. The subcategories identified were "sexual abuse", "threatening", "absent-mindedness", "embarrassment", "horripilation", "dizziness", "wondrousness", "filthiness", "sexual curiosity", "violence level", "victim's age", "neighbors' response", "victim's personality", "common experience", "sexual abuse information", "family relations", "level of familiarity", "hiding", "suppression", "self-torture", "self-protection", "avoidance", "asking aid", "withdrawal", "hatred", "confusion", "dodging", "remnant", and "pursuing". These 29 subcategories were further integrated into 16 categories such as "victimizedness", "being astounded", "filthiness", "degree", "developmental stage", "response pattern", "personality", "rarity", "information availability", "family support", "cover-up", "escaping", "informing", "negative internalization", and "positive pursuit of change". The core categories linked to all the other categories turned out to be "being taken aback" and "filthiness", incorporating the relevant subcategories. A total of 23 theoretical hypotheses emerged in the process of analyzing the data:
      1. The greater the sexual curiosity, the weaker the senses of being taken aback and filthiness.
      2. The weaker the sexual curiosity, the stronger the senses of being taken aback and filthiness.
      3. The higher the level of violence, the stronger the senses of being taken aback and filthiness.
      4. The lower the level of violence, the weaker the senses of being taken aback and filthiness.
      5. The younger the victim, the stronger the senses of being taken aback and filthiness.
      6. The older the victim, the weaker the senses of being taken aback and filthiness.
      7. 'Escaping' will occur regardless of the circumstances.
      8. The weaker the senses of being taken aback and filthiness, the more likely 'informing' and 'escaping' are to occur.
      9. The stronger the senses of being taken aback and filthiness, the more likely 'informing' and 'escaping' are to occur.
      10. The more protective the response from those around, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
      11. The more repelling the response from those around, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
      12. The more open-minded the subject's personality, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
      13. The more closed the subject's personality, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
      14. The more frequent the experience of sexual abuse, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
      15. The less frequent the experience of sexual abuse, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
      16. The more available the information concerning sexual abuse, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
      17. The less available the information concerning sexual abuse, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
      18. The more cohesive the subject's family, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
      19. The less cohesive the subject's family, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
      20. The less familiar the subject is with the abuser, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
      21. The more familiar the subject is with the abuser, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
      22. The more likely the response to 'being taken aback' and 'filthiness' is 'informing' and 'escaping', the more positive the changes the subject will pursue.
      23. The more likely the response to 'being taken aback' and 'filthiness' is 'covering-up' and 'escaping', the more negative the changes the subject will pursue.
    The following four hypotheses were confirmed in the process of data analysis:
      1) When the level of violence is high but 'being taken aback' and 'filthiness' are weak because of strong sexual curiosity, and information concerning sexual abuse is not readily available and the frequency is low, negative internalization marked by 'covering-up' and 'escaping' will take place even though the subject is open-minded, the family is cohesive, and the abuser is unfamiliar.
      2) When the level of violence is low and 'being taken aback' and 'filthiness' are weak, combined with weak sexual curiosity, and information concerning sexual abuse is readily available, the response from those around is protective, and the frequency is high, the subject will pursue positive changes in response to 'being taken aback' and 'filthiness', further aided by the fact that the subject is open-minded, the family is cohesive, and the abuser is unfamiliar.
      3) When the level of violence is high and 'being taken aback' and 'filthiness' are strong because of weak sexual curiosity, and information concerning sexual abuse is readily available, the response from those around is protective, and the frequency is low, the subject will pursue positive changes marked by 'informing' and 'escaping' even though family cohesion is weak and the abuser is familiar.
      4) When the level of violence is high and 'being taken aback' and 'filthiness' are strong because of weak sexual curiosity, and information concerning sexual abuse is not readily available, the response from those around is repelling, and the frequency is low, negative internalization such as 'covering-up' and 'escaping' will take place, further aggravated by the fact that the subject's personality is closed, family cohesion is weak, and the abuser is familiar.
    On the basis of these findings, it is recommended that nursing interventions focus on fostering a milieu conducive to victims pursuing positive changes, along with adequate aid from protection facilities and from the people around them.


    A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

    • Lee, Dongwon
      • Journal of Intelligence and Information Systems, v.27 no.1, pp.23-46, 2021
    • Collaborative filtering, which is often used in personalization recommendations, is recognized as a very useful technique to find similar customers and recommend products to them based on their purchase history. However, the traditional collaborative filtering technique has raised the question of having difficulty calculating the similarity for new customers or products due to the method of calculating similaritiesbased on direct connections and common features among customers. For this reason, a hybrid technique was designed to use content-based filtering techniques together. On the one hand, efforts have been made to solve these problems by applying the structural characteristics of social networks. This applies a method of indirectly calculating similarities through their similar customers placed between them. This means creating a customer's network based on purchasing data and calculating the similarity between the two based on the features of the network that indirectly connects the two customers within this network. Such similarity can be used as a measure to predict whether the target customer accepts recommendations. The centrality metrics of networks can be utilized for the calculation of these similarities. Different centrality metrics have important implications in that they may have different effects on recommended performance. In this study, furthermore, the effect of these centrality metrics on the performance of recommendation may vary depending on recommender algorithms. In addition, recommendation techniques using network analysis can be expected to contribute to increasing recommendation performance even if they apply not only to new customers or products but also to entire customers or products. By considering a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of user acceptance of recommendation is solved as a prediction of whether a new link will be created between them. 
As the classification models fit the purpose of solving the binary problem of whether the link is engaged or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) are selected in the research. The data for performance evaluation used order data collected from an online shopping mall over four years and two months. Among them, the previous three years and eight months constitute social networks composed of and the experiment was conducted by organizing the data collected into the social network. The next four months' records were used to train and evaluate recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics are different for each algorithm at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranking moderate across overall models while betweenness centrality always ranking higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model. It ranks first in logistic regression, artificial neural network, and decision tree withnumerically high performance. However, it only records very low rankings in support vector machine and K-neighborhood with low-performance levels. As the experiment results reveal, in a classification model, network centrality metrics over a subnetwork that connects the two nodes can effectively predict the connectivity between two nodes in a social network. Furthermore, each metric has a different performance depending on the classification model type. 
This result implies that choosing an appropriate metric for each algorithm can yield higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model, and closeness centrality could be considered to obtain higher performance in certain models.
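The abstract does not spell out how the customer network or the metrics are computed. As a minimal sketch, assuming customers are linked whenever they bought at least one item in common (an unweighted projection of the purchase data), the four centrality metrics it compares, plus a common-neighbor count as the simplest indirect similarity, can be computed with the Python standard library. All function names are hypothetical; closeness is normalized over reachable nodes only, betweenness follows Brandes' algorithm, and eigenvector centrality uses power iteration on A + I. The paper computes the metrics over the subnetwork connecting a customer and an item, which this whole-graph sketch does not reproduce.

```python
from collections import defaultdict, deque
from itertools import combinations

def customer_network(orders):
    """Project (customer, item) purchases onto a customer-customer graph.

    Assumption: two customers are linked if they bought any common item."""
    buyers = defaultdict(set)
    for customer, item in orders:
        buyers[item].add(customer)
    adj = {customer: set() for customer, _ in orders}
    for group in buyers.values():
        for u, v in combinations(sorted(group), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree_centrality(adj):
    # Number of direct links, scaled by the maximum possible (n - 1).
    n = len(adj)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in adj.items()}

def bfs_distances(adj, source):
    # Hop distances from source to every reachable node.
    dist, queue = {source: 0}, deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(adj):
    # (reachable nodes - 1) / total distance to them; 0 for isolated nodes.
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total else 0.0
    return scores

def betweenness_centrality(adj):
    # Brandes' algorithm for unweighted, undirected graphs, normalized to [0, 1].
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate pair dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each pair is counted from both ends, so divide by (n-1)(n-2), not half of it.
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: b * scale for v, b in bc.items()}

def eigenvector_centrality(adj, iters=200, tol=1e-10):
    # Power iteration on (A + I): same eigenvectors as A, but no oscillation.
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iters):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5 or 1.0
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

def common_neighbors(adj, u, v):
    # Customers "placed between" u and v: the simplest indirect similarity.
    return len(adj[u] & adj[v])

orders = [("a", "x"), ("b", "x"), ("b", "y"), ("c", "y"), ("c", "z"), ("d", "z")]
adj = customer_network(orders)  # path a - b - c - d
print(betweenness_centrality(adj)["b"])
print(common_neighbors(adj, "a", "c"))  # 1 (customer b)
```

Scores like these would then serve as input features for the classifiers the study compares, which predict whether a new link forms.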

    A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

    • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
      • Journal of Intelligence and Information Systems
      • /
      • v.27 no.4
      • /
      • pp.73-95
      • /
      • 2021
    • This study uses the Node2vec graph embedding method and Light GBM link prediction to explore untapped export candidate countries for Korea's food and beverage industry. Node2vec improves on the representation of structural equivalence in a network, which is known to be relatively weak in existing link prediction methods based on the number of common neighbors. As a result, the method is known to perform well in both community detection and structural equivalence. The embedding is obtained from node sequences of a constant length generated from arbitrarily designated starting nodes, so the results can easily be used as input to models for downstream tasks such as Logistic Regression, Support Vector Machine, and Random Forest. Based on these features of Node2vec graph embedding, this study applied the method to international trade data for the Korean food and beverage industry, with the aim of helping Korea achieve extensive-margin diversification within the industry's global value chain. The optimal predictive model derived in this study recorded a precision of 0.95, a recall of 0.79, and an F1 score of 0.86, outperforming the Logistic Regression binary classifier set as the baseline model, which recorded a precision of 0.95, a recall of 0.73, and an F1 score of 0.83. In addition, the Light GBM-based optimal prediction model outperformed the link prediction model of a previous study, which serves as the benchmarking model in this study. 
The predictive model of the previous study recorded a recall of only 0.75, whereas the proposed model achieved a recall of 0.79. The performance difference between the benchmarking model and this study's model stems from the model learning strategy. In this study, trades were grouped by trade value, and prediction models were trained differently for these groups. The specific methods are (1) randomly masking some trades and training the model without any condition on trade value, (2) randomly masking some of the trades whose value is at or above the average and training the model, and (3) randomly masking some of the trades in the top 25% by trade value and training the model. The experiment confirmed that the model trained by randomly masking some of the above-average-value trades performed best and most stably. Further investigation indicated that most of the potential export candidates for Korea derived from this model were appropriate. Taken together, this study demonstrates the practical utility of link prediction combining Node2vec and Light GBM. It also yields useful implications for weight update strategies that improve link prediction while training the model. In addition, the study has policy utility because it applies graph-embedding-based link prediction to trade transactions, which have received little attention in link prediction research. The results support rapid responses to changes in the global value chain, such as the recent US-China trade conflict or Japan's export regulations, and we believe the approach is sufficiently useful as a tool for policy decision-making.
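The abstract notes that Node2vec works from node sequences of a constant length started at designated nodes. Those sequences are Node2vec's second-order biased random walks. A minimal sketch of one walk, assuming an unweighted graph stored as a dict of neighbor sets; p (return) and q (in-out) are the bias parameters from the original Node2vec paper, not settings reported in this study, the toy trade graph is hypothetical, and the skip-gram training that turns walks into vectors, as well as the Light GBM step, are omitted.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One Node2vec-style biased random walk of at most `length` nodes.

    Given previous node t and current node v, a candidate next node x gets
    unnormalized weight 1/p if x == t (return), 1 if x is also a neighbor
    of t, and 1/q otherwise (move outward). Unweighted graph assumed."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:  # dead end: stop early
            break
        if len(walk) == 1:  # first step has no previous node, so it is uniform
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = [1 / p if x == prev else 1.0 if x in adj[prev] else 1 / q
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Hypothetical toy trade network: countries as nodes, existing trades as edges.
trade = {
    "KOR": {"USA", "JPN", "VNM"},
    "USA": {"KOR", "JPN"},
    "JPN": {"KOR", "USA"},
    "VNM": {"KOR"},
}
rng = random.Random(42)
walks = [node2vec_walk(trade, node, length=5, p=1.0, q=0.5, rng=rng) for node in trade]
print(walks)
```

A q below 1 favors moving away from the previous node (exploration resembling depth-first search), while a p below 1 favors returning, which keeps walks local.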
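The reported F1 scores follow from the reported precision and recall, since F1 is their harmonic mean. A quick arithmetic check, using only the figures given in the abstract:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

proposed = f1_score(0.95, 0.79)  # Light GBM model from the abstract
baseline = f1_score(0.95, 0.73)  # Logistic Regression baseline
print(f"{proposed:.2f} {baseline:.2f}")  # prints "0.86 0.83"
```

Because precision is identical in both models, the whole F1 gain comes from the higher recall.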
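The three learning strategies differ only in which trades are eligible to be hidden during training. A minimal sketch of that selection step, assuming trades are (exporter, importer, value) tuples; the strategy names, the masking fraction, the function names, and the sample data are hypothetical. How the masked links are then used as training targets for Light GBM is not described in the abstract and is left out.

```python
import random
from statistics import mean, quantiles

def masking_pool(trades, strategy):
    """Trades eligible for masking under each of the three strategies."""
    values = [value for _, _, value in trades]
    if strategy == "all":  # (1) no condition on trade value
        return list(trades)
    if strategy == "above_mean":  # (2) trade value at or above the average
        threshold = mean(values)
    elif strategy == "top_quartile":  # (3) top 25% by trade value
        # Third-quartile cut point (Python's default "exclusive" method;
        # other quartile definitions shift the cut slightly).
        threshold = quantiles(values, n=4)[-1]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [t for t in trades if t[2] >= threshold]

def mask_trades(trades, strategy, fraction=0.5, seed=0):
    """Randomly hide a fraction of the eligible trades; return (visible, masked)."""
    pool = masking_pool(trades, strategy)
    rng = random.Random(seed)
    masked = rng.sample(pool, int(len(pool) * fraction))
    hidden = set(masked)
    visible = [t for t in trades if t not in hidden]
    return visible, masked

# Hypothetical trades with values 1..8 from one exporter.
trades = [("KOR", f"C{i}", i) for i in range(1, 9)]
visible, masked = mask_trades(trades, "above_mean")
print(masked)
```

Restricting the pool changes only which links the model learns to recover; the rest of the network stays visible as input.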


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.