Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)
- Kim, Minsung;Im, Il
-
- Journal of Intelligence and Information Systems
- /
- v.20 no.2
- /
- pp.137-148
- /
- 2014
-
Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of searching for information and making purchases, and recommender systems are a key technology for serving this need. Many past studies on recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful of them. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who have no evaluations or preference information, therefore, CF cannot produce recommendations (cold-start problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. Such a sparse dataset makes the computation for recommendation extremely hard (sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from them. Therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users.
Then, different similarity measures and recommendation methods are applied to these two datasets. The algorithm in more detail is as follows. Step 1: Convert the initial data, which is a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate out the nodes whose degree centrality is lower than a pre-set threshold. The threshold value is determined by simulation such that the accuracy of CF for the remaining dataset is maximized. Step 3: Apply an ordinary CF algorithm to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them, so a 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by their numbers of nodes and summed to give the final performance metric. To test the performance improvement from this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data from the GroupLens research team. We used 100,000 evaluations by 943 users of 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used.
Past studies that aimed to improve CF performance typically used additional information beyond users' evaluations, such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it uses SNA to separate the dataset. It shows that the performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. It shows empirically that the characteristics of a dataset can affect the performance of CF recommender systems, which helps researchers understand the factors affecting CF performance. It also opens the door to future studies that apply SNA to CF in order to analyze the characteristics of datasets. In practice, this study provides guidelines for improving the performance of CF recommender systems with a simple modification.
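The four steps described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy data, the rule that two users are linked when they share at least one item, the threshold value, and all function names are assumptions made for illustration, and Step 3 (ordinary CF on the remaining users) is omitted. The abstract says the two F measures are weighted by node counts and summed; dividing by the total number of nodes is an additional assumption here.

```python
from collections import Counter
from itertools import combinations

# Toy user -> liked-items data (hypothetical; not the MovieLens data used in the study).
likes = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"B", "C"},
    "u4": {"A", "C"},
    "u5": {"Z"},  # shares nothing with anyone: a gray sheep candidate
}

def project_to_users(likes, min_shared=1):
    """Step 1: two-mode (user-item) -> one-mode (user-user) network.
    Two users are linked when they share at least `min_shared` items (assumed rule)."""
    neighbors = {u: set() for u in likes}
    for u, v in combinations(sorted(likes), 2):
        if len(likes[u] & likes[v]) >= min_shared:
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors

def split_by_degree(neighbors, threshold):
    """Step 2: users whose degree is below `threshold` are treated as gray sheep."""
    gray = {u for u, nbrs in neighbors.items() if len(nbrs) < threshold}
    return gray, set(neighbors) - gray

def popular_items(likes, users, exclude, top_n=2):
    """Step 4: recommend the items liked most often among `users`."""
    counts = Counter(i for u in users for i in likes[u] if i not in exclude)
    # Sort by count (descending), then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:top_n]

def weighted_f(f_main, n_main, f_gray, n_gray):
    """Final metric: F measures weighted by group size (normalization is assumed)."""
    return (n_main * f_main + n_gray * f_gray) / (n_main + n_gray)

neighbors = project_to_users(likes)
gray, mainstream = split_by_degree(neighbors, threshold=1)
recs_for_gray = {u: popular_items(likes, mainstream, exclude=likes[u]) for u in gray}
```

In this toy network only `u5` has no links, so it is the only user routed to the popular-item method. In the study the threshold is tuned by simulation rather than fixed in advance.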
Examination of Urban Gardening as an Everydayness in Urban Residential Area, Haebangchon (도심주거지에 나타나는 일상문화로서의 도시정원가꾸기에 대한 고찰 - 용산구 용산동2가 해방촌을 중심으로 -)
- Sim, Joo-Young;Zoh, Kyung-Jin
-
- Journal of the Korean Institute of Landscape Architecture
- /
- v.43 no.2
- /
- pp.1-12
- /
- 2015
-
This study explores urban gardening and garden culture in residential areas as an aspect of everyday life that was overlooked during modern urbanization, and investigates the meaning and value of urban gardening from the perspective of urban formation and growth in Haebangchon, a spontaneously formed urban residential area. The results show that urban gardening, as a form of contemporary culture, offers a new clue to improving the urban physical environment and changing the lives and community networks of residents. Haebangchon is one of the few remaining spontaneous settlements in Seoul; it was created as a temporary unlicensed shantytown in the 1940s. It became a representative settlement for ordinary people in downtown Seoul through revitalization in the 1960s and local reform through self-sustaining redevelopment projects from the 1970s through the 1990s. The area still retains the look of the 1950s-60s, the 1970s-80s, and the present, and the percentage of long-term residents is high. Within this context, the site was divided into three sections, and the research was undertaken by observation and investigation to determine the characteristics of urban gardening as an everyday practice. Urban gardening and garden culture in Haebangchon can be seen as a unique local culture that has accumulated in the crevices between physical conditions and the culture of daily life. These places express residents' desire to seek out nature and gardening in a densely populated area, and they serve as grounds for practical action and participation in care and cultivation. They form a unique, indigenous local landscape as an accumulation of residents' everyday lives. Urban gardens in detached homes have retained the original function of the dwelling and the garden, or 'madang', and take on the character of public space through shared use while also keeping a semi-private spatial character.
Also, urban gardens, including small kitchen gardens and flowerpots that appear in the narrow streets, provide pleasure as a piece of nature blossoming in narrow alleys and function as public gardens for exchange with neighbors through the sharing of produce. This paper offers a way of redefining the relationship between private and public areas that arises between the disconnected outdoor spaces of a modern city.
A Study on Analysis and Problems of Deliberation to Change the Present Condition Around Cultural Properties - Focusing on the Cultural Properties of Standard Establishment for Change the Present Condition - (문화재 주변 현상변경허가 신청안 분석 및 문제점에 관한 연구 - 현상변경허용기준 수립 문화재를 중심으로 -)
- Kim, Dong-Chan;Lim, Jin-Kang
-
- Journal of the Korean Institute of Traditional Landscape Architecture
- /
- v.30 no.1
- /
- pp.22-30
- /
- 2012
-
The purpose of this study of deliberation results to Change the Present Condition of Gyeonggi-do Designated Cultural Properties established Acceptable Standards of Change the Present and its disposal of the fire, Accordingly to minimize the complaints of neighbors cultural property and Decision of Change the Present Condition to maintain the consistency of that is the purpose of this study and The results of the study are as follows First, 82 of proposals with tangible cultural monuments and 56% and 50% each accounted for the majority. 84% of the individual applicant, the application uses 94% of the construction, application facility houses the highest percentage of 40%. Approved and reconsideration results, while the largest number of monuments, rejected the result was the most frequent types of cultural property. and Building height, the highest was filed. Second, the results of deliberation voted two of the most frequently voted 36.5% to 48.7%, 14.6% is the reconsideration. The main reason for the decision passed Cultural minimal impact surrounding landscape, and the application of the buildings surrounding the site have voted in determining the existence of many affected. The main reason for rejection of the decision and voted to determine the cultural assets, compared to inhibition surrounding area of application architecture is characterized by large-scale. The main reason for reconsideration of the decision after a site survey was the most frequent reconsideration. Third, the primary consideration only if you applied a full half of 82 is 47.5% of cases. Consequences of the two manatgo voted the same, assuming the number of agenda items, the more the decision was rejected, and reconsideration. Fourth, the application area committee areas and cultural phenomenon in the conservation area be applied to each 33.7% and 20.2% was most common. Compared with rejection and reconsideration if the various sections of the application has passed. 
and When applied to the conservation area there were passed. Cultural committee rejected the decision, if applied to areas were most prevalent, reconsideration of the decision was similar in the two areas. Fifth, the size of the construction of buildings collapsed as a result, you voted in rejection have changed. and Voted to reject the results from the increase in building area when changing a lot of decisions have affected the results.
A Study on Improvement of Collaborative Filtering Based on Implicit User Feedback Using RFM Multidimensional Analysis (RFM 다차원 분석 기법을 활용한 암시적 사용자 피드백 기반 협업 필터링 개선 연구)
- Lee, Jae-Seong;Kim, Jaeyoung;Kang, Byeongwook
-
- Journal of Intelligence and Information Systems
- /
- v.25 no.1
- /
- pp.139-161
- /
- 2019
-
Using the e-commerce market has become a common part of life today. It has become important for customers to know where and how to make reasonable purchases of good-quality products. This change in purchasing psychology tends to make it difficult for customers to make purchasing decisions amid vast amounts of information. In this situation, a recommendation system reduces the cost of information retrieval and improves satisfaction by analyzing the customer's purchasing behavior. Amazon and Netflix are well-known examples of sales marketing using recommendation systems. In the case of Amazon, 60% of recommendations reportedly resulted in purchases of goods, and a 35% increase in sales was achieved. Netflix, on the other hand, found that 75% of movie recommendations were made through its service. This personalization technique is considered one of the key strategies for one-to-one marketing and is useful in online markets where there are no salespeople. The recommendation techniques mainly used today include collaborative filtering and content-based filtering. Hybrid techniques that combine these, as well as association rules, are also used in various fields. Of these, collaborative filtering is the most popular today. Collaborative filtering recommends products preferred by neighbors who have similar preferences or purchasing behavior, based on the assumption that users who have shown similar tendencies in purchasing or evaluating products in the past will show similar tendencies toward other products. However, most existing systems recommend only within a single category of products, such as books or movies.
This is because the recommendation system estimates purchase satisfaction with a new item that has never been bought from the customer's purchase ratings of similar products in the transaction data. In addition, there is a serious problem with the reliability of the purchase ratings used in recommendation systems. In particular, a 'compensated review' refers to the intentional manipulation of a customer purchase rating through company intervention. In fact, Amazon has struggled with such compensated reviews since 2016 and has worked hard to reduce false information and increase credibility. A survey showed that the average rating for products with compensated reviews was higher than for those without. Compensated reviews also turn out to be about 12 times less likely to give the lowest rating and about 4 times less likely to leave a critical opinion. Customer purchase ratings are thus full of noise. This problem directly affects the performance of recommendation systems, which aim to maximize profits by attracting highly satisfied customers in most e-commerce transactions. To address these problems, this study proposes new indicators, derived with the RFM multidimensional analysis technique, that can objectively substitute for existing customer purchase ratings. RFM multidimensional analysis is the most widely used analytical method in customer relationship management (CRM) marketing and is a data analysis method for selecting customers who are likely to purchase goods. When the proposed index was verified against actual purchase history data, the accuracy was about 55%. Because this result was obtained when recommending among a total of 4,386 different types of products that had never been bought before, it indicates relatively high accuracy and practical value.
This study also suggests the possibility of a general recommendation system that can be applied to various offline product data. If additional data are acquired in the future, the accuracy of the proposed recommendation system can be improved.
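As a rough illustration of how RFM values might stand in for explicit ratings, the sketch below derives recency, frequency, and monetary values for each (customer, item) pair from purchase records and averages their scaled values into a single implicit score. The transactions, the min-max scaling, the equal weights, and every function name are assumptions for illustration; the abstract does not specify how the study's index is constructed.

```python
from collections import defaultdict

# Hypothetical transactions: (customer, item, day_index, amount). Not data from the study.
transactions = [
    ("c1", "tea", 1, 10.0),
    ("c1", "tea", 9, 12.0),
    ("c1", "coffee", 3, 5.0),
    ("c2", "tea", 2, 8.0),
]

def rfm_table(transactions, today):
    """Recency (days since last purchase), Frequency (purchase count) and
    Monetary (total spend) for every (customer, item) pair."""
    last, freq, money = {}, defaultdict(int), defaultdict(float)
    for customer, item, day, amount in transactions:
        key = (customer, item)
        last[key] = max(last.get(key, day), day)
        freq[key] += 1
        money[key] += amount
    return {k: (today - last[k], freq[k], money[k]) for k in last}

def scale(values, reverse=False):
    """Min-max scale to [0, 1]; reverse=True makes smaller raw values score higher."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if reverse else scaled

def implicit_scores(rfm):
    """Average of the scaled R, F and M values (equal weights are an assumption)."""
    keys = list(rfm)
    r = scale([rfm[k][0] for k in keys], reverse=True)  # recent purchases score high
    f = scale([rfm[k][1] for k in keys])
    m = scale([rfm[k][2] for k in keys])
    return {k: (r[i] + f[i] + m[i]) / 3 for i, k in enumerate(keys)}

rfm = rfm_table(transactions, today=10)
scores = implicit_scores(rfm)
```

The resulting `scores` dictionary can be fed to a collaborative filtering routine in place of a user-item rating matrix, which is the substitution the abstract proposes.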
The Practice of 'Liberated-ness': An Education Model for Protestant Spiritual Practice (개신교 '자유케 됨'의 영성에 기초한 기독교 영성교육 모형: '자유케 됨'의 실천)
- Hwang, In-Hae
-
- Journal of Christian Education in Korea
- /
- v.68
- /
- pp.375-415
- /
- 2021
-
Although the interest in Christian education of spirituality has increased recently, the practice of the education of spirituality in the Korean Church has been fragmented in the contents and methods without any clear educational purpose of the Protestant tradition. This requires a creative study to seek out the contents and method best suited to realizing the educational purpose of the Protestant tradition, through a rigorous academic methodology. This study proposes just such a creative model for the education of spirituality with an educational purpose based on the core ethos of the Protestant spirituality, integrating the long tradition of spiritual practices of Christianity. First, I survey the teachings on 'the life of faith' of the main leaders of the Protestant church, including Martin Luther, John Calvin, and John Wesley. Through this process, I reveal 'liberated-ness' to be the common purpose of the Protestant leaders, and the core of the practices for that purpose are 'the means of grace,' which has a different meaning from that of the Roman Catholic tradition. I construct the meaning of 'liberated-ness' in a dynamic manner, which begins with the 'liberating will' of God, and is followed by the 'self-giving will' of the believer as the response to the 'grace' of the 'liberating will.' The contact point of these two 'wills' is what I call 'the living membrane of faith.' As a creative synthesis of the above discussions, I propose a model of 'the practice of liberated-ness' for an education in spiritual practice. The purpose of this education is for the learner to become a person who continuously experiences ever-increasing 'liberated-ness' through continuous personal 'encounters' with God, and to become ever more faithful in carrying out practices for the 'liberated-ness' of her or his neighbors. 
The relationship between the teacher and the learner is that of personal 'encounter' as put forth by Sherrill, and also incorporates elements of 'co-authorship' as conceptualized by Kim. I transform and rename major practices of spiritual discipline according to a principle of 'liberated-ness' based on the Protestant tradition, and these comprise the main content of my spirituality education model. They include: 'lectio divina of encounter,' 'prayer facing the Lord,' 'service in liberation,' 'reflection of liberated-ness,' and 'mutual spiritual direction.' The teaching and learning process draws on Dykstra's methods of coaching and mentoring. The key environment is that of a 'sacramental community' as defined by Moore. Evaluation can be performed only by the learner her/himself. The significance of this model is that it creatively inherits and succeeds the tradition of Christian spiritual discipline from the early church onwards by transforming it through a Protestant spirituality of 'liberated-ness.'
One-probe P300 based concealed information test with machine learning (기계학습을 이용한 단일 관련자극 P300기반 숨김정보검사)
- Hyuk Kim;Hyun-Taek Kim
-
- Korean Journal of Cognitive Science
- /
- v.35 no.1
- /
- pp.49-95
- /
- 2024
-
Polygraph examination, statement validity analysis, and the P300-based concealed information test are the three major examination tools used to determine a person's truthfulness and credibility in criminal procedure. Although polygraph examination is the most common of these, it has little admissibility as evidence because of its weak scientific basis. In the 1990s, to compensate for the polygraph's weak scientific basis, Farwell and Donchin proposed the P300-based concealed information test. The P300-based concealed information test has two strengths. First, it is easy to conduct together with a polygraph examination. Second, it has an ample scientific basis. Nevertheless, it is used infrequently because of the number of probe stimuli it requires. A probe stimulus contains undisclosed information that is relevant to the crime or other situation under investigation. The traditional P300-based concealed information test protocol requires three or more probe stimuli. However, three or more probe stimuli are hard to obtain, because most crime-relevant information is disclosed in the course of an investigation. In addition, the test uses the oddball paradigm, which creates an imbalance between the numbers of probe and irrelevant stimuli. This imbalance may cause systematic underestimation of the P300 amplitude of irrelevant stimuli. To overcome these two limitations, a one-probe P300-based concealed information test protocol is explored with various machine learning algorithms. According to this study, the parameters of the modified one-probe protocol are as follows.
For female and male face stimuli, the recommended stimulus duration is 400 ms, the recommended number of stimulus repetitions is 60, the recommended method for analyzing P300 amplitude is the peak-to-peak method, the recommended cut-off for the guilty condition is 90%, and the recommended cut-off for the innocent condition is 30%. For two-syllable word stimuli, the recommended stimulus duration is 300 ms, the recommended number of stimulus repetitions is 60, the recommended method for analyzing P300 amplitude is the peak-to-peak method, the recommended cut-off for the guilty condition is 90%, and the recommended cut-off for the innocent condition is 30%. It was also confirmed that the logistic regression (LR), linear discriminant analysis (LDA), and k-nearest neighbors (KNN) algorithms are feasible methods for analyzing P300 amplitude. The one-probe P300-based concealed information test with machine learning can help increase the use of the P300-based concealed information test and, together with the polygraph examination, supports determining a person's truthfulness and credibility in criminal procedure.
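The peak-to-peak method named above can be sketched in a simplified, single-sample form: take the largest positive value inside a search window and subtract the most negative value that follows it. The window limits, the sample waveform, and the function name are assumptions for illustration, and published protocols often use averaged segments rather than single samples, so this is not the study's exact procedure.

```python
def peak_to_peak(erp, times_ms, pos_window=(300, 900)):
    """Simplified peak-to-peak P300 amplitude.

    Finds the largest positive sample inside `pos_window` (in ms) and subtracts
    the most negative sample from that point to the end of the epoch.
    The window limits are assumed defaults, not parameters from the study.
    """
    in_window = [i for i, t in enumerate(times_ms)
                 if pos_window[0] <= t <= pos_window[1]]
    if not in_window:
        raise ValueError("no samples fall inside the positive-peak window")
    peak_idx = max(in_window, key=lambda i: erp[i])
    trough = min(erp[peak_idx:])
    return erp[peak_idx] - trough

# Hypothetical averaged waveform (microvolts) sampled every 100 ms.
times_ms = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
erp = [0.0, 1.0, 0.5, 2.0, 6.0, 3.0, -1.0, -2.0, 0.0, 0.5, 0.0]
amplitude = peak_to_peak(erp, times_ms)
```

Here the positive peak is 6.0 at 400 ms and the following trough is -2.0 at 700 ms, so the peak-to-peak amplitude is 8.0. Amplitudes of this kind, computed for probe and irrelevant stimuli, are the sort of feature a classifier such as LR, LDA, or KNN could take as input.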
A Grounded theory Approach on the Experience of Sexual Abuse Victims (성폭력 피해여성의 경험에 관한 연구)
- Kim, Kyung-Hee;Nam, Sun-Young;Chee, Soon-Ju;Kwon, Hye-Jin;Chung, Yeon-Kang
-
- Journal of the Korean Society of School Health
- /
- v.9 no.1
- /
- pp.77-98
- /
- 1996
-
This study was designed to work out a theoretical framework on the experience of sexual abuse from the perspective of grounded theory, in an effort to provide more practical and efficient nursing intervention for female victims. The subcategories identified were "sexual abuse", "threatening", "absent-mindedness", "embarrassment", "horripilation", "dizziness", "wondrousness", "filthiness", "sexual curiosity", "violence level", "victim's age", "neighbors' response", "victim's personality", "common experience", "sexual abuse information", "family relations", "level of familiarity", "hiding", "suppression", "self-torture", "self-protection", "avoidance", "asking aid", "withdrawal", "hatred", "confusion", "dodging", "remnant", and "pursuing". The 29 subcategories given above were further integrated into 16 categories such as "victimizedness", "being astounded", "filthiness", "degree", "developmental stage", "response pattern", "personality", "rarity", "information availability", "family support", "cover-up", "escaping", "informing", "negative internalization", and "positive pursuit of change". The core categories linked to all the other categories turned out to be "being taken aback" and "filthiness", incorporating the relevant subcategories. A total of 23 theoretical hypotheses emerged in the process of analyzing the data.
1. The greater the sexual curiosity, the weaker the senses of being taken aback and filthiness.
2. The weaker the sexual curiosity, the stronger the senses of being taken aback and filthiness.
3. The stronger the level of violence, the stronger the senses of being taken aback and filthiness.
4. The lower the level of violence, the weaker the senses of being taken aback and filthiness.
5. The younger the victim, the stronger the senses of being taken aback and filthiness.
6. The older the victim, the weaker the senses of being taken aback and filthiness.
7. 'Escaping' will transpire regardless of the given circumstances.
8. The weaker the senses of being taken aback and filthiness, the more probable it is that 'informing' and 'escaping' transpire.
9. The stronger the senses of being taken aback and filthiness, the more probable it is that 'informing' and 'escaping' transpire.
10. The more protective the response from around, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
11. The more repelling the response from around, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
12. The more open-minded the personality of the subject, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
13. The more closed the personality of the subject, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
14. The more frequent the experience of sexual abuse, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
15. The less frequent the experience of sexual abuse, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
16. The more available information concerning sexual abuse is, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
17. The less available information concerning sexual abuse is, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
18. The more cohesive the family of the subject, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
19. The less cohesive the family of the subject, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
20. The less familiar the subject is with the abuser, the more likely the response to 'being taken aback' and 'filthiness' will be 'informing' and 'escaping'.
21. The more familiar the subject is with the abuser, the more likely the response to 'being taken aback' and 'filthiness' will be 'covering-up' and 'escaping'.
22. The more likely the response to 'being taken aback' and 'filthiness' is 'informing' and 'escaping', the more positive changes the subject will pursue.
23. The more likely the response to 'being taken aback' and 'filthiness' is 'covering-up' and 'escaping', the more negative changes the subject will pursue.
The following four hypotheses were confirmed in the process of data analysis.
1) If the level of violence is strong but 'being taken aback' and 'filthiness' are weak because of strong sexual curiosity, and if information concerning sexual abuse is not readily available and the frequency is low, negative internalization marked by 'covering-up' and 'escaping' will take place despite the fact that the subject is open-minded, the family is cohesive, and the abuser is unfamiliar.
2) If the level of violence is weak but 'being taken aback' and 'filthiness' are weak combined with weak sexual curiosity, and if information concerning sexual abuse is readily available, the response from around is protective, and the frequency is high, the subject will pursue positive changes in response to 'being taken aback' and 'filthiness', further aided by the fact that the subject is open-minded, the family is cohesive, and the abuser is unfamiliar.
3) If the level of violence is strong and 'being taken aback' and 'filthiness' are strong because of weak sexual curiosity, and if information concerning sexual abuse is readily available, the response from around is protective, and the frequency is low, the subject will pursue positive changes marked by 'informing' and 'escaping' despite the fact that family cohesion is weak and the abuser is familiar.
4) If the level of violence is strong and 'being taken aback' and 'filthiness' are strong because of weak sexual curiosity, and if information concerning sexual abuse is not readily available, the response from around is repelling, and the frequency is low, negative internalization such as 'covering-up' and 'escaping' will take place, further aggravated by the fact that the subject's personality is closed, family cohesion is weak, and the abuser is familiar.
On the basis of the above findings, it is recommended that nursing intervention focus on promoting a milieu conducive to victims pursuing positive changes, along with adequate aid from protection facilities as well as from the people around them.
A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)
- Lee, Dongwon
-
- Journal of Intelligence and Information Systems
- /
- v.27 no.1
- /
- pp.23-46
- /
- 2021
-
Collaborative filtering, which is often used for personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, because traditional collaborative filtering calculates similarity from direct connections and common features among customers, it has difficulty calculating similarity for new customers or products. For this reason, hybrid techniques were designed that also use content-based filtering. Separately, efforts have been made to solve these problems by applying the structural characteristics of social networks. These approaches calculate the similarity between two customers indirectly, through the similar customers placed between them. That is, a customer network is created from purchasing data, and the similarity between two customers is calculated from the features of the part of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer will accept a recommendation. The centrality metrics of networks can be used to calculate these similarities. Different centrality metrics are important in that they may have different effects on recommendation performance. Furthermore, the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but also for all customers and products. By treating a customer's purchase of an item as a link between the customer and the item in the network, the prediction of whether a user will accept a recommendation is solved as a prediction of whether a new link will be created between them.
Because classification models suit the binary problem of whether a link is formed or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for this research. The performance evaluation used order data collected from an online shopping mall over four years and two months. The first three years and eight months of records were organized into the social network, and the next four months' records were used to train and evaluate the recommender models. Experiments applying the centrality metrics to each model show that the recommendation acceptance rates achieved with the centrality metrics differ across algorithms at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except the support vector machine. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across the models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality shows distinct differences in performance depending on the model. It ranks first, with numerically high performance, in logistic regression, the artificial neural network, and the decision tree. However, it ranks very low, with low performance, in the support vector machine and k-nearest neighbors. As the experimental results reveal, in a classification model, network centrality metrics computed over a subnetwork that connects two nodes can effectively predict the connectivity between those nodes in a social network. Furthermore, each metric performs differently depending on the type of classification model.
This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. Introducing closeness centrality could be considered to obtain higher performance for certain models.
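To make two of the four centrality metrics concrete, the sketch below computes normalized degree centrality and closeness centrality on a small undirected graph using only the standard library. The graph and function names are hypothetical, and the sketch covers only the metric definitions; it does not reproduce the study's subnetwork construction or its classifiers.

```python
from collections import deque

# Hypothetical undirected customer network as adjacency sets (a simple path a-b-c-d).
graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

def degree_centrality(graph):
    """Number of neighbors divided by the maximum possible number (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(graph):
    """(n - 1) divided by the sum of distances to all other nodes.
    Assumes the graph is connected; isolated nodes get 0.0."""
    n = len(graph)
    result = {}
    for v in graph:
        total = sum(bfs_distances(graph, v).values())
        result[v] = (n - 1) / total if total else 0.0
    return result

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```

On this path graph the two inner nodes score higher than the endpoints on both metrics. Values like these, computed for the nodes that connect a customer and an item, are the kind of feature the abstract describes passing to a classifier for link prediction.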
A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)
- Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
-
- Journal of Intelligence and Information Systems
- /
- v.27 no.4
- /
- pp.73-95
- /
- 2021
-
This study uses the Node2vec graph embedding method and Light GBM link prediction to explore undeveloped export candidate countries for Korea's food and beverage industry. Node2vec addresses a limitation of existing link prediction methods based on the number of common neighbors, which are known to be relatively weak at representing the structural equivalence of a network. The method is therefore known to perform well in capturing both community structure and structural equivalence. The vector values obtained by embedding the network in this way are derived under the condition of a fixed length from an arbitrarily designated starting node. This makes it easy to feed the resulting node sequences as input to downstream models such as logistic regression, support vector machine, and random forest. Based on these features of Node2vec, this study applied the method to international trade data for the Korean food and beverage industry. Through this, we intend to contribute to extensive-margin diversification for Korea within the industry's global value chain. The optimal predictive model derived in this study recorded a precision of 0.95, a recall of 0.79, and an F1 score of 0.86. This is superior to the binary classifier based on logistic regression that was set as the baseline model, which recorded a precision of 0.95, a recall of 0.73, and an F1 score of 0.83. In addition, the Light GBM-based optimal prediction model also outperformed the link prediction model of a previous study, which was set as the benchmark model.
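The reported F1 scores follow from the reported precision and recall through the standard harmonic-mean definition, F1 = 2PR / (P + R). The short check below uses only the figures quoted in the abstract; the function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract.
proposed_f1 = f1_score(0.95, 0.79)   # Light GBM-based model
baseline_f1 = f1_score(0.95, 0.73)   # logistic regression baseline

print(round(proposed_f1, 2), round(baseline_f1, 2))
```

Both values round to the scores stated in the abstract (0.86 and 0.83), so the gain over the baseline comes entirely from recall, since precision is identical.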
The predictive model of the previous study recorded a recall of only 0.75, whereas the proposed model achieved a recall of 0.79. The difference in performance between the benchmark model and the model of this study is due to the model learning strategy. In this study, trades were grouped by trade value, and prediction models were trained differently for these groups. The specific methods are (1) randomly masking trades across all trades, without any condition on trade value, and training the model, (2) randomly masking some of the trades with an average trade value or higher and training the model, and (3) randomly masking some of the trades in the top 25% by trade value and training the model. The experiments confirmed that the model trained by randomly masking some of the trades with above-average trade value performed best and most stably. Additional investigation found that most of the potential export candidates for Korea derived from this model appeared appropriate. Taken together, this study suggests the practical utility of link prediction using Node2vec and Light GBM. It also yields useful implications for weight update strategies that can improve link prediction during model training. This study also has policy utility because it applies the method to trade transactions, which have rarely been examined in link prediction research based on graph embedding. The results support a rapid response to changes in the global value chain, such as the recent US-China trade conflict or Japan's export regulations, and are considered sufficiently useful as a tool for policy decision-making.
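The three masking strategies can be sketched as an edge-selection step that hides part of the trade network before training. This is only an illustration under stated assumptions: the trade tuples, the `mask_fraction` parameter, the strategy names, and the use of a quartile cut for the top 25% are all hypothetical, and the abstract does not state how many trades were masked or how thresholds were computed.

```python
import random
import statistics


def mask_edges(trades, strategy, mask_fraction=0.5, seed=0):
    """Split trades into (visible, masked) according to a masking strategy.

    trades: list of (exporter, importer, value) tuples (hypothetical format).
    strategy: "all", "above_average", or "top_quartile".
    """
    values = [value for _, _, value in trades]
    if strategy == "all":
        threshold = float("-inf")                         # strategy (1): no condition
    elif strategy == "above_average":
        threshold = statistics.mean(values)               # strategy (2): mean or higher
    elif strategy == "top_quartile":
        threshold = statistics.quantiles(values, n=4)[2]  # strategy (3): top 25%
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Only trades at or above the threshold are eligible for masking.
    candidates = [t for t in trades if t[2] >= threshold]
    k = min(len(candidates), max(1, round(mask_fraction * len(candidates))))
    masked = random.Random(seed).sample(candidates, k)
    masked_set = set(masked)
    visible = [t for t in trades if t not in masked_set]
    return visible, masked


# Hypothetical toy data: exporter, importer, trade value.
toy_trades = [
    ("KOR", "AAA", 10), ("KOR", "BBB", 20), ("KOR", "CCC", 30), ("KOR", "DDD", 40),
    ("KOR", "EEE", 100), ("KOR", "FFF", 200), ("KOR", "GGG", 400), ("KOR", "HHH", 800),
]

visible_avg, masked_avg = mask_edges(toy_trades, "above_average")
visible_top, masked_top = mask_edges(toy_trades, "top_quartile")
```

A link prediction model would then be trained on the visible trades and evaluated on how well it recovers the masked ones; restricting the masked set to higher-value trades changes which links the model learns to reconstruct.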