• Title/Summary/Keyword: Network Mining

Search Result 1,041, Processing Time 0.031 seconds

Korea's Trade Rules Analysis using Topic Modeling : from 2000 to 2022 (토픽 모델링을 이용한 한국 무역규범 연구동향 분석 : 2000년~2022년)

  • Byeong-Ho Lim;Jeong-In Chang;Tae-Han Kim;Ha-Neul Han
    • Korea Trade Review
    • /
    • v.48 no.1
    • /
    • pp.55-81
    • /
    • 2023
  • The purpose of this study is to analyze the main issues and trends of Korean trade, and to draw implications for future research regarding trade rules. A total of 476 academic journal are analyzed using English keyword searched for 'Trade Rules' from 2000 to July 2022 in the Korean Journal Citation Index data base. The analysis methodology includes co-occurrence network and topic trend analysis which is a kind of text mining methods. The results shows that key words representing Korea's trade trend fall into four categories in which the number of research journals has rapidly increased, which are Topic 4 (Investment Treaty), Topic 7 (Trade Security), Topic 8 (China's Protectionism), and Topic 11 (Trade Settlement). The major background for these topics is the tension between the United States and China threatening the existing international trade system. A detailed study for China's protectionism, changes in trade security system, and new investment agreements, and changes in payment methods will be the challenges in near future.

Cost Performance Evaluation Framework through Analysis of Unstructured Construction Supervision Documents using Binomial Logistic Regression (비정형 공사감리문서 정보와 이항 로지스틱 회귀분석을 이용한 건축 현장 비용성과 평가 프레임워크 개발)

  • Kim, Chang-Won;Song, Taegeun;Lee, Kiseok;Yoo, Wi Sung
    • Journal of the Korea Institute of Building Construction
    • /
    • v.24 no.1
    • /
    • pp.121-131
    • /
    • 2024
  • This research explores the potential of leveraging unstructured data from construction supervision documents, which contain detailed inspection insights from independent third-party monitors of building construction processes. With the evolution of analytical methodologies, such unstructured data has been recognized as a valuable source of information, offering diverse insights. The study introduces a framework designed to assess cost performance by applying advanced analytical methods to the unstructured data found in final construction supervision reports. Specifically, key phrases were identified using text mining and social network analysis techniques, and these phrases were then analyzed through binomial logistic regression to assess cost performance. The study found that predictions of cost performance based on unstructured data from supervision documents achieved an accuracy rate of approximately 73%. The findings of this research are anticipated to serve as a foundational resource for analyzing various forms of unstructured data generated within the construction sector in future projects.

Exploring the phenomenon of veganphobia in vegan food and vegan fashion (비건 음식과 비건 패션에서 나타난 비건포비아 현상에 대한 탐구)

  • Yeong-Hyeon Choi;Sangyung Lee
    • The Research Journal of the Costume Culture
    • /
    • v.32 no.3
    • /
    • pp.381-397
    • /
    • 2024
  • This study investigates the negative perceptions (veganphobia) held by consumers toward vegan diets and fashion and aims to foster a genuine acceptance of ethical veganism in consumption. The textual data web-crawled Korean online posts, including news articles, blogs, forums, and tweets, containing keywords such as "contradiction," "dilemma," "conflict," "issues," "vegan food" and "vegan fashion" from 2013 to 2021. Data analysis was conducted through text mining, network analysis, and clustering analysis using Python and NodeXL programs. The analysis revealed distinct negative perceptions regarding vegan food. Key issues included the perception of hypocrisy among vegetarians, associations with specific political leanings, conflicts between environmental and animal rights, and contradictions between views on companion animals and livestock. Regarding the vegan fashion industry, the eco-friendliness of material selection and design processes were seen as the pivotal factors shaping negative attitudes. Furthermore, the study identified a shared negative perception regarding vegan food and vegan fashion. This negativity was characterized by confusion and conflicts between animal and environmental rights, biased perceptions linked to specific political affiliations, perceived self-righteousness among vegetarians, and general discomfort toward them. These factors collectively contributed to a broader negative perception of vegan consumption. In conclusion, this study is significant in understanding the complex perceptions and attitudes that con- sumers hold toward vegan food and fashion. The insights gained from this research can aid in the design of more effective campaign strategies aimed at promoting vegan consumerism, ultimately contributing to a more widespread acceptance of ethical veganism in society.

A Study on the Development Trend of Artificial Intelligence Using Text Mining Technique: Focused on Open Source Software Projects on Github (텍스트 마이닝 기법을 활용한 인공지능 기술개발 동향 분석 연구: 깃허브 상의 오픈 소스 소프트웨어 프로젝트를 대상으로)

  • Chong, JiSeon;Kim, Dongsung;Lee, Hong Joo;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.1-19
    • /
    • 2019
  • Artificial intelligence (AI) is one of the main driving forces leading the Fourth Industrial Revolution. The technologies associated with AI have already shown superior abilities that are equal to or better than people in many fields including image and speech recognition. Particularly, many efforts have been actively given to identify the current technology trends and analyze development directions of it, because AI technologies can be utilized in a wide range of fields including medical, financial, manufacturing, service, and education fields. Major platforms that can develop complex AI algorithms for learning, reasoning, and recognition have been open to the public as open source projects. As a result, technologies and services that utilize them have increased rapidly. It has been confirmed as one of the major reasons for the fast development of AI technologies. Additionally, the spread of the technology is greatly in debt to open source software, developed by major global companies, supporting natural language recognition, speech recognition, and image recognition. Therefore, this study aimed to identify the practical trend of AI technology development by analyzing OSS projects associated with AI, which have been developed by the online collaboration of many parties. This study searched and collected a list of major projects related to AI, which were generated from 2000 to July 2018 on Github. This study confirmed the development trends of major technologies in detail by applying text mining technique targeting topic information, which indicates the characteristics of the collected projects and technical fields. The results of the analysis showed that the number of software development projects by year was less than 100 projects per year until 2013. However, it increased to 229 projects in 2014 and 597 projects in 2015. Particularly, the number of open source projects related to AI increased rapidly in 2016 (2,559 OSS projects). It was confirmed that the number of projects initiated in 2017 was 14,213, which is almost four-folds of the number of total projects generated from 2009 to 2016 (3,555 projects). The number of projects initiated from Jan to Jul 2018 was 8,737. The development trend of AI-related technologies was evaluated by dividing the study period into three phases. The appearance frequency of topics indicate the technology trends of AI-related OSS projects. The results showed that the natural language processing technology has continued to be at the top in all years. It implied that OSS had been developed continuously. Until 2015, Python, C ++, and Java, programming languages, were listed as the top ten frequently appeared topics. However, after 2016, programming languages other than Python disappeared from the top ten topics. Instead of them, platforms supporting the development of AI algorithms, such as TensorFlow and Keras, are showing high appearance frequency. Additionally, reinforcement learning algorithms and convolutional neural networks, which have been used in various fields, were frequently appeared topics. The results of topic network analysis showed that the most important topics of degree centrality were similar to those of appearance frequency. The main difference was that visualization and medical imaging topics were found at the top of the list, although they were not in the top of the list from 2009 to 2012. The results indicated that OSS was developed in the medical field in order to utilize the AI technology. Moreover, although the computer vision was in the top 10 of the appearance frequency list from 2013 to 2015, they were not in the top 10 of the degree centrality. The topics at the top of the degree centrality list were similar to those at the top of the appearance frequency list. It was found that the ranks of the composite neural network and reinforcement learning were changed slightly. The trend of technology development was examined using the appearance frequency of topics and degree centrality. The results showed that machine learning revealed the highest frequency and the highest degree centrality in all years. Moreover, it is noteworthy that, although the deep learning topic showed a low frequency and a low degree centrality between 2009 and 2012, their ranks abruptly increased between 2013 and 2015. It was confirmed that in recent years both technologies had high appearance frequency and degree centrality. TensorFlow first appeared during the phase of 2013-2015, and the appearance frequency and degree centrality of it soared between 2016 and 2018 to be at the top of the lists after deep learning, python. Computer vision and reinforcement learning did not show an abrupt increase or decrease, and they had relatively low appearance frequency and degree centrality compared with the above-mentioned topics. Based on these analysis results, it is possible to identify the fields in which AI technologies are actively developed. The results of this study can be used as a baseline dataset for more empirical analysis on future technology trends that can be converged.

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.39-54
    • /
    • 2013
  • Recent explosive increase of electronic commerce provides many advantageous purchase opportunities to customers. In this situation, customers who do not have enough knowledge about their purchases, may accept product recommendations. Product recommender systems automatically reflect user's preference and provide recommendation list to the users. Thus, product recommender system in online shopping store has been known as one of the most popular tools for one-to-one marketing. However, recommender systems which do not properly reflect user's preference cause user's disappointment and waste of time. In this study, we propose a novel recommender system which uses data mining and multi-model ensemble techniques to enhance the recommendation performance through reflecting the precise user's preference. The research data is collected from the real-world online shopping store, which deals products from famous art galleries and museums in Korea. The data initially contain 5759 transaction data, but finally remain 3167 transaction data after deletion of null data. In this study, we transform the categorical variables into dummy variables and exclude outlier data. The proposed model consists of two steps. The first step predicts customers who have high likelihood to purchase products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict customers who have high likelihood to purchase products in each product group. We perform above data mining techniques using SAS E-Miner software. In this study, we partition datasets into two sets as modeling and validation sets for the logistic regression and decision trees. We also partition datasets into three sets as training, test, and validation sets for the artificial neural network model. The validation dataset is equal for the all experiments. Then we composite the results of each predictor using the multi-model ensemble techniques such as bagging and bumping. Bagging is the abbreviation of "Bootstrap Aggregation" and it composite outputs from several machine learning techniques for raising the performance and stability of prediction or classification. This technique is special form of the averaging method. Bumping is the abbreviation of "Bootstrap Umbrella of Model Parameter," and it only considers the model which has the lowest error value. The results show that bumping outperforms bagging and the other predictors except for "Poster" product group. For the "Poster" product group, artificial neural network model performs better than the other models. In the second step, we use the market basket analysis to extract association rules for co-purchased products. We can extract thirty one association rules according to values of Lift, Support, and Confidence measure. We set the minimum transaction frequency to support associations as 5%, maximum number of items in an association as 4, and minimum confidence for rule generation as 10%. This study also excludes the extracted association rules below 1 of lift value. We finally get fifteen association rules by excluding duplicate rules. Among the fifteen association rules, eleven rules contain association between products in "Office Supplies" product group, one rules include the association between "Office Supplies" and "Fashion" product groups, and other three rules contain association between "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender systems provides list of recommendations to the proper customers. We test the usability of the proposed system by using prototype and real-world transaction and profile data. For this end, we construct the prototype system by using the ASP, Java Script and Microsoft Access. In addition, we survey about user satisfaction for the recommended product list from the proposed system and the randomly selected product lists. The participants for the survey are 173 persons who use MSN Messenger, Daum Caf$\acute{e}$, and P2P services. We evaluate the user satisfaction using five-scale Likert measure. This study also performs "Paired Sample T-test" for the results of the survey. The results show that the proposed model outperforms the random selection model with 1% statistical significance level. It means that the users satisfied the recommended product list significantly. The results also show that the proposed system may be useful in real-world online shopping store.

Analysis of Trends in Education Policy of STEAM Using Text Mining: Comparative Analysis of Ministry of Education's Documents, Articles, and Abstract of Researches from 2009 to 2020 (텍스트 마이닝을 활용한 융합인재교육정책 동향 분석 -2009년~2020년 교육부보도, 언론보도, 학술지 초록 비교분석-)

  • You, Jungmin;Kim, Sung-Won
    • Journal of The Korean Association For Science Education
    • /
    • v.41 no.6
    • /
    • pp.455-470
    • /
    • 2021
  • This study examines the trend changes in keywords and topics of STEAM education from 2009 to 2020 to derive future development direction and education implications. Among the collected data, 42 cases of Ministry of Education's documents, 1,534 cases of articles, and 880 cases of abstract of researches were selected as research subjects. Keyword analysis, keyword network and topic modeling were performed for each stage of STEAM education policy through the Python program. As a result of the analysis, according to the STEAM education policy stage, there were differences in the frequency and network of keywords related to STEAM education by media. It was confirmed that there was a difference in interest in STEAM education policy as there were differences in keywords and topics that were mainly used importantly by media. Most of the topics of the Ministry of Education's documents were found to correspond to topics derived from articles. The implications for the development direction of STEAM education derived from the results of this study are as follows: first, STEAM education needs to consider ways to connect multiple topics, including the humanities. Second, since the media has a difference in interest in STEAM education policy, it is necessary to seek a cooperative development direction through understanding this. Third, the Ministry of Education's support for core competency reinforcement and convergence literacy for nurturing future talents, the goal of STEAM education, and the media's efforts to increase the public's understanding of STEAM education are required. Lastly, it is necessary to continuously analyze the themes that will appear in the evaluation process and change STEAM education policy.

Analysis of the Landscape Characteristics of Island Tourist Site Using Big Data - Based on Bakji and Banwol-do, Shinan-gun - (빅데이터를 활용한 섬 관광지의 경관 특성 분석 - 신안군 박지·반월도를 대상으로 -)

  • Do, Jee-Yoon;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.49 no.2
    • /
    • pp.61-73
    • /
    • 2021
  • This study aimed to identify the landscape perception and landscape characteristics of users by utilizing SNS data generated by their experiences. Therefore, how to recognize the main places and scenery appearing on the island, and what are the characteristics of the main scenery were analyzed using online text data and photo data. Text data are text mining and network structural analysis, while photographic data are landscape identification models and color analysis. As a result of the study, First, as a result of frequency analysis of Bakji·Banwol-do topics, we were able to derive keywords for local landscapes such as 'Purple Bridge', 'Doori Village', and location, behavior, and landscape images by analyzing them simultaneously. Second, the network structure analysis showed that the connection between key and undrawn keywords could be more specifically analyzed, indicating that creating landscapes using colors is affecting regional activation. Third, after analyzing the landscape identification model, it was found that artificial elements would be excluded to create preferred landscapes using the main targets of "Purple Bridge" and "Doori Village", and that it would be effective to set a view point of the sea and sky. Fourth, Bakji·Banwol-do were the first islands to be created under the theme of color, and the colors used in artificial facilities were similar to the surrounding environment, and were harmonized with contrasting lighting and saturation values. This study used online data uploaded directly by visitors in the landscape field to identify users' perceptions and objects of the landscape. Furthermore, the use of both text and photographic data to identify landscape recognition and characteristics is significant in that they can specifically identify which landscape and resources they prefer and perceive. In addition, the use of quantitative big data analysis and qualitative landscape identification models in identifying visitors' perceptions of local landscapes will help them understand the landscape more specifically through discussions based on results.

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

Science and Technology Policy Studies, Society, and the State : An Analysis of a Co-evolution Among Social Issue, Governmental Policy, and Academic Research in Science and Technology (과학기술정책 연구와 사회, 정부 : 과학기술의 사회이슈, 정부정책, 학술연구의 공진화 분석)

  • Kwon, Ki-Seok;Jeong, Seohwa;Yi, Chan-Goo
    • Journal of Korea Technology Innovation Society
    • /
    • v.21 no.1
    • /
    • pp.64-91
    • /
    • 2018
  • This study explores the interactive pattern among social issue, academic research, and governmental policy on science and technology during the last 20 years. In particular, we try understand wether the science and technology policy research and governmental policy meets social needs appropriately. In order to do this, we have collected text data from news articles, papers, and governmental documents. Based on these data, social network analysis and cluster analysis has been carried out. According to the results, we have found that science and technology policy researches tend to focus on fragmented technological innovation meeting urgent practical needs at the initial stage. However, recently, the main characteristics of science and technology policy research shows co-evolutionary patterns responding to society. Furthermore, time lag also has been observed in the process of interaction among the three bodies. Based on these results, we put forward some suggestions for upcoming researches in science and technology policy. Firstly, analysis levels are needed to be shifted from micro level to mezo or macro level. Secondly, more research efforts are required to be focused on policy process in science technology and its public management. Finally, we have to enhance the sensitiveness to social issues through studies on agenda setting in science and technology policy.

A Novel Methodology for Extracting Core Technology and Patents by IP Mining (핵심 기술 및 특허 추출을 위한 IP 마이닝에 관한 연구)

  • Kim, Hyun Woo;Kim, Jongchan;Lee, Joonhyuck;Park, Sangsung;Jang, Dongsik
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.25 no.4
    • /
    • pp.392-397
    • /
    • 2015
  • Society has been developed through analogue, digital, and smart era. Every technology is going through consistent changes and rapid developments. In this competitive society, R&D strategy establishment is significantly useful and helpful for improving technology competitiveness. A patent document includes technical and legal rights information such as title, abstract, description, claim, and patent classification code. From the patent document, a lot of people can understand and collect legal and technical information. This unique feature of patent can be quantitatively applied for technology analysis. This research paper proposes a methodology for extracting core technology and patents based on quantitative methods. Statistical analysis and social network analysis are applied to IPC codes in order to extract core technologies with active R&D and high centralities. Then, core patents are also extracted by analyzing citation and family information.