• Title/Summary/Keyword: graph mining (그래프 마이닝)

Search results: 70

Analysis on Status and Trends of SIAM Journal Papers using Text Mining (텍스트마이닝 기법을 활용한 미국산업응용수학 학회지의 연구 현황 및 동향 분석)

  • Kim, Sung-Yeun
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.212-222 / 2020
  • This study examines the current status and trends of research published by the Society for Industrial and Applied Mathematics (SIAM), a world leader in industrial mathematics. Titles and abstracts were collected from 6,255 research articles published between 2016 and 2019, and R was used to fit an LDA topic model and a regression model. The analyses show, first, that industrial mathematics research spans a variety of fields, such as algebra, discrete mathematics, geometry, topology, probability, and statistics. Second, the research subjects on the rise were fluid mechanics, graph theory, and stochastic differential equations, while those in decline were computational theory and classical geometry. By clarifying the overall flow of and changes in the intellectual structure of industrial mathematics, the results are expected to give researchers implications for future research directions and for building an industrial mathematics curriculum that reflects current trends in education.
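To make the "ascending" and "descending" labels concrete, the sketch below fits a least-squares line to each topic's yearly share and reads the trend off the sign of the slope. It is only an illustration in Python: the study itself used R, and the function names, the `tolerance` parameter, and the sample shares are all hypothetical. A real analysis would also test whether each slope is statistically significant.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def classify_trends(years, topic_shares, tolerance=1e-9):
    """Label each topic by the sign of its yearly-share slope."""
    trends = {}
    for topic, shares in topic_shares.items():
        s = slope(years, shares)
        if s > tolerance:
            trends[topic] = "ascending"
        elif s < -tolerance:
            trends[topic] = "descending"
        else:
            trends[topic] = "flat"
    return trends


# Hypothetical topic shares per year (made-up numbers, not from the study).
years = [2016, 2017, 2018, 2019]
topic_shares = {
    "graph theory": [0.04, 0.05, 0.06, 0.07],
    "classical geometry": [0.06, 0.05, 0.04, 0.03],
}
print(classify_trends(years, topic_shares))
```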

Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI (LSI를 이용한 차원 축소 클러스터 기반 키워드 연관망 자동 구축 기법)

  • Yoo, Han-mook;Kim, Han-joon;Chang, Jae-young
    • Journal of KIISE / v.44 no.11 / pp.1236-1243 / 2017
  • In this paper, we propose a novel way of producing keyword networks, named LSI-based ClusterTextRank, which extracts significant keywords from a set of clusters with a mutual information metric and constructs an association network using latent semantic indexing (LSI). The proposed method reduces the dimension of documents through LSI, decomposes the documents into multiple clusters through k-means clustering, and expresses the words within each cluster as a maximal spanning tree graph. The significant keywords are identified by evaluating their mutual information within clusters. The method then calculates the similarities between the extracted keywords using the term-concept matrix, and the results are represented as a keyword association network. To evaluate the proposed method, we used travel-related blog data and showed that it outperforms the existing TextRank algorithm by about 14% in terms of accuracy.
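One step in the pipeline above, expressing the words of a cluster as a maximal spanning tree, can be sketched with Kruskal's algorithm run over edges sorted by descending weight. This is a minimal illustration, not the authors' implementation: the function name, the edge weights, and the sample words are assumptions, and the LSI, k-means, and mutual information steps are not shown.

```python
def maximum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm, taking the heaviest edges first.

    weighted_edges is an iterable of (weight, node_a, node_b) tuples.
    Returns the chosen edges as (node_a, node_b, weight) tuples.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the set representative, halving paths.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for weight, a, b in sorted(weighted_edges, reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so no cycle
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Hypothetical word-association weights within one cluster.
words = ["hotel", "beach", "flight", "ticket"]
edges = [
    (5, "hotel", "beach"),
    (4, "beach", "flight"),
    (3, "flight", "ticket"),
    (2, "hotel", "flight"),
    (1, "beach", "ticket"),
]
for a, b, weight in maximum_spanning_tree(words, edges):
    print(a, b, weight)
```

The tree keeps only the strongest links needed to connect every word, which is why the two weakest edges in the sample are dropped.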

Analysis of Social Network According to The Distance of Characters Statements (소설 등장인물의 텍스트 거리를 이용한 사회 구성망 분석)

  • Park, Gyeong-Mi;Kim, Sung-Hwan;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.13 no.4 / pp.427-439 / 2013
  • With the rapid development of complexity science, many social networks have been studied. Social networks are widely applied to analyzing issues in human culture, economics, and web science. Recently, some researchers have begun to compare social networks constructed from works of fiction (literature social networks) with real social networks obtained in practice. However, previous approaches to literature social networks have drawbacks because they depend entirely on a biographical dictionary constructed for a designated work. Because these approaches focus on a few important characters and the people around them, they cannot reveal the global structure of all characters who appear in the work at least once. We propose a method for extracting all characters who appear in a work and for building a social network from that information. We also propose a new K-critical network, which applies the frequency of the named characters and the strength of the relationships among all textual characters. Our experiments show that the K-critical measure could be a crucial quantitative measure of relationship strength among the characters appearing in the target work.
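The idea of linking characters by how close their mentions are in the text can be sketched as follows. This is a rough illustration under stated assumptions: the text is already tokenized, character names are single tokens, and two mentions are linked when they fall within a hypothetical `max_distance` in tokens. It does not reproduce the K-critical network, which the abstract does not define in detail.

```python
from collections import Counter
from itertools import combinations


def character_network(tokens, characters, max_distance):
    """Count pairs of distinct characters mentioned within max_distance tokens."""
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in characters]
    weights = Counter()
    for (i, a), (j, b) in combinations(mentions, 2):
        if a != b and j - i <= max_distance:
            # Store each undirected pair under one canonical key.
            weights[tuple(sorted((a, b)))] += 1
    return weights


# Hypothetical sample text and character list.
text = "Alice met Bob . Later Bob wrote to Carol while Alice slept"
network = character_network(text.split(), {"Alice", "Bob", "Carol"}, max_distance=3)
for pair, weight in sorted(network.items()):
    print(pair, weight)
```

Comparing all pairs of mentions is quadratic in the number of mentions; a sliding window would be preferable for a full-length novel.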

Analysis of News Agenda Using Text mining and Semantic Network Analysis: Focused on COVID-19 Emotions (텍스트 마이닝과 의미 네트워크 분석을 활용한 뉴스 의제 분석: 코로나 19 관련 감정을 중심으로)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.47-64 / 2021
  • The global spread of COVID-19 has affected not only many parts of daily life but also many other areas, including the economy and society. As the number of confirmed cases and deaths increases, medical staff and the public are reported to be experiencing psychological problems such as anxiety, depression, and stress. The collective tragedy that accompanies the epidemic raises fear and anxiety, which are known to severely disrupt the behavior and psychological well-being of many people. Long-term negative emotions can reduce people's immunity and upset their physical balance, so it is essential to understand the psychological state surrounding COVID-19. This study suggests a method of monitoring media news that reflects the current situation, in which the prolonged COVID-19 pandemic calls for psychological as well as physical quarantine. It also shows how a simpler method of analyzing social media networks applies to such cases. The aim of the study is to assist health policymakers in fast and complex decision-making. News plays a major role in setting the policy agenda. Among the various major media, news headlines are considered important in communication science because they summarize the core content that the media want to convey to their audiences. The news data used in this study were collected using "Bigkinds", a service built by integrating big data technology. Keywords in the collected news data were classified through text mining, and the relationships between keywords were visualized through semantic network analysis. Text mining was performed with KrKwic, a Korean semantic network analysis tool, and word frequencies were calculated to identify keywords. The frequency of words appearing in the keywords of articles related to COVID-19 emotions was checked and visualized in a word cloud. 'China', 'anxiety', 'situation', 'mind', 'social', and 'health' appeared frequently in relation to COVID-19 emotions. In addition, UCINET, a specialized social network analysis program, was used for connection centrality and cluster analysis, and the graph was visualized with NetDraw. The connection centrality analysis showed that the most central keywords in the keyword network were 'psychology', 'COVID-19', 'blue', and 'anxiety'. The co-occurrence network of the keywords appearing in news headlines was visualized as a graph in which the thickness of each line is proportional to the co-occurrence frequency, so two words that frequently appear together are joined by a thick line. The 'COVID-blue' pair has the thickest line, and the 'COVID-emotion' and 'COVID-anxiety' pairs have relatively thick lines. 'Blue' in connection with COVID-19 means depression, which confirms that COVID-19 and depression are keywords that deserve attention now. The research methodology used in this study makes it possible to measure social phenomena and changes quickly while reducing costs. By analyzing news headlines, we were able to identify people's feelings and perceptions about issues related to COVID-19 depression and to identify the main agendas for analysis by deriving important keywords. Presenting and visualizing the subjects and important keywords related to COVID-19 emotions together can give medical policy managers a variety of perspectives when identifying and researching the phenomenon. The results are expected to serve as basic data for support, treatment, and service development for psychological quarantine issues related to COVID-19.
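The co-occurrence counting and centrality steps described above can be sketched in a few lines. The study used KrKwic, UCINET, and NetDraw; the Python below is only an illustration of the underlying computation, it assumes that "connection centrality" refers to degree centrality, and its headline keyword lists are invented toy data rather than the study's data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair shares a headline."""
    counts = Counter()
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def degree_centrality(pair_counts):
    """Share of the other nodes that each node is directly linked to."""
    neighbors = {}
    for a, b in pair_counts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Toy keyword lists, one per headline (invented for illustration).
headlines = [
    ["COVID-19", "blue", "anxiety"],
    ["COVID-19", "blue"],
    ["COVID-19", "psychology"],
]
pairs = cooccurrence(headlines)
print(pairs.most_common(1))      # the pair that would get the thickest line
print(degree_centrality(pairs))
```

In a plot, the count stored for each pair would drive the line thickness, and the centrality value could drive the node size.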

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People now create a tremendous amount of data on social network services (SNS). In particular, the incorporation of SNS into mobile devices has led to massive data generation, which greatly influences society. This phenomenon is unmatched in history, and we now live in the Age of Big Data. SNS data qualify as Big Data because they satisfy the conditions of volume (amount of data), velocity (data input and output speed), and variety (variety of data types). Trends of issues discovered in SNS Big Data can serve as an important new source of value because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) providing the topic keyword set that corresponds to the daily ranking; (2) visualizing the daily time-series graph of a topic over a month; (3) showing the importance of a topic through a treemap based on a scoring system and frequency; and (4) visualizing the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. It also requires recent big data technology, such as the Hadoop distributed system or NoSQL, an alternative to relational databases, to process a large amount of real-time data rapidly. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale from a single node to thousands of machines. Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB uses no fixed schemas or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; it makes interaction with data easy and is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, which consists of pre-configured plug-in style sheets and JavaScript libraries, to build the web system. The TITS graphical user interface (GUI) is designed using these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The experiments used nearly 150 million tweets posted in Korea during March 2013.

Mining Frequent Trajectory Patterns in RFID Data Streams (RFID 데이터 스트림에서 이동궤적 패턴의 탐사)

  • Seo, Sung-Bo;Lee, Yong-Mi;Lee, Jun-Wook;Nam, Kwang-Woo;Ryu, Keun-Ho;Park, Jin-Soo
    • Journal of Korea Spatial Information System Society / v.11 no.1 / pp.127-136 / 2009
  • This paper proposes an online algorithm for mining moving trajectory patterns in RFID data streams that accounts for characteristics that change over time and for the constraint of a single-pass data scan. With the rapid development of RFID, sensor, and mobile network technology, many researchers have recently focused on gathering real-time data from the real world and mining useful patterns from it. Previous work on sequential patterns or moving trajectory patterns in stream data is extremely time-consuming because of multi-pass database scans and tree traversal, and it does not consider the time-changing characteristics of stream data. The proposed method preserves the sequential strength of frequent patterns of length 2 in a binary relationship table, using a time-evolving graph to reflect changes in the RFID data stream over time accurately. In addition, to avoid repetitive data scans, the proposed algorithm infers candidate moving trajectory patterns of length k in advance at time point t, and then extracts the patterns after screening the candidates in a single pass at time point t+1. Experiments show that the proposed method outperforms the Apriori-like method in terms of time and space complexity, with the reduction ratio of candidate sets being about 7 percent.
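The single-pass constraint can be illustrated with a much simpler counter than the one the paper proposes: it keeps only the last known location of each tag and a table of length-2 transition counts, so every reading is processed once and never rescanned. The class name, the `(tag_id, location)` reading format, and the `min_support` threshold are assumptions for this sketch; the time-evolving graph and the inference of length-k candidates are not implemented here.

```python
from collections import Counter


class TransitionCounter:
    """Single-pass counter of length-2 movement patterns (from, to)."""

    def __init__(self):
        self.last_location = {}       # tag_id -> most recent location
        self.pair_counts = Counter()  # (from_location, to_location) -> count

    def observe(self, tag_id, location):
        """Process one reading from the stream; nothing is stored for rescans."""
        previous = self.last_location.get(tag_id)
        if previous is not None and previous != location:
            self.pair_counts[(previous, location)] += 1
        self.last_location[tag_id] = location

    def frequent_pairs(self, min_support):
        """Return the transitions seen at least min_support times."""
        return {p: c for p, c in self.pair_counts.items() if c >= min_support}


# Hypothetical stream of (tag_id, reader location) readings.
stream = [("t1", "A"), ("t2", "A"), ("t1", "B"), ("t2", "B"), ("t1", "C")]
counter = TransitionCounter()
for tag_id, location in stream:
    counter.observe(tag_id, location)
print(counter.frequent_pairs(min_support=2))
```

Memory grows with the number of tags and distinct transitions rather than with the length of the stream, which is the property a stream algorithm needs.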


Analysis of Waterpark Status and Recognition Using Big Data Analysis (빅데이터 분석을 활용한 워터파크 현황 및 인식 분석)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • Journal of Digital Convergence / v.15 no.10 / pp.525-535 / 2017
  • This study examines consumer perception and the current status of water parks. Naver and Daum were used as data collection channels, and the keyword 'water park' was used for data retrieval. The analysis covered two years, from January 1, 2015 to December 31, 2016. First, in the frequency analysis, the top five keywords were hidden camera, Lotte water park, arrest, suspect, and Gimhae in 2015, and Lotte water park, swimming, summer, opening, and admission ticket in 2016. Second, in the degree centrality analysis, the top five were hidden camera, arrest, suspect, female, and shower room in 2015, and swimming, Lotte water park, summer, One Mount, and admission ticket in 2016. Third, in the N-gram network graph, the top five pairs were water park/hidden camera, hidden camera/hidden camera, suspect/arrest, Gimhae/Lotte water park, and water park/suspect in 2015, and One Mount/water park, Gimhae/Lotte water park, water park/admission ticket, water park/water park, and water park/opening in 2016. Fourth, the CONCOR analysis formed three groups in 2015 and two groups in 2016.

Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.83-96 / 2018
  • Extracting keywords that represent documents is very important because they can be used for automated services such as document search, classification, and recommendation, as well as for conveying document information quickly. However, keyword extraction based on the frequency of words in a website's documents, or on graph algorithms over word co-occurrence, has two problems: the web page structure may contain various words unrelated to the topic, and the limited performance of the Korean tokenizer makes it difficult to extract semantic keywords. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. Finally, a filtering process removes inconsistent keywords to produce the final semantic keywords. Experiments on real web pages of small businesses show that the proposed method improves performance by 34.52% over a keyword selection technique based on statistical similarity. This confirms that keyword extraction from documents improves when semantic similarity between words is considered and inconsistent keywords are removed.
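Selecting candidates by semantic similarity usually comes down to comparing word vectors with cosine similarity and keeping the words above a threshold. The sketch below shows that step only. The two-dimensional vectors are hand-made stand-ins for trained Word2Vec embeddings, and the function names, the topic vector, and the threshold are assumptions rather than details from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_keywords(candidates, vectors, topic_vector, threshold):
    """Keep the candidates whose vector is close enough to the topic vector."""
    return [
        word for word in candidates
        if cosine(vectors[word], topic_vector) >= threshold
    ]


# Hand-made toy vectors; real ones would come from a trained Word2Vec model.
vectors = {
    "coffee": [1.0, 0.0],
    "espresso": [0.9, 0.1],
    "login": [0.0, 1.0],
}
topic_vector = [1.0, 0.0]
print(select_keywords(["coffee", "espresso", "login"], vectors, topic_vector, 0.5))
```

A word such as "login" comes from the page structure rather than the site's topic, so its low similarity to the topic vector filters it out.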

Content Analysis of Food and Nutrition Unit in High School Textbooks of Home Economics: Focus on the National Curriculums from 7th to 2015 Revised (고등학교 '기술·가정' 교과 식생활 영역의 교육내용 분석: 제7차 교육과정부터 2015 개정 교육과정까지의 교과서 내용을 중심으로)

  • Park, Chae Eun;Kim, Yoo Kyeong
    • Journal of Korean Home Economics Education Association / v.31 no.4 / pp.97-113 / 2019
  • This study examines changes in high school Home Economics textbooks from the 7th to the 2015 revised curriculum, focusing on the 'Food and Nutrition' section. We investigated the content elements of the National Curriculum Guide, the changes in learning content, and the number of pages in the Food and Nutrition section. Keywords were extracted, and the connections between words were visualized through language network analysis using word clouds and Semantic Network Analysis. The results show that the share of the Food and Nutrition section within Technology·Home Economics has gradually decreased as the curriculum has developed. Across the whole curriculum, 'invitation', 'Korean food', and 'baby·nutrition' appear as keywords. The educational content of the Food and Nutrition section from the 7th to the 2015 revised curriculum has developed and advanced with changing social needs. However, the reduced share and insufficient content elements of the Food and Nutrition section raise concerns about a decline in the quality of education on dietary life.

The Empirical Study on the Effect of Technology Exchanges in the Fourth Industrial Revolution between Korea and China: Focused on the Firm Social Network Analysis (한중 4차산업혁명 기술교류 및 효과에 대한 실증연구: 기업 소셜 네트워크 분석 중심으로)

  • Zhou, Zhenxin;Sohn, Kwonsang;Hwang, Yoon Min;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.25 no.3 / pp.41-61 / 2020
  • China's rapid development and commercialization of high technologies in the fourth industrial revolution have made effective technology exchanges between Korean and Chinese firms more important to Korea's mid-term and long-term industrial development. However, there is still a lack of empirical research on how technology exchanges between Korean and Chinese firms proceed and how effective they are. In response, this study conducted a social network analysis of the current status and effects of Korea-China technology exchanges related to the fourth industrial revolution, based on text mining of news articles on Korea-China business technology exchange and cooperation published from 2018 to March 2020, and used regression analysis to examine how network centrality affects firm performance. According to the results, most major Korean electronics firms actively network with Chinese firms and institutions and show high centrality. Korean telecommunication firms showed high betweenness centrality and subgraph centrality, and Korean Internet service providers and broadcasting content firms showed high eigenvector centrality. In addition, Chinese firms showed higher betweenness centrality than Korean firms, and Chinese service firms showed higher closeness centrality than manufacturing firms. The regression analysis showed that network centrality had a positive effect on firm performance. To the best of our knowledge, this is the first study to analyze the impact of technical cooperation between Korean and Chinese firms in the context of the fourth industrial revolution. The study has theoretical implications in that it suggests a direction for empirical research on global firm cooperation based on social network analysis. It also has practical implications in that it offers guidelines for using network analysis when firms or governments set the direction of technical cooperation between Korea and China.
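Of the centrality measures named above, closeness centrality is easy to show in a few lines: for each firm, run a breadth-first search and divide the number of reachable firms by the sum of their distances. The sketch below is an assumption-laden illustration, not the study's procedure. The firm names and links are hypothetical, the graph is treated as unweighted and undirected, and in a disconnected graph each score is computed only within the firm's own component.

```python
from collections import deque


def closeness_centrality(adjacency):
    """Closeness of each node: reachable nodes / sum of shortest distances."""
    result = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest hop counts
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        result[source] = (len(distance) - 1) / total if total else 0.0
    return result


# Hypothetical cooperation links between firms (undirected, listed both ways).
links = {
    "FirmA": ["FirmB"],
    "FirmB": ["FirmA", "FirmC"],
    "FirmC": ["FirmB"],
}
print(closeness_centrality(links))
```

The firm in the middle of the chain reaches every other firm in one step, so it gets the highest score.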