• Title/Summary/Keyword: co-word network

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. As advances in information and communication technology have produced many kinds of online news media, the volume of news about events in society has grown greatly. Automatically summarizing key events from massive amounts of news data therefore helps users survey many events at a glance. In addition, an event network built on the relevance between events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, then preprocessed them using NPMI and Word2Vec to keep only meaningful words and merge synonyms. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, and events were detected by finding the peaks of each topic's distribution. A total of 32 topics were extracted, and the time of occurrence of each event was inferred from the point at which the corresponding topic distribution surged. As a result, a total of 85 events were detected, of which 16 final events were retained and presented after filtering with the Gaussian smoothing technique. We also calculated a relevance score between the detected events in order to construct the event network: using the cosine coefficient between co-occurring events, we calculated the relevance between events and connected them accordingly. Finally, we built the event network by setting each event as a vertex and the relevance score between events as the weight of the edge connecting the vertices. 
The event network constructed with our method helped us sort out, in chronological order, the major political and social events that occurred in Korea over the past year and, at the same time, identify which events are related to which. Our approach differs from existing event detection methods in that LDA topic modeling makes it easy to analyze large amounts of data and to identify relevance between events that was difficult to detect with existing methods. In the text preprocessing, we applied various text mining techniques together with Word2Vec to improve the accuracy of extracting proper nouns and compound nouns, which has been a difficulty in analyzing Korean texts. The event detection and network construction techniques of this study have the following advantages in practical application. First, LDA topic modeling, being unsupervised learning, can easily extract topics, topic words, and their distributions from a huge amount of data. Also, by using the date information of the collected news articles, the distribution of each topic can be expressed as a time series. Second, by calculating relevance scores from the co-occurrence of topics and constructing an event network, we can present the connections between events in a summarized form, which is difficult to grasp with existing event detection. This can be seen from the fact that the relevance-based event network proposed in this study was in fact constructed in order of occurrence time. The event network also makes it possible to identify which event was the starting point of a series of events. A limitation of this study is that LDA topic modeling yields different results depending on the initial parameters and the number of topics, and that the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher. 
Also, since each topic is assumed to be exclusive and independent, the method does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events not covered in this study or between events that belong to the same topic.
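
The detection and linking steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy topic shares, the kernel settings, the peak threshold, and the choice to take the cosine coefficient over daily topic shares are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

def gaussian_smooth(series, sigma=1.0, radius=3):
    """Smooth a 1-D series with a truncated Gaussian kernel (renormalized at the borders)."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(series)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(series):
                num += w * series[j]
                den += w
        smoothed.append(num / den)
    return smoothed

def find_peaks(series, threshold):
    """Indices of strict interior local maxima at or above `threshold`."""
    return [i for i in range(1, len(series) - 1)
            if series[i] >= threshold and series[i - 1] < series[i] > series[i + 1]]

def cosine(u, v):
    """Cosine coefficient between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def event_network(topic_series, min_relevance=0.5):
    """Edges (a, b, score) between topics whose daily shares co-vary strongly."""
    names = sorted(topic_series)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine(topic_series[a], topic_series[b])
            if score >= min_relevance:
                edges.append((a, b, score))
    return edges

# Toy daily topic shares (hypothetical numbers, not taken from the paper).
shares = {
    "impeachment": [0.1, 0.1, 0.6, 0.7, 0.2, 0.1],
    "protest":     [0.1, 0.2, 0.5, 0.6, 0.3, 0.1],
    "earthquake":  [0.7, 0.1, 0.0, 0.0, 0.1, 0.6],
}
peaks = {name: find_peaks(gaussian_smooth(s, sigma=0.5, radius=1), threshold=0.3)
         for name, s in shares.items()}
edges = event_network(shares)
```

In this toy data the two topics that surge on the same days end up connected, while the topic that peaks only at the borders of the window yields no interior peak and no edge.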

Keyword networks in RJCC research - A co-word analysis and clustering - (RJCC 연구 키워드 네트워크 - 동시출현단어분석과 군집분석 -)

  • Seo, Hyun-Jin;Choi, Yeong-Hyeon;Oh, Seung-Taek;Lee, Kyu-Hye
    • The Research Journal of the Costume Culture / v.27 no.3 / pp.193-205 / 2019
  • Trend analysis of research articles in a field of knowledge is significant because observing change over time helps reveal the structural characteristics of the field and the future direction of its research. We identified structural characteristics and trends in text data (keywords) gathered from research articles, which is an important task in various research areas. Titles and keywords were crawled from research articles published from 2016 to 2018 in the Research Journal of the Costume Culture (RJCC), one of the representative Korean journals in the field of clothing and textiles. After extracting the English titles and keywords of 195 published articles, we transformed the data into a 1-mode matrix. We used measures from network analysis (i.e., link, strength, and degree centrality) to evaluate meaningful patterns and trends in research on clothing and textiles. NodeXL was used to visualize the semantic network. This study observed changes in the clothing and textile research trend. In addition to covering the core areas of the field, research subjects have diversified with every passing year and have evolved in a developmental direction. The most studied area in articles published by the RJCC was fashion retailing/consumer psychology, while aesthetic/historic and fashion industry/policy studies were covered to a more limited extent. We observed that most of the studies reflecting the identity of RJCC share subject keywords to a significant extent.
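
As a rough illustration of the 1-mode (keyword-by-keyword) representation and the link, strength, and degree centrality measures mentioned above, the following sketch counts keyword co-occurrences with the Python standard library. The study itself used NodeXL; the sample keywords and function names here are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coword_links(articles):
    """Link strength: number of articles in which each unordered keyword pair co-occurs."""
    links = Counter()
    for keywords in articles:
        for a, b in combinations(sorted(set(keywords)), 2):
            links[(a, b)] += 1
    return links

def degree_centrality(links):
    """Fraction of the other keywords that each keyword is directly linked to."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}

# Hypothetical keyword lists, one per article.
articles = [
    ["fashion retailing", "consumer psychology", "SNS"],
    ["fashion retailing", "consumer psychology"],
    ["hanbok", "aesthetics"],
    ["fashion retailing", "aesthetics"],
]
links = coword_links(articles)
centrality = degree_centrality(links)
```

Here `links` plays the role of the upper triangle of the 1-mode matrix: each key is a keyword pair and each value is the strength of that link.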

The Study on Recent Research Trend in Korean Tourism Using Keyword Network Analysis (키워드 네트워크를 이용한 국내 관광연구의 최근 연구동향 분석)

  • Kim, Min Sun;Um, Hyemi
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.9 / pp.68-73 / 2016
  • This study was conducted to identify research trends and knowledge structures in Korean tourism research from 2010 to 2015 using keyword data. To accomplish this, we constructed a network using keywords extracted from KCI journals. We first built a matrix with papers as rows and keywords as columns. The resulting keyword network shows the connectivity of papers that share one or more keywords. Major keywords were then extracted using the cosine similarity between co-occurring keywords, and components were analyzed to understand research trends and knowledge structure. The results revealed that the subjects of tourism research have changed rapidly and diversified. A few topics related to 'organization-employee' were major trends for several years, but intrinsic and extrinsic factors have been further subdivided and employees of specific fields have been targeted as research subjects. Component analysis is useful for analyzing concrete research topics and the relationships between them. The results of this study will be useful for researchers attempting to identify new topics.
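
The paper-by-keyword matrix and the cosine similarity between co-occurring keywords can be sketched as follows. This is only an illustration under assumed inputs: the keyword sets are invented, the matrix is taken to be binary, and `keyword_cosine` is a hypothetical helper name.

```python
import math

def keyword_cosine(papers):
    """Cosine similarity between keyword columns of a binary paper-by-keyword matrix."""
    vocab = sorted({k for kws in papers for k in kws})
    # One row per paper, one column per keyword; 1 means the paper uses the keyword.
    matrix = [[1 if k in kws else 0 for k in vocab] for kws in papers]
    columns = list(zip(*matrix))
    sims = {}
    for i, a in enumerate(vocab):
        for j in range(i + 1, len(vocab)):
            dot = sum(x * y for x, y in zip(columns[i], columns[j]))
            if dot:  # keep only keyword pairs that co-occur at least once
                norm = math.sqrt(sum(columns[i]) * sum(columns[j]))
                sims[(a, vocab[j])] = dot / norm
    return sims

# Hypothetical keyword sets, one per paper.
papers = [
    {"employee", "burnout"},
    {"employee", "job satisfaction"},
    {"employee", "burnout", "hotel"},
    {"festival", "visitor satisfaction"},
]
sims = keyword_cosine(papers)
```

For binary columns the cosine reduces to the co-occurrence count divided by the square root of the product of the two keyword frequencies, which is why the column sums can serve as squared norms.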

An Investigation on Characteristics and Intellectual Structure of Sociology by Analyzing Cited Data (사회학 분야의 연구데이터 특성과 지적구조 규명에 관한 연구)

  • Choi, Hyung Wook;Chung, EunKyung
    • Journal of the Korean Society for Information Management / v.34 no.3 / pp.109-124 / 2017
  • Across a wide variety of disciplines, the practice of accessing and re-using data has increased recently. In particular, researchers increasingly use data sets produced by other researchers and give scholarly credit through citation. In response to this practice, Thomson Reuters launched the Data Citation Index (DCI) in 2012. With the DCI, citations to research data published by researchers are collected and analyzed in a way similar to citations to journal articles. The purpose of this study is to identify the characteristics and intellectual structure of the field of sociology, one of the actively data-citing fields, based on research data. To accomplish this purpose, two data sets were collected and analyzed. First, a total of 8,365 records were collected from the DCI in the field of sociology. Second, a total of 12,132 records were collected from Web of Science through a topic search for 'Sociology'. A co-word analysis of author-provided keywords for both data sets showed that the intellectual structure of research data-based sociology was composed of two areas and 15 clusters, and that of article-based sociology of three areas and 17 clusters. More importantly, the medical science area was found to be actively studied in research data-based sociology, and public health and psychology were identified as central areas in data citation.

Time Series Analysis of Intellectual Structure and Research Trend Changes in the Field of Library and Information Science: 2003 to 2017 (문헌정보학 분야의 지적구조 및 연구 동향 변화에 대한 시계열 분석: 2003년부터 2017년까지)

  • Choi, Hyung Wook;Choi, Ye-Jin;Nam, So-Yeon
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.89-114 / 2018
  • Studying changes in research trends in an academic discipline makes it possible to observe not only the detailed research subjects and structure of the field but also how they change over time. In this study, in order to observe changes in research trends in the library and information science field in Korea, co-word analysis was conducted on Korean author keywords from three journals that are listed in the Korea Citation Index (KCI) and have the highest citation impact factors. For the time series analysis, the 15-year research period was accumulated in 5-year units and divided into 2003~2007, 2003~2012, and 2003~2017. For each period, keywords with a frequency of appearance of 10 or more were analyzed and visualized. The analysis confirmed an intellectual structure composed of 25 keywords and 8 areas for the period from 2003 to 2007, and a structure composed of 3 areas and 17 sub-areas with 76 keywords for the period from 2003 to 2012. The intellectual structure for the period from 2003 to 2017 was grouped into 6 areas and 32 sub-areas consisting of a total of 132 keywords. The analysis of the whole period shows that, in the field of library and information science in Korea over the past 15 years, new keywords have been added in each period, and detailed topics have been gradually subdivided and expanded.
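
The cumulative windowing and frequency cutoff described above can be sketched in a few lines. The records and the helper name are hypothetical, and the toy example lowers the cutoff from the study's 10 to 2 so that the small sample produces output.

```python
from collections import Counter

def cumulative_keywords(records, start, ends, min_freq=10):
    """For each cumulative window [start, end], keep keywords with frequency >= min_freq."""
    result = {}
    for end in ends:
        counts = Counter(k for year, kws in records
                         if start <= year <= end
                         for k in kws)
        result[(start, end)] = {k: c for k, c in counts.items() if c >= min_freq}
    return result

# Hypothetical (publication year, author keywords) records.
records = [
    (2004, ["public library", "metadata"]),
    (2006, ["public library"]),
    (2010, ["metadata", "open access"]),
    (2015, ["open access", "research data"]),
    (2016, ["research data"]),
]
windows = cumulative_keywords(records, 2003, [2007, 2012, 2017], min_freq=2)
```

Because every window starts in 2003, a keyword that passes the cutoff in an early window stays in all later ones, so the keyword set can only grow from period to period.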

An Analysis of Changes in Social Issues Related to Patient Safety Using Topic Modeling and Word Co-occurrence Analysis (토픽 모델링과 동시출현 단어 분석을 활용한 환자안전 관련 사회적 이슈의 변화)

  • Kim, Nari;Lee, Nam-Ju
    • The Journal of the Korea Contents Association / v.21 no.1 / pp.92-104 / 2021
  • This study analyzes online news articles to identify social issues related to patient safety and to compare the changes in these issues before and after the implementation of the Patient Safety Act. Text mining was performed with the R program: 7,600 online news articles were collected from January 1, 2010, to March 5, 2020, and examined using keyword analysis, topic modeling, and word co-occurrence network analysis. A total of 2,609 keywords were categorized into 8 topics: "medical practice", "medical personnel", "infection and facilities", "comprehensive nursing service", "medicine and medical supplies", "system development and establishment for improvement", "Patient Safety Act" and "healthcare accreditation". The study revealed that keywords such as "patient safety awareness", "infection control" and "healthcare accreditation" appeared before the implementation of the Patient Safety Act. Meanwhile, keywords such as "patient safety culture" and "administration and injection" appeared after the act's implementation, with nursing-related terminology rising in importance ranking. Interest in patient safety has increased in the medical community as well as among the public. In particular, nursing plays an important role in improving patient safety. Therefore, the recognition of patient safety as a core competency of nursing and the persistent education of the public are vital.

An Informetric Analysis of Topics in University's General Education (대학 교양교육 주제영역의 계량적 분석연구)

  • Choi, Sanghee
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.26 no.4 / pp.245-262 / 2015
  • As the topics of general education in universities become more diverse, identifying the topics of general education courses is not an easy task. This study aims to identify and visualize the topics of the general education courses of University A using informetric analysis methods. A total of 214 syllabi were collected, and their titles, course introductions, goals, and weekly plans were analyzed. From this data set, 278 topic words were extracted and grouped into 8 clusters. In the network analysis, topic clusters were divided into two areas, personal and social. The personal area has 14 sub-topic clusters and the social area has 11 sub-topic clusters. In the personal area, 'language', 'science', and 'personality' were major topic clusters. In the social area, the 'multi-culture' cluster was the core cluster, connected to four other clusters. The topic network generated in this study can be used by the university and the university library to enhance general education or to develop collections for general education.

Keyword Network Analysis about the Trends of Social Welfare Researches - focused on the papers of KJSW during 1979~2015 - (사회복지학 연구동향에 관한 키워드 네트워크 분석 - 「한국사회복지학」 게재논문(1979-2015)을 중심으로 -)

  • Kam, Jeong Ki;Kam, Mi Ah;Park, Mi Hee
    • Korean Journal of Social Welfare / v.68 no.2 / pp.185-211 / 2016
  • This study analyzes keyword networks of the papers published in the Korean Journal of Social Welfare, issued by the Korean Academy of Social Welfare, from 1979 to 2015. It aims to investigate trends in social welfare research in Korea by dividing the period into two: 1979-2000 and 2001-2015. It shows the trends in three ways: methodologies, subjects, and intellectual structures. To identify the intellectual structure, it calculates centrality indices based on the co-occurrence frequency of keywords. It also derives values that explain the relationship structure of keywords using the pathfinder algorithm, and finally visualizes the intellectual structures using the NodeXL program. Some implications of the findings are discussed at the end.
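
The abstract does not state which pathfinder parameters were used. As a hedged sketch, the code below implements the commonly used PFNET(r = ∞, q = n − 1) variant, in which a link survives only if no indirect path has a smaller maximum step. Converting co-occurrence frequency to distance as 1/frequency is an assumption, and the keyword counts are invented.

```python
def pathfinder(cooccurrence):
    """PFNET(r=inf, q=n-1): keep a link only if no indirect path has a smaller maximum step."""
    nodes = sorted({k for pair in cooccurrence for k in pair})
    inf = float("inf")
    # Assumed conversion: the more often two keywords co-occur, the shorter their distance.
    dist = {a: {b: (0.0 if a == b else inf) for b in nodes} for a in nodes}
    for (a, b), freq in cooccurrence.items():
        dist[a][b] = dist[b][a] = 1.0 / freq
    # Floyd-Warshall variant: a path costs as much as its largest single step.
    minimax = {a: dict(row) for a, row in dist.items()}
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via = max(minimax[i][k], minimax[k][j])
                if via < minimax[i][j]:
                    minimax[i][j] = via
    return sorted((a, b) for (a, b) in cooccurrence if dist[a][b] <= minimax[a][b])

# Hypothetical co-occurrence counts, one entry per unordered keyword pair.
cooccurrence = {
    ("poverty", "welfare policy"): 5,
    ("elderly", "welfare policy"): 4,
    ("elderly", "poverty"): 1,
}
backbone = pathfinder(cooccurrence)
```

In this example the weak direct link between "elderly" and "poverty" is pruned because the route through "welfare policy" consists of two much stronger links.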

Analysis on Topics of Digital Preservation Researches and Courses (디지털 보존 관련 학술연구 및 교과 주제분석)

  • Jeong, Uiyeon;Choi, Sanghee
    • Journal of the Korean Society for Library and Information Science / v.53 no.3 / pp.25-43 / 2019
  • Recently there has been growing interest in digital preservation and digital curation with the rapid increase of digital resources. This study aims to investigate the research topics and course topics related to digital preservation and digital curation. The course information was collected from the curricula of library and information science departments and archival science departments in leading countries such as the US, England, Ireland, Canada and New Zealand. Title keyword profiling and network analysis were used to discover core research and education areas. The key topics in the abstracts of research papers and the contents of the courses were also illustrated by these methods. In the research analysis, archival systems are the biggest area of research related to digital preservation and digital curation. The course analysis shows that digital curation education and process is the important area of education. As a result of content analysis, plan and strategy is a notable topic of research, and the record management process is a major topic of courses for digital preservation and digital curation. In addition, the format of digital resources is an important topic for both research and courses.

Memory Organization for a Fuzzy Controller.

  • Jee, K.D.S.;Poluzzi, R.;Russo, B.
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1993.06a / pp.1041-1043 / 1993
  • Fuzzy logic based control theory has gained much interest in the industrial world, thanks to its ability to formalize and solve in a very natural way many problems that are very difficult to quantify at an analytical level. This paper presents a solution for handling membership functions inside hardware circuits. The proposed hardware structure optimizes the memory size by using a particular form of the vectorial representation. The process of memorizing fuzzy sets, i.e. their membership functions, has always been one of the more problematic issues for hardware implementation, due to the quite large memory space that is needed. To simplify such an implementation, it is common [1,2,8,9,10,11] to limit the membership functions either to those having a triangular or trapezoidal shape, or to a predefined shape. These kinds of functions are able to cover a large spectrum of applications with a limited usage of memory, since they can be memorized by specifying very few parameters (height, base, critical points, etc.). This however results in a loss of computational power due to computation on the intermediate points. A solution to this problem is obtained by discretizing the universe of discourse U, i.e. by fixing a finite number of points and memorizing the value of the membership functions on such points [3,10,14,15]. Such a solution provides a satisfactory computational speed and a very high precision of definition, and gives users the opportunity to choose membership functions of any shape. However, a significant waste of memory can also result. It is indeed possible that for each of the given fuzzy sets many elements of the universe of discourse have a membership value equal to zero. It has also been noticed that in almost all cases the common points among fuzzy sets, i.e. points with non-null membership values, are very few. 
More specifically, in many applications, for each element u of U, there exist at most three fuzzy sets for which the membership value is not null [3,5,6,7,12,13]. Our proposal is based on such hypotheses. Moreover, we use a technique that, even though it does not restrict the shapes of membership functions, strongly reduces the computational time for the membership values and optimizes the function memorization. Figure 1 represents a term set whose characteristics are common for fuzzy controllers and to which we will refer in the following. This term set has a universe of discourse with 128 elements (so as to have a good resolution), 8 fuzzy sets that describe the term set, and 32 levels of discretization for the membership values. Clearly, the number of bits necessary for the given specifications is 5 for 32 truth levels, 3 for 8 membership functions and 7 for 128 levels of resolution. The memory depth is given by the dimension of the universe of discourse (128 in our case) and is represented by the memory rows. The length of a word of memory is defined by: Length = nfm × (dm(m) + dm(fm)), where nfm is the maximum number of non-null values in every element of the universe of discourse, dm(m) is the dimension of the values of the membership function m, and dm(fm) is the dimension of the word needed to represent the index of the highest membership function. In our case, then, Length = 3 × (5 + 3) = 24. The memory dimension is therefore 128*24 bits. If we had chosen to memorize all values of the membership functions, we would have needed to memorize on each memory row the membership value of each element for every fuzzy set. The fuzzy sets word dimension would be 8*5 bits, so the dimension of the memory would have been 128*40 bits. Consistent with our hypothesis, in fig. 1 each element of the universe of discourse has a non-null membership value on at most three fuzzy sets. 
Focusing on the elements 32, 64, 96 of the universe of discourse, they will be memorized as follows: The computation of the rule weights is done by comparing those bits that represent the index of the membership function with the word of the program memory. The output bus of the Program Memory (μCOD) is given as input to a comparator (Combinatory Net). If the index is equal to the bus value, then one of the non-null weights derives from the rule and it is produced as output; otherwise the output is zero (fig. 2). It is clear that the memory dimension of the antecedent is in this way reduced, since only non-null values are memorized. Moreover, the time performance of the system is equivalent to the performance of a system using vectorial memorization of all weights. The dimensioning of the word is influenced by some parameters of the input variable. The most important parameter is the maximum number of membership functions (nfm) having a non-null value in each element of the universe of discourse. From our study in the field of fuzzy systems, we see that typically nfm ≤ 3 and there are at most 16 membership functions. At any rate, such a value can be increased up to the physical dimensional limit of the antecedent memory. A less important role in the optimization process of the word dimension is played by the number of membership functions defined for each linguistic term. The table below shows the required word dimension as a function of such parameters and compares our proposed method with the method of vectorial memorization [10]. Summing up, the characteristics of our method are: Users are not restricted to membership functions with specific shapes. The number of fuzzy sets and the resolution of the vertical axis have a very small influence on the increase in memory space. Weight computations are done by a combinatorial network, and therefore the time performance of the system is equivalent to that of the vectorial method. 
The number of non-null membership values on any element of the universe of discourse is limited. Such a constraint is usually not very restrictive, since many controllers obtain a good precision with only three non-null weights. The method briefly described here has been adopted by our group in the design of an optimized version of the coprocessor described in [10].
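
The memory arithmetic in this abstract can be checked with a small software model. It only illustrates the idea of storing up to nfm (index, value) pairs per element of the universe of discourse. The field order inside the word, the zero padding, and all function names are assumptions, not the authors' hardware layout.

```python
def bits_for(levels):
    """Bits needed to encode `levels` distinct values."""
    return max(1, (levels - 1).bit_length())

def word_length(nfm, truth_levels, num_sets):
    """Word length in bits: nfm * (value bits + index bits)."""
    return nfm * (bits_for(truth_levels) + bits_for(num_sets))

def pack_row(memberships, nfm=3, truth_levels=32, num_sets=8):
    """Pack the non-null (set index, value) pairs of one universe element into one word."""
    value_bits, index_bits = bits_for(truth_levels), bits_for(num_sets)
    pairs = [(i, v) for i, v in enumerate(memberships) if v != 0]
    if len(pairs) > nfm:
        raise ValueError("more non-null memberships than the word can hold")
    word = 0
    # Assumed layout: [index | value] fields, unused fields padded with null weights.
    for index, value in pairs + [(0, 0)] * (nfm - len(pairs)):
        word = (word << (index_bits + value_bits)) | (index << value_bits) | value
    return word

def rule_weight(word, set_index, nfm=3, truth_levels=32, num_sets=8):
    """Software stand-in for the comparator: return the weight stored for `set_index`, else 0."""
    value_bits, index_bits = bits_for(truth_levels), bits_for(num_sets)
    for _ in range(nfm):
        value = word & ((1 << value_bits) - 1)
        index = (word >> value_bits) & ((1 << index_bits) - 1)
        if index == set_index and value:
            return value
        word >>= index_bits + value_bits
    return 0

length = word_length(3, 32, 8)        # 3 * (5 + 3) bits per row
compressed_bits = 128 * length        # proposed organization
full_bits = 128 * 8 * bits_for(32)    # every set stored for every element
row = pack_row([0, 0, 12, 31, 5, 0, 0, 0])  # hypothetical membership values
```

With the figures quoted in the abstract, the compressed organization needs 128*24 bits against 128*40 bits for storing every membership value, a saving of 40 percent.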
