• Title/Summary/Keyword: 토픽 추출 (topic extraction)


A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.79-92 / 2015
  • Knowledge maps are widely used to represent knowledge in many domains. This paper presents a method for integrating national R&D data and helping users navigate the integrated data through a knowledge map service. The knowledge map service is built using a lightweight ontology and a topic modeling method. The national R&D data are integrated with the research project at the center; that is, other R&D data such as research papers, patents, and reports are connected to the research project as its outputs. The lightweight ontology represents the simple relationships among the integrated data, such as project-output, document-author, and document-topic relationships, and the knowledge map then enables further relationships, such as co-author and co-topic relationships, to be inferred. To extract the relationships among the integrated data, a Relational Data-to-Triples transformer is implemented, and a topic modeling approach is introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of the knowledge map service.

Knowledge maps can be divided into two types: those used in knowledge management to store, manage, and process an organization's data as knowledge, and those used for analyzing and representing knowledge extracted from science & technology documents. This research focuses on the latter. Here, a knowledge map service is introduced for integrating the national R&D data obtained from the National Digital Science Library (NDSL) and the National Science & Technology Information Service (NTIS), the two major repositories and services of national R&D data in Korea. A lightweight ontology is used to design and build the knowledge map; it allows knowledge to be represented and processed as a simple network, which fits the navigation and visualization characteristics of a knowledge map. The lightweight ontology represents the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontology, researchers are implicitly connected through the national R&D data by author and performer relationships. A knowledge map displaying the researchers' network is created, where the network is derived from co-authoring relationships in national R&D documents and co-participation relationships in national R&D projects.

To sum up, a knowledge map service system based on topic modeling and an ontology is introduced for processing knowledge about national R&D data such as research projects, papers, patents, project reports, and Global Trends Briefing (GTB) data. The system has three goals: 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide semantic and topic-based search over the integrated data, and 3) to provide knowledge map services based on semantic analysis and knowledge processing. The S&T information such as research papers, research reports, patents, and GTB data is updated daily from NDSL, and the R&D project information, including participants and outputs, is updated from NTIS; both are integrated into a single database. The knowledge base is constructed by transforming the relational data into triples with reference to the R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and the topic keywords representing them; this approach extracts the relationships and topic keywords based on semantics rather than simple keyword matching. Lastly, we present an experiment on constructing the integrated knowledge base using the lightweight ontology and topic modeling, and introduce the knowledge map services built on the knowledge base.
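To make the Relational Data-to-Triples step above concrete, here is a minimal sketch that turns hypothetical project-output and document-author rows into RDF triples with rdflib; the namespace, table layout, and property names are illustrative assumptions, not the paper's actual R&D ontology.

```python
# A minimal sketch of a Relational Data-to-Triples transformer, assuming
# hypothetical rows and property names (the paper's real schema is not
# shown in the abstract).
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/rnd/")  # assumed namespace

# Hypothetical relational rows: (project_id, output_doc_id, author)
rows = [
    ("P001", "DOC10", "Kim"),
    ("P001", "DOC11", "Lee"),
    ("P002", "DOC12", "Kim"),
]

g = Graph()
for project_id, doc_id, author in rows:
    project, doc = EX[project_id], EX[doc_id]
    g.add((project, EX.hasOutput, doc))          # project-output relationship
    g.add((doc, EX.hasAuthor, Literal(author)))  # document-author relationship

# Further relationships (e.g., co-author) could then be inferred by querying
# the triple store; here we just print the extracted triples.
for s, p, o in g:
    print(s, p, o)
```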

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.471-492 / 2017
  • The demand for and interest in big data analytics are increasing rapidly. The concept of big data covers not only existing structured data but also various kinds of unstructured data such as text, images, videos, and logs. Among these, text data has gained particular attention because text is the most representative way to describe and deliver information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results can be presented through word clouds, word networks, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics in the rapidly growing text data generated through various social media. Research on and applications of topic modeling have therefore been actively carried out in many fields, since topic modeling can extract the core topics from a huge number of unstructured text documents and provide the document groups for each topic. In this paper, we review the major techniques and research trends of text analysis, and introduce application cases that use topic modeling to solve problems in various fields.
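As a small illustration of the pipeline sketched above (structuring followed by topic modeling), the following uses scikit-learn's LDA on a toy corpus; the documents and parameter values are placeholders, not taken from the paper.

```python
# A minimal topic modeling sketch with scikit-learn's LDA; the toy corpus
# and hyperparameters are placeholder assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "big data analytics for social media text",
    "topic modeling extracts core topics from documents",
    "word cloud and word network visualize frequent terms",
    "document classification groups documents by topic",
]

# Structuring step: document-term matrix from raw text
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Topic modeling step: fit LDA and print top words per topic
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```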

Construction of Record Retrieval System based on Topic Map (토픽맵 기반의 기록정보 검색시스템 구축에 관한 연구)

  • Kwon, Chang-Ho
    • The Korean Journal of Archival Studies / no.19 / pp.57-102 / 2009
  • Recently, the distribution and use of records via the web have increased, so archival information services through websites have become an essential part of record centers. The main point of such services is to make record information retrieval easy, which requires correctly matching users' requests with the representation of record resources. Archivists and records managers have used various information representation tools, from taxonomies to the more recent thesauri, yet the accuracy of information retrieval remains an unsolved problem. To improve on this, this study constructed a record retrieval system based on Topic Maps by modeling record resources, focusing on the descriptive metadata of the records. The target users of the system are general web users, and its scope is limited to the president-related sources in the National Archives Portal Service. The procedure is as follows: 1) design an ontology model for archival information services based on Topic Maps, focusing on the descriptive metadata of the records; 2) build a practical record retrieval system with a Topic Map editor, using an information source list extracted from the National Archives Portal Service; 3) check and assess the features of the Topic Map-based record retrieval system through its user interface. Through this practice, relevance navigation to other record sources via semantic inference over descriptive metadata was confirmed, and scattered archival sources could be built up into knowledge.
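Topic Maps (ISO 13250) have their own data model and tooling, but the associative navigation the study describes can be sketched in plain Python; the records, association types, and titles below are invented for illustration.

```python
# A toy illustration of Topic Map-style navigation over descriptive
# metadata; record names and association types are hypothetical
# (a real system would follow the ISO 13250 Topic Maps model).
topics = {
    "record:R1": {"title": "Presidential speech, 1998"},
    "record:R2": {"title": "Presidential visit report, 1998"},
    "person:president": {"title": "President"},
}

# Associations link topics by a typed relationship (creator, subject, ...)
associations = [
    ("record:R1", "createdBy", "person:president"),
    ("record:R2", "mentions", "person:president"),
]

def related(topic_id):
    """Navigate from one topic to associated topics, as a Topic Map UI would."""
    for src, assoc_type, dst in associations:
        if src == topic_id:
            yield assoc_type, dst
        elif dst == topic_id:
            yield assoc_type, src

# From record R1, a user can hop to the president topic, then on to R2.
for assoc, other in related("record:R1"):
    print(assoc, "->", other, topics[other]["title"])
```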

Research Trends on Emotional Labor in Korea using text mining (텍스트마이닝을 활용한 감정노동 연구 동향 분석)

  • Cho, Kyoung-Won;Han, Na-Young
    • Journal of Korea Society of Industrial Information Systems / v.26 no.6 / pp.119-133 / 2021
  • Research using text mining to identify research trends has been conducted in many fields, but no such study has been conducted in the field of emotional labor. This study uses text mining to analyze in depth 1,465 papers indexed in the Korea Citation Index (KCI) from 2004 to 2019 that contain the subject word 'emotional labor', in order to understand the trends in emotional labor research. Topics were extracted by LDA analysis, and IDM analysis was performed to confirm the proportions and similarities of the topics. Through these methods, an integrated analysis of topics was conducted, taking into account the usefulness of topics with high similarity. The research topics fall into 11 categories, in descending order: stress of emotional labor (12.2%), emotional labor and social support (12.0%), customer service workers' emotional labor (10.9%), emotional labor and resilience (10.2%), emotional labor strategy (9.2%), call center counselors' emotional labor (9.1%), results of emotional labor (9.0%), emotional labor and job exhaustion (7.9%), emotional intelligence (7.1%), preliminary care service workers' emotional labor (6.6%), and emotional labor and organizational culture (5.9%). Through topic modeling and trend analysis, the research trends and academic progress of emotional labor are analyzed to suggest directions for future emotional labor research, and the findings are expected to support practical strategies for emotional labor.
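One way to quantify topic similarity before integrating highly similar topics is Jensen-Shannon distance between topic-word distributions; the sketch below uses toy distributions as placeholders and is not a reproduction of the paper's IDM analysis.

```python
# A sketch of measuring topic similarity prior to integrating similar
# topics; the three toy topic-word distributions are placeholders.
import numpy as np
from scipy.spatial.distance import jensenshannon

# Toy topic-word distributions over a 5-word vocabulary
topic_a = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
topic_b = np.array([0.35, 0.32, 0.18, 0.10, 0.05])
topic_c = np.array([0.05, 0.10, 0.15, 0.30, 0.40])

# Lower distance = more similar; highly similar topics are candidates
# for the kind of integrated analysis described in the abstract.
print("a vs b:", jensenshannon(topic_a, topic_b))
print("a vs c:", jensenshannon(topic_a, topic_c))
```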

Extracting Korean-English Parallel Sentences from Wikipedia (위키피디아로부터 한국어-영어 병렬 문장 추출)

  • Kim, Sung-Hyun;Yang, Seon;Ko, Youngjoong
    • Journal of KIISE: Software and Applications / v.41 no.8 / pp.580-585 / 2014
  • This paper presents a variety of experiments on extracting Korean-English parallel sentences from Wikipedia data, referring to various methods previously proposed for other languages. We use two approaches: the first uses translation probabilities extracted from existing resources such as the Sejong parallel corpus, and the second uses dictionaries such as a Wiki dictionary consisting of Wikipedia titles, together with MRDs (machine-readable dictionaries). Experimental results show a significant improvement for the system using Wikipedia data compared to one using only the existing resources, achieving an F1-score of 57.6%. We additionally conduct experiments using a topic model; although this yields a relatively lower performance, an F1-score of 51.6%, it appears worthy of further study.
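A rough sketch of the dictionary-based approach mentioned above: score a candidate Korean-English sentence pair by how many Korean tokens have a dictionary translation appearing in the English sentence. The tiny dictionary and the coverage score are illustrative assumptions; the paper's system uses translation probabilities and Wiki-derived dictionaries.

```python
# Dictionary-based parallel sentence scoring sketch; entries and the
# coverage heuristic are invented for illustration.
ko_en_dict = {  # e.g., entries built from Wikipedia titles / MRDs
    "서울": {"seoul"},
    "수도": {"capital"},
    "한국": {"korea"},
}

def coverage_score(ko_tokens, en_tokens):
    """Fraction of Korean tokens with a dictionary translation present in
    the English sentence; a real system would use translation probabilities."""
    en = {t.lower() for t in en_tokens}
    hits = sum(1 for t in ko_tokens if ko_en_dict.get(t, set()) & en)
    return hits / max(len(ko_tokens), 1)

ko = ["서울", "은", "한국", "의", "수도", "이다"]
en = ["Seoul", "is", "the", "capital", "of", "Korea"]
print(coverage_score(ko, en))  # high score -> likely parallel pair
```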

Technology Development Strategy of Piggyback Transportation System Using Topic Modeling Based on LDA Algorithm

  • Jun, Sung-Chan;Han, Seong-Ho;Kim, Sang-Baek
    • Journal of the Korea Society of Computer and Information / v.25 no.12 / pp.261-270 / 2020
  • In this study, we identify promising technologies for the Piggyback transportation system by analyzing relevant patent information. To this end, we first develop a patent database by extracting relevant technology keywords from pioneering research papers on the Piggyback flatcar system. We then employ text mining to identify frequently occurring words in the patent database and, using these words, apply the LDA (Latent Dirichlet Allocation) algorithm to identify "topics" corresponding to "key" technologies for the Piggyback system. Finally, we employ an ARIMA model to forecast the trends of these key technologies and identify the promising ones. The results show that a data-driven integrated management system, an operation planning system, and special cargo (especially fluid and gas) handling/storage technologies are the key promising technologies for the future of the Piggyback system, and that data reception/analysis techniques must be developed to improve system performance. The proposed procedure and analysis method provide useful insights for developing an R&D strategy and a technology roadmap for the Piggyback system.
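A minimal sketch of the final forecasting step: fitting an ARIMA model to a yearly topic-frequency series with statsmodels and projecting it forward. The series values and the (1, 1, 1) order are placeholder assumptions, not the paper's fitted model.

```python
# Fit ARIMA to a hypothetical yearly count of patents assigned to one
# LDA topic, then forecast the next few years.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

counts = pd.Series(
    [4, 6, 5, 9, 12, 15, 14, 19, 23, 28],  # invented topic activity
    index=pd.period_range("2010", periods=10, freq="Y"),
)

model = ARIMA(counts, order=(1, 1, 1)).fit()  # assumed (p, d, q) order
print(model.forecast(steps=3))  # projected topic activity, next 3 years
```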

Analysis of the Knowledge Structure of Research related to Reality Shock Experienced by New Graduate Nurses using Text Network Analysis (텍스트네트워크분석을 활용한 신규간호사가 경험하는 현실충격 관련 연구의 지식구조 분석)

  • Heejang Yun
    • The Journal of the Convergence on Culture Technology / v.9 no.1 / pp.463-469 / 2023
  • The aim of this study is to provide basic data that can contribute to successful clinical adaptation and reduced turnover of new graduate nurses by analyzing research on the reality shock they experience, using text network analysis. Topics related to the reality shock experienced by new graduate nurses were extracted from 115 papers published in domestic and foreign journals from January 2002 to December 2021. Articles were retrieved from six databases (Korean: DBpia, KISS, RISS / international: Web of Science, Springer, Scopus). Keywords were extracted from the abstracts and organized using semantic morphemes. Network analysis and topic modeling for the subject knowledge structure analysis were performed using the NetMiner 4.5.0 program. The core keywords included 'new graduate nurses', 'reality shock', 'transition', 'student nurse', 'experience', 'practice', 'work environment', 'role', 'care', and 'education'. From recent articles on the reality shock experienced by new graduate nurses, three major topics were extracted using LDA (Latent Dirichlet Allocation): 'turnover', 'work environment', and 'experience of transition'. Based on this research, interventional studies that can effectively reduce the reality shock experienced by new graduate nurses and help them adapt successfully to clinical practice are suggested.
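As a toy illustration of the text network step, the following builds a keyword co-occurrence network and ranks keywords by degree centrality with networkx; the keyword sets are invented, and the study itself used NetMiner 4.5.0 rather than Python.

```python
# Build a keyword co-occurrence network from per-document keyword sets
# and rank keywords by degree centrality (a proxy for "core keywords").
import itertools
import networkx as nx

doc_keywords = [  # invented keyword sets, one per paper
    {"new graduate nurses", "reality shock", "turnover"},
    {"new graduate nurses", "work environment", "transition"},
    {"reality shock", "transition", "education"},
]

G = nx.Graph()
for kws in doc_keywords:
    for a, b in itertools.combinations(sorted(kws), 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)  # co-occurrence count as edge weight

# Core keywords = most central nodes in the network
for kw, c in sorted(nx.degree_centrality(G).items(), key=lambda x: -x[1])[:5]:
    print(f"{kw}: {c:.2f}")
```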

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, increasing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration of smart devices are producing large amounts of data, and data analysis technology is rapidly becoming popular as attempts to acquire insights through data analysis continuously increase. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to those who request it. However, growing interest in big data analysis has spurred computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and the technology is spreading, so big data analysis is increasingly expected to be performed by those who need it themselves. Along with this, interest in various kinds of unstructured data is continually increasing, with particular attention focused on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis have been utilized in various fields.

Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large set of documents, identifies the documents corresponding to each issue, and provides the identified documents as clusters. It is considered very useful in that it reflects the semantic elements of documents. Traditional topic modeling is based on the distribution of key terms across the entire document collection, so the entire collection must be analyzed at once to identify the topic of each document. This makes the analysis time-consuming when topic modeling is applied to many documents, and causes a scalability problem: processing time increases exponentially with the number of analysis objects. The problem is particularly noticeable when documents are distributed across multiple systems or regions.

To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by applying topic modeling to each unit. This method allows topic modeling over a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, since documents can be analyzed in each location without first being combined. Despite these advantages, however, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire collection is unclear: local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology must be established; that is, taking the global topics as the ideal answer, the deviation of the local topics from the global topics needs to be measured.

Because of these difficulties, this approach has not been studied sufficiently compared with other topic modeling research. In this paper, we propose a topic modeling approach that solves the two problems above. First, we divide the entire document cluster (the global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of delegate documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling over the entire collection, and we also propose a reasonable method for comparing the results of both approaches.
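A conceptual sketch of the local-to-global topic mapping idea: match each local topic to a global topic by cosine similarity of topic-word distributions under an optimal one-to-one assignment. The toy matrices are random placeholders; the paper's RGS-based procedure is not reproduced here.

```python
# Map local topics onto global topics via maximum-similarity one-to-one
# assignment; topic-word distributions are random stand-ins.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
global_topics = rng.dirichlet(np.ones(20), size=3)  # 3 topics x 20 words
local_topics = rng.dirichlet(np.ones(20), size=3)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sim = np.array([[cosine(l, g) for g in global_topics] for l in local_topics])
rows, cols = linear_sum_assignment(-sim)  # maximize total similarity
for l, g in zip(rows, cols):
    print(f"local topic {l} -> global topic {g} (sim={sim[l, g]:.2f})")
```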

Topic Modeling of News Article about International Construction Market Using Latent Dirichlet Allocation (Latent Dirichlet Allocation 기법을 활용한 해외건설시장 뉴스기사의 토픽 모델링(Topic Modeling))

  • Moon, Seonghyeon;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.4 / pp.595-599 / 2018
  • A sufficient understanding of overseas construction market conditions is crucial for profitability in international construction projects. Many researchers have regarded news articles as a good data source for assessing market conditions, since they carry market information on political, economic, and social issues. Because such text data is unstructured and very large, various text mining techniques have been studied to reduce the manpower, time, and cost needed to summarize it. However, extracting the needed information from news articles is hampered by the variety of topics they contain. This research aims to overcome these problems and contribute to summarizing market status by performing topic modeling with Latent Dirichlet Allocation. Assuming 10 topics in the corpus, the extracted topics included projects for user convenience (topic 2), private support to solve poverty problems in Africa (topic 4), and so on. By grouping the news articles by topic, the results can improve the extraction of useful information and the summarization of market status.
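A small sketch of grouping articles by their dominant LDA topic with gensim; the two-topic toy corpus stands in for the paper's 10-topic news corpus.

```python
# Group toy "news articles" by their most probable LDA topic.
from gensim import corpora
from gensim.models import LdaModel

docs = [  # invented, pre-tokenized articles
    ["construction", "market", "bid", "contract"],
    ["poverty", "africa", "aid", "support"],
    ["market", "contract", "project", "bid"],
]

dictionary = corpora.Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)

# Assign each article to its most probable topic (the grouping step)
for i, doc_bow in enumerate(bow):
    topic, prob = max(lda.get_document_topics(doc_bow), key=lambda t: t[1])
    print(f"article {i} -> topic {topic} ({prob:.2f})")
```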

Analyzing the Trend of False·Exaggerated Advertisement Keywords Using Text-mining Methodology (1990-2019) (텍스트마이닝 기법을 활용한 허위·과장광고 관련 기사의 트렌드 분석(1990-2019))

  • Kim, Do-Hee;Kim, Min-Jeong
    • The Journal of the Korea Contents Association / v.21 no.4 / pp.38-49 / 2021
  • This study analyzed trends in the term 'false and exaggerated advertisement' across 5,141 newspaper articles from 1990 to 2019 using text mining methodology. First, we identified the most frequent keywords related to false and exaggerated advertising through frequency analysis of all the newspaper articles, and examined the context among the extracted keywords. Next, to examine how false and exaggerated advertising has changed, frequency analysis was performed on the articles grouped by decade, and the keywords that became issues were identified by comparing the number of academic papers on the top keywords of each year. Finally, we identified trends in false and exaggerated advertising from the detailed keywords in each topic using topic modeling. The results confirmed that topics that became issues at specific times were extracted as frequent keywords, and that keyword trends by period changed in connection with social and environmental factors. This study is meaningful in helping consumers spend wisely by building background knowledge about unfair advertising. Furthermore, the extracted core keywords are expected to reveal the true purposes of advertising and deliver implications to companies and employees who engage in misconduct.
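A simple sketch of the by-decade keyword frequency step described above; the (year, tokens) records are invented for illustration.

```python
# Count keyword frequencies per decade over toy article records.
from collections import Counter, defaultdict

articles = [  # hypothetical (year, tokens) records
    (1993, ["false", "advertisement", "food"]),
    (1998, ["exaggerated", "advertisement", "health"]),
    (2005, ["false", "advertisement", "internet"]),
    (2017, ["exaggerated", "advertisement", "social", "media"]),
]

by_decade = defaultdict(Counter)
for year, tokens in articles:
    decade = (year // 10) * 10
    by_decade[decade].update(tokens)

for decade in sorted(by_decade):
    print(decade, by_decade[decade].most_common(3))
```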