Search | Korea Science

Study on the Topic Selection of Web Documents (웹 문서의 토픽 선정 방법에 관한 연구)

Kong, Hyun-Jang;Hwang, Myung-Gwon;Kim, Pan-Koo
- Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.148-151 / 2006
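As the number of web documents grows exponentially, document clustering for efficient document management is one of the most needed technologies today. Research on document clustering has tried various approaches, such as document classification using TF-IDF measures and title-based document classification. These methods either focus too heavily on document content or lack precise criteria for classification, and so have not supported efficient document clustering and retrieval. This study therefore proposes a new topic selection algorithm and performs document retrieval based on the topics the algorithm determines, which improved retrieval performance.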

Reranking Search Results for Mathematical Equation Retrieval Using Topic Models (토픽 모델을 이용한 수학식 검색 결과 재랭킹)

Yang, Seon;Ko, Youngjoong
- Annual Conference on Human and Language Technology / 2013.10a / pp.77-81 / 2013
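This paper studies two topics. The first is mathematical equation retrieval. High-quality mathematical equation data is stored on the web in markup-language form, and research on making use of it is active. In this study, equation data stored in MathML (Mathematical Markup Language) is searched using ordinary queries. The second topic is improving retrieval performance with a topic model. Equation data is first converted into ordinary natural-language sentences and retrieved with the Indri system; the results are then reranked by applying scores computed in advance with a topic model, which improved performance by an average of 5% in terms of MRR.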
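The reranking and evaluation idea in this abstract can be sketched as below. This is a minimal illustration, not the authors' method: the linear interpolation of scores, the `weight` parameter, and all function names are assumptions made for the example, and the abstract does not say how the topic-model score is combined with the retrieval score.

```python
from typing import Dict, List


def reciprocal_rank(ranked_ids: List[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant item, or 0.0 if it is not in the list."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[List[str]], answers: List[str]) -> float:
    """Average the reciprocal rank over all queries (MRR)."""
    return sum(reciprocal_rank(r, a) for r, a in zip(runs, answers)) / len(runs)


def rerank(retrieval: Dict[str, float], topic: Dict[str, float],
           weight: float = 0.5) -> List[str]:
    """Rerank by a weighted sum of retrieval and topic scores (assumed scheme)."""
    combined = {
        doc_id: (1.0 - weight) * score + weight * topic.get(doc_id, 0.0)
        for doc_id, score in retrieval.items()
    }
    return sorted(combined, key=combined.get, reverse=True)
```

Calling `rerank` on the retrieval scores for each query and comparing `mean_reciprocal_rank` before and after gives the kind of MRR comparison the abstract reports.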

A Study on Trends in Port Safety Issues at Busan Port Using Topic Modeling (토픽모델링을 활용한 부산항 항만안전성 이슈 동향에 관한 연구)

이정민;하도연;김율성
- Proceedings of the Korean Institute of Navigation and Port Research Conference / 2023.11a / pp.66-67 / 2023
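Modern society faces a variety of unpredictable risks, and the risk burden on the port logistics industry, which is highly dependent on global conditions, has recently been increasing. To identify the factors that affect the safety of the port industry, this study examines, in chronological order, the issues that have affected port safety in Korea from the past to the present. To this end, it applies LDA topic modeling to the text of news articles on port safety at Busan Port, Korea's representative port, to trace trends in the major port safety issues there.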

Topic Analysis of the United States' Historical Records about Korea for Modeling of Topic Map: Focused on Major Archives in Korea (토픽맵 모델링을 위한 한국 관련 미국기록물의 주제분석 연구: 국내 주요 소장기관 중심으로)

Kwon, Min Jeong;Choi, Sanghee
- Journal of Korean Society of Archives and Records Management / v.22 no.2 / pp.95-116 / 2022
This study selected four institutions that have collected foreign records about Korea and provide web services for users: the National Archives of Korea, the National Library of Korea, the National Institute of Korean History, and the Institute for Military History. A total of 163,874 records were collected from these institutions and analyzed by topic, resulting in a proposed topic map with 7 facets at the top level. The study proposes the topic map as an integrated, visual tool for improving search services for foreign records about Korea held by Korean institutions, especially US records related to Korea.
https://doi.org/10.14404/JKSARM.2022.22.2.095

Topic modeling for automatic classification of learner question and answer in teaching-learning support system (교수-학습지원시스템에서 학습자 질의응답 자동분류를 위한 토픽 모델링)

Kim, Kyungrog;Song, Hye jin;Moon, Nammee
- Journal of Digital Contents Society / v.18 no.2 / pp.339-346 / 2017
There is increasing interest in text analysis based on unstructured data such as articles, comments, and questions and answers, because such analysis can identify, evaluate, predict, and recommend features from unstructured text that reflects people's opinions. The same holds true for TEL, where MOOC services have evolved to automate debating and question-and-answer services based on the teaching-learning support system, generating question topics and automatically classifying the topics relevant to new questions from the question-and-answer data accumulated in the system. Therefore, in this study, we propose topic modeling using LDA to automatically classify new query topics. The proposed method enables the generation of a dictionary of question topics and the automatic classification of topics relevant to new questions. In experiments, automatic classification performance exceeded 0.7 for some queries. The more new queries were included in the various topics, the better the automatic classification results.
https://doi.org/10.9728/dcs.2017.18.2.339

Semantic Dependency Link Topic Model for Biomedical Acronym Disambiguation (의미적 의존 링크 토픽 모델을 이용한 생물학 약어 중의성 해소)

Kim, Seonho;Yoon, Juntae;Seo, Jungyun
- Journal of KIISE / v.41 no.9 / pp.652-665 / 2014
Many important terms in biomedical text are expressed as abbreviations or acronyms. We propose a semantic link topic model based on the concepts of topic and dependency link to disambiguate biomedical abbreviations and to cluster long-form variants of abbreviations that refer to the same senses. This model is a generative model inspired by the latent Dirichlet allocation (LDA) topic model, in which each document is viewed as a mixture of topics, with each topic characterized by a distribution over words. Thus, the words of a document are generated from the hidden topic structure of the document, and the topic structure is inferred from the observable word sequences of document collections. In this study, we allow two distinct word generation processes in order to incorporate semantic dependencies between words, particularly between expansions (long forms) of abbreviations and the words that co-occur with them in a sentence. Besides topic information, the semantic dependency between words is defined as a link, and a new random parameter for link presence is assigned to each word. As a result, the most probable expansions of the abbreviations in a given abstract are decided by the word-topic distribution, document-topic distribution, and word-link distribution estimated from the document collection through the semantic dependency link topic model. The abstracts retrieved from the MEDLINE Entrez interface by queries relating to 22 abbreviations and their 186 expansions were used as a data set. The link topic model correctly predicted expansions of abbreviations with an accuracy of 98.30%.
https://doi.org/10.5626/JOK.2014.41.9.652
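The standard LDA generative story that the abstract above builds on (a document as a mixture of topics, each topic a distribution over words) can be sketched as follows. This illustrates plain LDA only, not the authors' semantic dependency link model; the function names and any topics or prior values passed in are assumptions made for the example.

```python
import random
from typing import Dict, List


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one sample from Dirichlet(alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word: List[Dict[str, float]], alpha: List[float],
                      length: int, rng: random.Random) -> List[str]:
    """Generate words: draw topic proportions, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # document-topic distribution
    words = []
    for _ in range(length):
        # Pick a topic index for this position according to theta.
        z = rng.choices(range(len(topic_word)), weights=theta)[0]
        vocab = list(topic_word[z])
        probs = [topic_word[z][w] for w in vocab]
        # Pick a word from the chosen topic's word distribution.
        words.append(rng.choices(vocab, weights=probs)[0])
    return words
```

Inference in LDA runs this story in reverse: given only the observed words, it estimates the topic proportions and the topic-word distributions.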

Topic Modeling and Network Analysis of Peace Education and Unification Education Based on Big Data Analysis (빅데이터 분석에 기반한 평화교육과 통일교육의 토픽 모델링 및 네트워크 분석)

Kim, Byung-Man
- Journal of Convergence for Information Technology / v.12 no.3 / pp.25-37 / 2022
The purpose of this study is to comprehensively check trends in policies, discourses, educational directions and contents, and social issues by deriving the subjective characteristics of peace education and unification education based on big data analysis. The results of this study are as follows. First, 'peace', 'unification', 'education', 'research', 'student', 'school', 'teacher', 'target', and 'Korean Peninsula' were commonly important keywords in peace education and unification education. Second, the top topic of peace education was 'peace education and civic education', and the top topic of unification education was 'sympathy and participation in unification education'. Third, the topics that showed an upward trend by regime in peace education were 'World Peace and Human Rights' and 'Object and Direction of Peace Education', and the topic that showed an upward trend by regime in unification education was 'Subject of Unification Education'. Fourth, in peace education, the centrality of 'peace', 'education', 'student', 'school', and 'peace education' was high, and in unification education, the centrality of 'unification', 'education', 'unification', 'school', and 'teacher' was high. Based on these results, this study aims to expand the horizon of understanding of peace education and unification education, and to provide meaningful implications for establishing policies and conducting follow-up studies.
https://doi.org/10.22156/CS4SMB.2022.12.03.025

Big Data News Analysis in Healthcare Using Topic Modeling and Time Series Regression Analysis (토픽모델링과 시계열 회귀분석을 활용한 헬스케어 분야의 뉴스 빅데이터 분석 연구)

Eun-Jung Kim;Suk-Gwon Chang;Sang-Yong Tom Lee
- Information Systems Review / v.25 no.3 / pp.163-177 / 2023
This research aims to identify key initiatives and a policy approach to support the industrialization of the healthcare sector. The research collected a total of 91,873 news data points relating to healthcare between 2013 and 2022. A total of 20 topics were derived through topic modeling analysis. Time series regression analysis then identified 4 hot topics (Healthcare, Biopharmaceuticals, Corporate outlook·Sales, Government·Policy) and 3 cold topics (Smart devices, Stocks·Investment, Urban development·Construction) as significant. The research findings will serve as an important data source for government institutions that are engaged in the formulation and implementation of Korea's policies.
https://doi.org/10.14329/isr.2023.25.3.163
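One common way to label hot and cold topics is to regress each topic's yearly share on the year and look at the sign of the slope. The sketch below assumes that setup; the abstract does not give the regression specification, so the ordinary least squares fit, the `tol` threshold, and the function names are all assumptions, and the significance test the study relies on is omitted.

```python
from typing import List


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def label_trend(years: List[float], shares: List[float], tol: float = 1e-9) -> str:
    """Label a topic 'hot' (rising), 'cold' (falling), or 'flat' by its slope."""
    slope = ols_slope(years, shares)
    if slope > tol:
        return "hot"
    if slope < -tol:
        return "cold"
    return "flat"
```

A fuller analysis would also test whether each slope differs significantly from zero before labeling the topic.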

Analysis of Changes in Discourse of Major Media on Park Issues - Focusing on Newspaper Articles Published from 1995 to 2019 - (공원 이슈에 대한 주요 언론의 담론변화분석 - 1995년부터 2019년까지 신문 기사를 중심으로 -)

Ko, Ha-jung
- Journal of the Korean Institute of Landscape Architecture / v.49 no.5 / pp.46-58 / 2021
Parks became essential to people after the introduction of modern parks in Korea. Following mayoral elections by popular vote, issues surrounding parks, such as the creation of parks, have arisen and have been publicized by the media, allowing for the formation of discourse. Accordingly, this study conducted a topic analysis by collecting news articles from major media outlets in Korea that addressed issues related to parks since 1995, after the introduction of mayoral elections by popular vote, and analyzed changes over time in the discourse on parks through semantic network analysis. As a result of a Latent Dirichlet allocation topic modeling analysis, the following five topics were classified: urban park expansion (Topic 1), historical and cultural parks (Topic 2), use programs (Topic 3), zoo event (Topic 4), and conflicts in the park creation process (Topic 5). The park-related discourse addressed by the media is as follows. First, the creation process and conflicts regarding the quantitative expansion of parks are treated as the central discourse. Second, the names of parks appear as keywords every time a new park is created, and they are mentioned continuously from then on, thereby playing an important role in the formation of discourse. Third, 'residents' form discourse about the public nature of the park as the principal agent in park-related media. This study has significance in that it examines how parks are interpreted and how discourse is formed and changed by the media. It is expected that discourse on parks will be addressed from various perspectives in further research focusing on other media, such as regional and specialized magazines.
https://doi.org/10.9715/KILA.2021.49.5.046

Digital Transformation: Using D.N.A.(Data, Network, AI) Keywords Generalized DMR Analysis (디지털 전환: D.N.A.(Data, Network, AI) 키워드를 활용한 토픽 모델링)

An, Sehwan;Ko, Kangwook;Kim, Youngmin
- Knowledge Management Research / v.23 no.3 / pp.129-152 / 2022
As a key infrastructure for digital transformation, the spread of the data, network, and artificial intelligence (D.N.A.) fields and the emergence of promising industries are laying the groundwork for active digital innovation throughout the economy. In this study, text mining was applied to derive major topics, using as input variables the abstract, publication year, and research field of studies indexed in SCIE, SSCI, and A&HCI in the WoS database. First, main keywords were identified through TF and TF-IDF analysis based on word appearance frequency, and then topic modeling was performed using g-DMR. Because this topic model can use various types of variables as meta information, it was possible to explore meaning beyond simply deriving topics. According to the analysis results, topics such as business intelligence, manufacturing production systems, service value creation, telemedicine, and digital education were identified as major research topics in digital transformation. To summarize the results of topic modeling: 1) research on business intelligence has been actively conducted in all areas after COVID-19; 2) issues such as intelligent manufacturing solutions and metaverses have emerged in the manufacturing field, confirming that the topic of production systems is receiving attention once again; and 3) although the topics can be viewed separately in terms of technology and service, it is undesirable to interpret them separately, because a number of studies deal comprehensively with various services that combine the relevant technologies.
https://doi.org/10.15813/kmr.2022.23.3.007
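The TF and TF-IDF keyword step mentioned in this abstract can be sketched as below. This is a minimal textbook formulation (term frequency normalized by document length, inverse document frequency as log(N/df)); the abstract does not state which variant the authors used, and the function name and input format are assumptions for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: tf-idf score} dictionary per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

Under this formulation a term that appears in every document scores zero, which is why TF-IDF surfaces keywords that distinguish documents rather than merely frequent words.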

Search Result 715, Processing Time 0.025 seconds
