• Title/Summary/Keyword: Latent Dirichlet Allocation (잠재 디리클레 할당)


An Examination of the Topics and Changes in the Research Papers Published in the Journal of Korean Elementary Science Education Using Latent Dirichlet Allocation for the Topic Modeling Analysis (잠재 디리클레 할당(LDA) 기반의 토픽모델링 분석을 통한 '초등과학교육' 학술지 연구논문의 주제 및 변화)

  • Chang, Jina;Na, Jiyeon
    • Journal of Korean Elementary Science Education / v.41 no.2 / pp.356-372 / 2022
  • This study examined the topics that have appeared in the "Journal of Korean Elementary Science Education" over roughly the past 40 years to identify the changes that have occurred in the Korean Society of Elementary Science Education. Latent Dirichlet allocation topic modeling was applied to 1,065 English abstracts published from the first issue (1983) to 2021, from which 14 main topics were extracted. The meaning of each topic was then analyzed from its keywords and documents. Subsequently, to elucidate the topic trends, the increase or decrease of each topic over three-year intervals was statistically examined through linear regression analysis. Based on the results, implications for developing and supporting future elementary science education research were discussed.
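The trend step described in this abstract (regressing a topic's share on time, in three-year windows) can be illustrated with a minimal sketch. The proportions below are hypothetical, not values from the paper, and the helper `trend_slope` is an illustrative name; it computes only the ordinary least-squares slope, not the significance test the study would also need.

```python
from statistics import mean

def trend_slope(periods, proportions):
    """Ordinary least-squares slope of topic proportion against period index."""
    x_bar, y_bar = mean(periods), mean(proportions)
    sxx = sum((x - x_bar) ** 2 for x in periods)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(periods, proportions))
    return sxy / sxx

# Hypothetical mean topic proportions for five consecutive three-year windows.
periods = [0, 1, 2, 3, 4]
rising_topic = [0.04, 0.06, 0.07, 0.09, 0.11]
falling_topic = [0.12, 0.10, 0.09, 0.07, 0.05]

for name, series in [("rising", rising_topic), ("falling", falling_topic)]:
    slope = trend_slope(periods, series)
    label = "increasing" if slope > 0 else "decreasing"
    print(f"{name}: slope per window = {slope:+.4f} ({label})")
```

A positive slope marks a topic whose share grows across windows and a negative slope marks a declining one; whether the slope is statistically distinguishable from zero is a separate check.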

Classifying and Characterizing the Types of Gentrified Commercial Districts Based on Sense of Place Using Big Data: Focusing on 14 Districts in Seoul (빅데이터를 활용한 젠트리피케이션 상권의 장소성 분류와 특성 분석 -서울시 14개 주요상권을 중심으로-)

  • Young-Jae Kim;In Kwon Park
    • Journal of the Korean Regional Science Association / v.39 no.1 / pp.3-20 / 2023
  • This study aims to categorize the 14 major gentrified commercial areas of Seoul and analyze their characteristics based on their sense of place. To achieve this, we conducted hierarchical cluster analysis using text data collected from Naver Blog. We described the districts along two dimensions, "experience" and "feature," and analyzed their characteristics using LDA (Latent Dirichlet Allocation) on the text data together with statistical data collected from Seoul Open Data Square. As a result, we classified the commercial districts of Seoul into 5 categories: 'theater district,' 'traditional cultural district,' 'female-beauty district,' 'exclusive restaurant and medical district,' and 'trend-leading district.' The findings of this study are expected to provide valuable insights for policy-makers to develop more efficient and suitable commercial policies.

The Trends of Eco-Friendly Textiles Using Big Data from Newspaper Articles (신문기사 빅데이터를 활용한 친환경 섬유의 추이에 관한 연구)

  • Nam Beom Cho;Choong Kwon Lee
    • Smart Media Journal / v.13 no.2 / pp.95-107 / 2024
  • The development of environmentally friendly products and services has become a trend, and the development and utilization of eco-friendly textiles with economic value is gaining attention as a new business model. Analyzing and identifying trends and developments in eco-friendly textiles can provide important information and insights for various stakeholders such as companies, governments, and consumers to help them achieve sustainable growth. For this study, we collected and analyzed data from newspaper articles mainly covering the textile and fashion sector from 2000 to June 2023. A total of 12,331 articles containing the keyword 'eco-friendly textiles' were collected, and after performing morphological analysis on the extracted data, Latent Dirichlet Allocation and Dynamic Topic Modeling analysis were performed to identify topics by year. The results of the study are expected to provide strategic guidance and insights for the sustainable development of the textile industry, thereby helping to promote the research, development, and commercialization of eco-friendly textiles.

A Study on Automatic Analysis System of National Defense Articles (국방 기사 자동 분석 시스템 구축 방안 연구)

  • Kim, Hyunjung;Kim, Wooju
    • Journal of the Korea Institute of Military Science and Technology / v.21 no.1 / pp.86-93 / 2018
  • Media articles have a great influence on public opinion and reach the public through many different outlets, so analyzing them manually is very difficult. Academia has discussed many methods for collecting, processing, and analyzing documents, but this work mostly concerns politics and stocks, and national defense articles remain poorly researched. In this study, we explain how to build an automatic analysis system for national defense articles that collects information on defense articles automatically and processes it quickly by using topic modeling with LDA, sentiment analysis, and extraction-based text summarization.

Topic-based Knowledge Graph-BERT (토픽 기반의 지식그래프를 이용한 BERT 모델)

  • Min, Chan-Wook;Ahn, Jin-Hyun;Im, Dong-Hyuk
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.557-559 / 2022
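  • With recent advances in deep learning, various studies are under way in natural language processing, including Q&A, sentence recommendation, and named entity recognition. Studies on improving the performance of the transformer-based BERT model, which performs well in deep-learning-based natural language processing, are also in progress. In this paper, we train the K-BERT model by classifying knowledge graphs by topic with latent Dirichlet allocation, a topic model, and by inferring the topic of the input sentence. Using the classified topic knowledge graphs and the inferred topics, we propose an efficient way to use large-scale knowledge graphs in the K-BERT model.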

Unsupervised learning-based automated patent document classification system (비지도학습 기반 자동 특허문서 분류 시스템)

  • Kim, Sang-Baek;Kim, Ji-Ho;Lee, Hong-Chul
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.421-422 / 2021
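  • About one million patents are filed every year to protect the technologies of companies at home and abroad. As the number of registered patents grows, relying on expert judgment alone to select valid patent documents in a desired technical field becomes inefficient, and objective results become harder to expect. To improve both the accuracy of valid patent document classification and the work efficiency of experts, this study proposes an automatic patent document classification system that uses the latent Dirichlet allocation (LDA) algorithm, an unsupervised learning model, together with deep learning.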


Topic Modeling of News Article about International Construction Market Using Latent Dirichlet Allocation (Latent Dirichlet Allocation 기법을 활용한 해외건설시장 뉴스기사의 토픽 모델링(Topic Modeling))

  • Moon, Seonghyeon;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.4 / pp.595-599 / 2018
  • A sufficient understanding of the overseas construction market is crucial to profitability in international construction projects. Many researchers have regarded news articles as a good data source for assessing market conditions, since they contain market information such as political, economic, and social issues. Because the text data is large and unstructured, various text-mining techniques have been studied to reduce the manpower, time, and cost needed to summarize it. However, extracting the needed information from news articles remains limited because the data covers many different topics. This research aims to overcome these problems and contribute to summarizing market status by performing topic modeling with Latent Dirichlet Allocation. Assuming that the corpus contained 10 topics, the extracted topics included projects for user convenience (topic-2), private support for solving poverty problems in Africa (topic-4), and so on. Grouping the news articles by topic could improve the extraction of useful information and the summarization of market status.
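As a rough illustration of what "topic modeling with a fixed number of topics" involves, the following is a minimal collapsed Gibbs sampler for LDA in plain Python. It is a generic textbook-style sketch, not the authors' implementation: the toy corpus, the function name `lda_gibbs`, and the hyperparameter defaults (`alpha`, `beta`, `iters`) are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc_topic, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens of word v in topic k
    topic_total = [0] * num_topics                      # all tokens in topic k
    assign = []
    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], assign[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab

# Hypothetical toy corpus of tokenized headlines (not data from the paper).
docs = [
    "bridge contract bid bridge tender".split(),
    "tender bid contract highway".split(),
    "poverty aid africa water aid".split(),
    "africa water poverty support".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, row in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print(f"topic-{k}:", [vocab[v] for v in top])
```

The number of topics is an input rather than something the sampler discovers, which is why the abstract states it as an assumption; in practice a library implementation and a model-selection criterion would replace this sketch.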

Analysis of Research Trends in Korean English Education Journals Using Topic Modeling (토픽 모델링을 활용한 한국 영어교육 학술지에 나타난 연구동향 분석)

  • Won, Yongkook;Kim, Youngwoo
    • The Journal of the Korea Contents Association / v.21 no.4 / pp.50-59 / 2021
  • To understand the research trends of English education in Korea over the 20 years from 2000 to 2019, 12 major Korean academic journals in the field of English education were selected, and bibliographic information on 7,329 articles published in these journals was collected and analyzed. The total number of articles increased from the 2000s to the first half of the 2010s but decreased somewhat in the late 2010s, and the number of publications per journal has become similar. These results show that, in terms of quantity, the overall influence of English education journals has decreased and then leveled off. Next, 34 topics were extracted by applying latent Dirichlet allocation (LDA) topic modeling to the English abstracts of the articles. Teacher, word, culture/media, and grammar appeared as heavily studied topics. Topics such as word, vocabulary, and testing and evaluation appeared through unique keywords, and various topics related to learner factors emerged, becoming topics of interest in English education research. Then, topics were analyzed to determine which ones were rising or falling in frequency. As a result of this analysis, qualitative research, vocabulary, learner factor, and testing were found to be rising topics, while falling topics included CALL, language, teaching, and grammar. This change in research topics shows that research interests in the field of English education are shifting from static research topics to data-driven and dynamic research topics.

Text Mining Analysis of News Articles Related to 'Space Hazard' ('우주 위험' 관련 뉴스 기사의 텍스트 마이닝 분석 연구)

  • Jo, Hoon;Sohn, Jungjoo
    • Journal of the Korean earth science society / v.43 no.1 / pp.224-235 / 2022
  • This study aimed to confirm the status of media reports on space hazards through topic modeling analysis of related media articles from the past 12 years. To this end, Latent Dirichlet Allocation (LDA) analysis was performed on over 1,200 space hazard articles published between 2010 and 2021 on solar storms, artificial space objects, and natural space objects, collected from the BIGKinds news platform. The articles related to solar storms focused on three topics: the effect of solar explosions on satellites; the effect of solar explosions on radio communication in Korea, centered on the Korean Space Weather Center; and the relationship between aircrew and space radiation. The articles related to artificial space objects focused on three topics: the threat of space garbage to satellites and space stations and the transition of useful objects into space junk; the relationship between space garbage and humanity as shown in movies; and the efforts of developed countries to track, monitor, and dispose of space garbage. The articles related to natural space objects focused on two topics: international space agencies' tracking and monitoring of near-Earth asteroids and countermeasures against collisions; and the evolution and extinction of dinosaurs and mammals, with a focus on collisions of asteroids or comets. Thus, using a total of eight themes in various fields such as society and culture, this study confirmed that domestic media play a role in conveying the dangers of space hazards and drawing public attention to them, and it derived educational methods and policies concerning space hazards.

Latent Keyphrase Extraction Using LDA Model (LDA 모델을 이용한 잠재 키워드 추출)

  • Cho, Taemin;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.25 no.2 / pp.180-185 / 2015
  • As the number of document resources continues to increase, automatically extracting keyphrases from a document has become one of the main issues in recent years. However, most previous works have tried to extract keyphrases from the words in documents, so they overlooked latent keyphrases, which do not appear in the documents. Although latent keyphrases do not appear in documents, they can play an important role in text summarization and information retrieval because they imply meaningful concepts or contents of documents. They also account for more than one fourth of all keyphrases in real-world datasets, and they can be utilized for short texts such as SNS posts, which rarely have explicit keyphrases. In this paper, we propose a new approach that selects candidate keyphrases from the keyphrases of neighbor documents similar to the given document and evaluates the importance of each candidate using the individual words in it. Experimental results show that latent keyphrases can be extracted at a reasonable level.
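The neighbor-based candidate selection described in this abstract can be sketched as follows. This is a loose illustration only: it swaps in plain bag-of-words cosine similarity and a simple word-frequency score where the paper uses an LDA model, and the function names, toy corpus, and scoring rule are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def latent_keyphrases(target, corpus, k=2, top_n=3):
    """Rank keyphrases of the k most similar documents that are absent from target.

    corpus is a list of (tokens, keyphrases) pairs; target is a token list.
    """
    tvec = Counter(target)
    neighbors = sorted(
        corpus, key=lambda item: cosine(tvec, Counter(item[0])), reverse=True
    )[:k]
    neighbor_words = Counter(w for tokens, _ in neighbors for w in tokens)
    text = " ".join(target)
    scores = {}
    for _, phrases in neighbors:
        for phrase in phrases:
            if phrase in text:
                continue  # the phrase occurs in the document, so it is not latent
            words = phrase.split()
            # Score a candidate by how often its individual words occur in the
            # target and its neighbors (a stand-in for an LDA-based score).
            scores[phrase] = sum(tvec[w] + neighbor_words[w] for w in words) / len(words)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical toy data: each corpus entry is (tokens, author-assigned keyphrases).
corpus = [
    ("topic model word counts documents themes".split(), ["topic modeling", "text mining"]),
    ("hidden themes inferred from documents by topic model".split(),
     ["latent dirichlet allocation", "topic modeling"]),
    ("soccer match goal score team".split(), ["sports news"]),
]
target = "topic model infers hidden themes from word counts".split()
print(latent_keyphrases(target, corpus))
```

Because candidates come from neighbor documents rather than from the target text, the approach can surface phrases the document never states, which is the property the abstract calls "latent".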