Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation

  • Jeon, Hyung-Bae;Lee, Soo-Young
    • ETRI Journal / v.38 no.3 / pp.487-493 / 2016
  • Two new methods are proposed for unsupervised adaptation of a language model (LM) from a single sentence in automatic transcription tasks. In the training phase, training documents are clustered using latent Dirichlet allocation (LDA), and a domain-specific LM is then trained for each cluster. In the test phase, the adapted LM is formed as a linear mixture of the trained domain-specific LMs. Unlike previous adaptation methods, the proposed methods fully utilize the trained LDA model to estimate the weights assigned to the domain-specific LMs; therefore, the clustering and weight-estimation algorithms of the trained LDA model are reliable. In continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis, non-negative matrix factorization, and LDA with n-gram counting.
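
The linear mixture mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the toy unigram models, the weight values, and the function name `mix_language_models` are all hypothetical, and in the paper the weights would be estimated from the trained LDA model rather than set by hand.

```python
def mix_language_models(domain_lms, weights):
    """Linearly interpolate word distributions: P(w) = sum_k weights[k] * P_k(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    # Union of all words seen by any domain-specific model
    vocab = set().union(*domain_lms)
    return {
        word: sum(wt * lm.get(word, 0.0) for wt, lm in zip(weights, domain_lms))
        for word in vocab
    }

# Hypothetical unigram LMs, one per document cluster
sports_lm = {"goal": 0.6, "market": 0.1, "the": 0.3}
finance_lm = {"goal": 0.1, "market": 0.6, "the": 0.3}

# Hypothetical mixture weights (stand-ins for LDA-derived topic probabilities)
topic_weights = [0.25, 0.75]

adapted_lm = mix_language_models([sports_lm, finance_lm], topic_weights)
```

Because each component is a probability distribution and the weights sum to one, the adapted model is itself a valid distribution over the vocabulary.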

Analysis of Construction Accident Incident Using Latent Dirichlet Allocation-based Topic Modeling (잠재 디리클레 할당 기반 토픽 모델링을 통한 건설재해 사례 분석)

  • Kim, Changjae;Kim, Harim;Lee, Changsu;Cho, Hunhee
    • Proceedings of the Korean Institute of Building Construction Conference / 2022.04a / pp.31-32 / 2022
  • The construction industry has more safety accidents than other industries. Despite growing efforts to reduce safety hazards in the industry, such as the enforcement of the "Serious Accidents Punishment Act (SAPA)", construction accidents have not decreased sufficiently. In this study, safety risk factors are analyzed through Latent Dirichlet Allocation (LDA)-based topic modeling. Natural language processing and topic modeling are expected to improve risk analysis at construction sites.

Topic Modeling of Korean Newspaper Articles on Aging via Latent Dirichlet Allocation

  • Lee, So Chung
    • Asian Journal for Public Opinion Research / v.10 no.1 / pp.4-22 / 2022
  • The purpose of this study is to explore the structure of social discourse on aging in Korea by analyzing newspaper articles on aging. The analysis is composed of three steps: first, data collection and preprocessing; second, identifying the latent topics; and third, observing yearly dynamics of topics. In total, 1,472 newspaper articles that included the word "aging" in the title were collected from 10 major newspapers between 2006 and 2019. The underlying topic structure was analyzed using Latent Dirichlet Allocation (LDA), a topic modeling method widely adopted by text mining academics and researchers. Seven latent topics were generated from the LDA model, defined as social issues, death, private insurance, economic growth, national debt, labor market innovation, and income security. The topic loadings demonstrated a clear increase in public interest in topics such as national debt and labor market innovation in recent years. This study concludes that media discourse on aging has shifted towards productivity- and efficiency-related issues, requiring older people to be productive citizens. Such subjectivation connotes a decreased role of the government and society by shifting the responsibility to individuals who are unable to adapt successfully as productive citizens within the labor market.
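
The third step in this abstract, observing yearly dynamics of topics, can be illustrated by averaging per-article topic proportions within each year. The sketch below is an assumption about how such a summary might be computed, not the study's actual procedure; the function name `yearly_topic_means` and the two-topic toy records are hypothetical.

```python
from collections import defaultdict

def yearly_topic_means(records):
    """records: iterable of (year, topic_proportions); returns {year: mean proportions}."""
    sums = {}
    counts = defaultdict(int)
    for year, proportions in records:
        if year not in sums:
            sums[year] = [0.0] * len(proportions)
        for i, p in enumerate(proportions):
            sums[year][i] += p
        counts[year] += 1
    # Divide each per-topic sum by the number of articles in that year
    return {year: [s / counts[year] for s in sums[year]] for year in sorted(sums)}

# Hypothetical per-article topic proportions (as an LDA model might output)
records = [
    (2006, [0.8, 0.2]),
    (2006, [0.6, 0.4]),
    (2019, [0.3, 0.7]),
]
trend = yearly_topic_means(records)
```

Comparing the per-year averages for a given topic index shows whether that topic gained or lost share over time.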

Analysis of Research Topics and Trends on COVID-19 in Korea Using Latent Dirichlet Allocation (LDA)

  • Heo, Seong-Min;Yang, Ji-Yeon
    • Journal of the Korea Society of Computer and Information / v.25 no.12 / pp.83-91 / 2020
  • This study aims to identify research topics and examine trends in COVID-19-related papers on DBpia. Applying latent Dirichlet allocation (LDA), we extracted seven research topics: "International Dynamics", "Technology & Security", "Psychological Impact", "Biomedical-Related", "Economic Impact", "Online Education", and "Religion-Related". In addition, we used a multinomial logistic model to examine the trend of research topics. We found that the papers mainly covered topics related to "International Dynamics" and "Biomedical-Related" before June 2020, but the topics have become more diverse since then. In particular, topics regarding "Economic Impact", "Online Education", and "Psychological Impact" have drawn increased attention from researchers. The findings could provide a guideline for collaboration in COVID-19-related research and serve as a reference for ongoing research.
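
A multinomial logistic model of the kind mentioned in this abstract expresses the probability of each topic as a softmax over linear functions of time. The following is a minimal sketch under that assumption; the coefficients, the two-topic setup, and the function name `topic_probabilities` are hypothetical and are not taken from the paper.

```python
import math

def topic_probabilities(month, intercepts, slopes):
    """Softmax over per-topic linear scores a_k + b_k * month."""
    scores = [a + b * month for a, b in zip(intercepts, slopes)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients: topic 0 is the reference, topic 1 grows over time
intercepts = [0.0, -1.0]
slopes = [0.0, 0.5]

early = topic_probabilities(0, intercepts, slopes)
late = topic_probabilities(6, intercepts, slopes)
```

A positive slope for a topic relative to the reference topic means its share of papers rises as the months pass.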

Jointly Image Topic and Emotion Detection using Multi-Modal Hierarchical Latent Dirichlet Allocation

  • Ding, Wanying;Zhu, Junhuan;Guo, Lifan;Hu, Xiaohua;Luo, Jiebo;Wang, Haohong
    • Journal of Multimedia Information System / v.1 no.1 / pp.55-67 / 2014
  • Image topic and emotion analysis is an important component of online image retrieval, which has become very popular in the rapidly growing social media community. However, due to the gaps between images and texts, very little work in the literature detects an image's topics and emotions in a unified framework, although topics and emotions are two levels of semantics that often work together to comprehensively describe an image. In this work, we propose a unified model, the Joint Topic/Emotion Multi-Modal Hierarchical Latent Dirichlet Allocation (JTE-MMHLDA) model, which extends the previous LDA, mmLDA, and JST models to capture topic and emotion information at the same time from heterogeneous data. Specifically, a two-level graphical model is built so that topics and emotions are shared across the whole document collection. The experimental results on a Flickr dataset indicate that the proposed model efficiently discovers images' topics and emotions. It significantly outperforms the text-only system by 4.4% and the vision-only system by 18.1% in topic detection, and outperforms the text-only system by 7.1% and the vision-only system by 39.7% in emotion detection.

A Study on the Analysis of Korean Medical Services using Latent Dirichlet Allocation Topic Modeling : Focusing on online reviews by medical consumers (Latent Dirichlet Allocation 토픽모델링을 이용한 한방 의료 서비스 분석에 관한 연구 : 의료 소비자의 온라인 리뷰를 중심으로)

  • Son, Chaeyeon;Song, Yeonwoo;Lee, Seungho
    • Journal of Society of Preventive Korean Medicine / v.26 no.1 / pp.43-57 / 2022
  • Objective: This study aims to understand consumers' needs for Korean medicine services by analyzing online reviews written by medical consumers. Methods: We analyzed the purposes of and satisfaction factors in medical service use using LDA (Latent Dirichlet Allocation) topic modeling. The data used in the study were 120,727 screened reviews written by medical consumers registered on Naver. The results were compared with the "2020 Korean Medicine Utilization Survey". Results: From 2018 to 2021, the five most frequently used terms were "kindness", "treatment", "doctor", "Korean medicine", and "acupuncture". The main purpose of visiting Korean medicine clinics and hospitals was to treat "traffic accidents" in 2018, "waist (back) pain" in 2019, and "musculoskeletal pain" in 2020 and 2021. Based on the ratings, reviewers were satisfied with "explanation of treatment" and "treatment attitude", and dissatisfied with "accessibility to the institution". Conclusion: We concluded that the main purpose of using Korean medicine institutions was to treat musculoskeletal disorders. The results of this study are expected to be used to improve Korean medicine services in the future.

Cancer Research Trends in Traditional Korean Medical Journals since 2000 - Topic Modeling Using Latent Dirichlet Allocation and Keyword Network Analysis (2000년 이후 국내 한의학 암 관련 연구 동향 분석 - Latent Dirichlet Allocation 기반 토픽 모델링 및 연관어 네트워크 분석)

  • Kyeore Bae
    • The Journal of Internal Korean Medicine / v.43 no.6 / pp.1075-1088 / 2022
  • Objectives: The aim of this study is to analyze cancer research trends in traditional Korean medical journals indexed in the Korea Citation Index since 2000. Methods: Cancer research papers published in traditional Korean medical journals were searched in databases from inception to October 2022. The numbers of publications by journal and by year were descriptively assessed. After natural language processing, topic modeling (based on latent Dirichlet allocation) and keyword network analysis were conducted. Results: This research trend analysis involved 1,265 papers. Six topics were identified by topic modeling: case reports on symptom management, literature reviews, experiments on apoptosis, herbal extract treatments of breast carcinoma cell lines, anti-proliferative effects of herbal extracts, and anti-tumor effects. Keyword network analysis found that the effects of herbal medicine were assessed in clinical and experimental studies, while acupuncture was mainly mentioned in clinical reports. Conclusions: Cancer research papers in traditional Korean medical journals have contributed to evidence-based medicine. Further experimental studies are needed to elucidate the effects on different hallmarks of cancer. Rigorous clinical studies are needed to support clinical guidelines.

A Comparative Study between LSI and LDA in Constructing Traceability between Functional and Non-Functional Requirements

  • Byun, Sung-Hoon;Lee, Seok-Won
    • Journal of the Korea Society of Computer and Information / v.24 no.7 / pp.19-29 / 2019
  • Requirements traceability is regarded as one of the important quality attributes in the field of software requirements engineering. If requirements traceability is guaranteed, we can trace the life of the requirements through all phases, from the customers' needs in the early stage of the project to the requirements specification, deployment, and maintenance phases. This includes not only tracking the development artifacts that accompany the requirements, but also tracking backwards from the development artifacts to the initial customer requirements associated with them. In this paper, we focus on the traceability between functional requirements and non-functional requirements. Among many Information Retrieval (IR) techniques, we chose Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) for our research. We conducted an experiment on constructing traceability using the two techniques and analyzed the results. We then provide a comparative study of the two IR techniques in constructing traceability between functional and non-functional requirements.
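
One common way IR techniques such as LSI and LDA are used for traceability is to represent each requirement as a vector in a latent space and link pairs whose vectors are similar. The sketch below assumes that approach and uses cosine similarity with a fixed threshold; the abstract does not state the paper's actual linking rule, and the vectors, the threshold value, and the function names `cosine` and `trace_links` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def trace_links(functional, non_functional, threshold=0.8):
    """Return (FR, NFR) pairs whose similarity meets the (assumed) threshold."""
    return [
        (fr, nfr)
        for fr, fr_vec in functional.items()
        for nfr, nfr_vec in non_functional.items()
        if cosine(fr_vec, nfr_vec) >= threshold
    ]

# Hypothetical two-dimensional latent vectors (e.g. topic proportions)
functional = {"FR1": [0.9, 0.1], "FR2": [0.1, 0.9]}
non_functional = {"NFR1": [0.8, 0.2]}

links = trace_links(functional, non_functional)
```

The same linking code works for either technique, since only the way the vectors are produced differs between LSI and LDA.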

Topic Modeling of News Article about International Construction Market Using Latent Dirichlet Allocation (Latent Dirichlet Allocation 기법을 활용한 해외건설시장 뉴스기사의 토픽 모델링(Topic Modeling))

  • Moon, Seonghyeon;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.4 / pp.595-599 / 2018
  • A sufficient understanding of the status of the overseas construction market is crucial to achieving profitability in international construction projects. Many researchers have regarded news articles as a good data source for assessing market conditions, since they include market information such as political, economic, and social issues. Because the text data is large and unstructured, various text-mining techniques have been studied to reduce the manpower, time, and cost needed to summarize it. However, extracting the needed information from news articles remains difficult because the data covers many different topics. This research aims to overcome these problems and contribute to the summarization of market status by performing topic modeling with Latent Dirichlet Allocation. Assuming that the corpus contained 10 topics, the identified topics included projects for user convenience (topic-2), private support to address poverty in Africa (topic-4), and so on. By grouping the news articles by topic, the results could improve the extraction of useful information and the summarization of market status.

What Topics Have Been Studied in Korean Mathematics Education for 15 Years: Latent Topic Modeling Analysis

  • Hwang, Jihyun
    • Research in Mathematical Education / v.24 no.4 / pp.313-335 / 2021
  • The purpose of this research is to identify topics discussed in Korean mathematics education studies and to examine research trends over 15 years. I applied latent Dirichlet allocation (LDA) to the original text datasets, which included English abstracts of 3,157 articles published in eight journals indexed by the Korean Citation Index (KCI) from 1997 to 2019. I identified an LDA model with 60 topics; research trends in 2,884 articles published between 2002 and 2018 showed that mathematics educators paid the most attention to teacher education from 2010 to 2015 and to curriculum analysis after 2016. The findings of this research can contribute to understanding what has been discussed in the Korean mathematics education community, as well as what will and should be emphasized more in the future compared with global research trends. In addition, LDA has the potential to identify topics and keywords of newly written manuscripts submitted to any journal, supplementing the information provided by authors.