• Title/Summary/Keyword: latent dirichlet allocation (LDA)

Search Result 175, Processing Time 0.031 seconds

Analyzing ages, gender, location on Twitter using LDA (LDA를 이용한 트윗 유저의 연령대, 성별, 지역 분석)

  • Lee, Ho-Kyung;Chun, Ju-Ryong;Song, Nam-Hoon;Ko, Youngjoong
    • Annual Conference on Human and Language Technology
    • /
    • 2013.10a
    • /
    • pp.116-119
    • /
    • 2013
  • 요즘 많은 사람들은 트위터를 통해 짧은 문장의 트윗을 작성하여 자신의 의견이나 생각을 표현한다. 사람들이 작성한 트윗은 사용자의 연령, 성별, 지역에 따라 다른 특성이 담겨있다. 이러한 정보를 이용하여, 기업에서는 연령대, 성별, 지역에 따라 각기 다른 마케팅 전략을 세울 수 있을 것이다. 본 논문에서는 트위터 사용자들의 트윗을 분석하여 연령대, 성별, 지역을 예측하려 한다. 네이버 오픈사전의 자질, 한국전자통신연구원(ETRI)의 개체명 사전을 이용한 자질 및 한국어 형태소 분석, 음절 단위의 bigram을 클래스별 의미 있는 자질로 선택하고 LDA를 이용하여 예측된 확률분포를 활용하여 분류한 결과, 연령 72%, 성별 75%, 지역 43%의 납득할만한 예측 정확도 결과를 얻게 되었다.

  • PDF

Query Expansion based on Knowledge Extraction and Latent Dirichlet Allocation for Clinical Decision Support (의학 문서 검색을 위한 지식 추출 및 LDA 기반 질의 확장)

  • Jo, Seung-Hyeon;Lee, Kyung-Soon
    • Annual Conference on Human and Language Technology
    • /
    • 2015.10a
    • /
    • pp.31-34
    • /
    • 2015
  • 본 논문에서는 임상 의사 결정 지원을 위한 UMLS와 위키피디아를 이용하여 지식 정보를 추출하고 질의 유형 정보를 이용한 LDA 기반 질의 확장 방법을 제안한다. 질의로는 해당 환자가 겪고 있는 증상들이 주어진다. UMLS와 위키피디아를 사용하여 병명과 병과 관련된 증상, 검사 방법, 치료 방법 정보를 추출한다. UMLS와 위키피디아를 사용하여 추출한 의학 정보를 이용하여 질의와 관련된 병명을 추출한다. 질의와 관련된 병명을 이용하여 추가 증상, 검사 방법, 치료 방법 정보를 확장 질의로 선택한다. 또한, LDA를 실행한 후, Word-Topic 클러스터에서 질의와 관련된 클러스터를 추출하고 Document-Topic 클러스터에서 초기 검색 결과와 관련이 높은 클러스터를 추출한다. 추출한 Word-Topic 클러스터와 Document-Topic 클러스터 중 같은 번호를 가지고 있는 클러스터를 찾는다. 그 후, Word-Topic 클러스터에서 의학 용어를 추출하여 확장 질의로 선택한다. 제안 방법의 유효성을 검증하기 위해 TREC Clinical Decision Support(CDS) 2014 테스트 컬렉션에 대해 비교 평가한다.

  • PDF

A Study on Big Data Analysis of Related Patents in Smart Factories Using Topic Models and ChatGPT (토픽 모형과 ChatGPT를 활용한 스마트팩토리 연관 특허 빅데이터 분석에 관한 연구)

  • Sang-Gook Kim;Minyoung Yun;Taehoon Kwon;Jung Sun Lim
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.4
    • /
    • pp.15-31
    • /
    • 2023
  • In this study, we propose a novel approach to analyze big data related to patents in the field of smart factories, utilizing the Latent Dirichlet Allocation (LDA) topic modeling method and the generative artificial intelligence technology, ChatGPT. Our method includes extracting valuable insights from a large data-set of associated patents using LDA to identify latent topics and their corresponding patent documents. Additionally, we validate the suitability of the topics generated using generative AI technology and review the results with domain experts. We also employ the powerful big data analysis tool, KNIME, to preprocess and visualize the patent data, facilitating a better understanding of the global patent landscape and enabling a comparative analysis with the domestic patent environment. In order to explore quantitative and qualitative comparative advantages at this juncture, we have selected six indicators for conducting a quantitative analysis. Consequently, our approach allows us to explore the distinctive characteristics and investment directions of individual countries in the context of research and development and commercialization, based on a global-scale patent analysis in the field of smart factories. We anticipate that our findings, based on the analysis of global patent data in the field of smart factories, will serve as vital guidance for determining individual countries' directions in research and development investment. Furthermore, we propose a novel utilization of GhatGPT as a tool for validating the suitability of selected topics for policy makers who must choose topics across various scientific and technological domains.

Research Topics in Industrial Engineering 2001~2015 (국내 산업공학 연구 주제 2001~2015)

  • Jeong, Bokwon;Lee, Hakyeon
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.42 no.6
    • /
    • pp.421-431
    • /
    • 2016
  • Over the last four decades, industrial engineering (IE) research in Korea has continued to evolve and expand to respond to social needs. This paper aims to identify research topics in IE research and explore their dynamic changes over time. The topic modeling approach, which automatically discovers topics that pervade a large and unstructured collection of documents, is adopted to identify research topics in domestic IE research. 1,242 articles published from 2001 to 2015 in two IE journals issued by the Korean Institute of Industrial Engineers were collected and their English abstracts were analyzed. Applying the Latent Dirichlet Allocation model led us to uncover 50 topics of domestic IE research. The top 10 most popular topics are revealed, and topic trends are explored by examining the dynamic changes over time. The four topics, technology management, financial engineering, data mining (supervised learning), efficiency analysis, are selected as hot topics while several traditional topics related with manufacturing are revealed as cold topics. The findings are expected to provide fruitful implications for IE researchers.

A Study on Mapping Users' Topic Interest for Question Routing for Community-based Q&A Service (커뮤니티 기반 Q&A서비스에서의 질의 할당을 위한 이용자의 관심 토픽 분석에 관한 연구)

  • Park, Jong Do
    • Journal of the Korean Society for information Management
    • /
    • v.32 no.3
    • /
    • pp.397-412
    • /
    • 2015
  • The main goal of this study is to investigate how to route a question to some relevant users who have interest in the topic of the question based on users' topic interest. In order to assess users' topic interest, archived question-answer pairs in the community were used to identify latent topics in the chosen categories using LDA. Then, these topic models were used to identify users' topic interest. Furthermore, the topics of newly submitted questions were analyzed using the topic models in order to recommend relevant answerers to the question. This study introduces the process of topic modeling to investigate relevant users based on their topic interest.

A Study on the Research Trends in Int'l Trade Using Topic modeling (토픽모델링을 활용한 무역분야 연구동향 분석)

  • Jee-Hoon Lee;Jung-Suk Kim
    • Korea Trade Review
    • /
    • v.45 no.3
    • /
    • pp.55-69
    • /
    • 2020
  • This study examines the research trends and knowledge structure of international trade studies using topic modeling method, which is one of the main methodologies of text mining. We collected and analyzed English abstracts of 1,868 papers of three Korean major journals in the area of international trade from 2003 to 2019. We used the Latent Dirichlet Allocation(LDA), an unsupervised machine learning algorithm to extract the latent topics from the large quantity of research abstracts. 20 topics are identified without any prior human judgement. The topics reveal topographical maps of research in international trade and are representative and meaningful in the sense that most of them correspond to previously established sub-topics in trade studies. Then we conducted a regression analysis on the document-topic distributions generated by LDA to identify hot and cold topics. We discovered 2 hot topics(internationalization capacity and performance of export companies, economic effect of trade) and 2 cold topics(exchange rate and current account, trade finance). Trade studies are characterized as a interdisciplinary study of three agendas(i.e. international economy, International Business, trade practice), and 20 topics identified can be grouped into these 3 agendas. From the estimated results of the study, we find that the Korean government's active pursuit of FTA and consequent necessity of capacity building in Korean export firms lie behind the popularity of topic selection by the Korean researchers in the area of int'l trade.

Detection of Abnormal Behavior by Scene Analysis in Surveillance Video (감시 영상에서의 장면 분석을 통한 이상행위 검출)

  • Bae, Gun-Tae;Uh, Young-Jung;Kwak, Soo-Yeong;Byun, Hye-Ran
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.36 no.12C
    • /
    • pp.744-752
    • /
    • 2011
  • In intelligent surveillance system, various methods for detecting abnormal behavior were proposed recently. However, most researches are not robust enough to be utilized for actual reality which often has occlusions because of assumption the researches have that individual objects can be tracked. This paper presents a novel method to detect abnormal behavior by analysing major motion of the scene for complex environment in which object tracking cannot work. First, we generate Visual Word and Visual Document from motion information extracted from input video and process them through LDA(Latent Dirichlet Allocation) algorithm which is one of document analysis technique to obtain major motion information(location, magnitude, direction, distribution) of the scene. Using acquired information, we compare similarity between motion appeared in input video and analysed major motion in order to detect motions which does not match to major motions as abnormal behavior.

A Study on the Research Trends on Open Innovation using Topic Modeling (토픽 모델링을 이용한 개방형 혁신 연구동향 분석 및 정책 방향 모색)

  • Cho, Sung-Bae;Shin, Shin-Ae;Kang, Dong-Seok
    • Informatization Policy
    • /
    • v.25 no.3
    • /
    • pp.52-74
    • /
    • 2018
  • In February 2018, the Korean government established the "Comprehensive Plans for Government Innovation" in order to realize 'the people-centered government'. The core of the comprehensive plans is participation of the people, which is very similar to open innovation where social issues are solved by ideas and capabilities of the private sector rather than those of the government. Therefore, this study was conducted by extracting open innovation topics through topic modeling based on LDA(Latent Dirichlet Allocation) as English abstract-data from 2003, when the plans for open innovation was first announced, to April 2018. Based on the extracted results, it also conducted a comparative analysis with "Comprehensive Plans for Government Innovation." The study has significant implications in that it derives the relationship between the subjects, analyzes the present policies of Korea on open innovation and suggests directions for development.

Image Analysis and Management Strategy for The National Science Museum Utilizing SNS Big Data Analysis (SNS 빅데이터 분석을 활용한 국립과학관에 대한 이미지 분석과 경영전략 제안)

  • Shin, Seongyeon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.1
    • /
    • pp.81-89
    • /
    • 2020
  • The purpose of this study is to investigate science consumers' perceptions of the National Science Museum and suggest effective management strategies for the museum. Research questions were established and the analyses were conducted to achieve the research goals. The collection and analysis of the data were conducted through a new approach to image analysis that combines qualitative and quantitative methods. First, the image of the concept of science was derived from science consumers (adults, undergraduate and graduate students) through a qualitative research method (group-interviewing), and then text analysis was conducted. Second, quantitative research was conducted through LDA (Latent Dirichlet Allocation)-based topical modeling of 63,987 words extracted from 12,920 titles of blog postings from one of the most heavily-trafficked portal sites in Korea. The results of this study indicate that the perception of science differs according to the characteristics of the respondents. Further, topic-modeling extracted 20 topics from the blog posting titles and the topics were condensed into seven factors. Detailed discussions and managerial implications are provided in the conclusion section.

An Analysis of Relationship Between Word Frequency in Social Network Service Data and Crime Occurences (소셜 네트워크 서비스의 단어 빈도와 범죄 발생과의 관계 분석)

  • Kim, Yong-Woo;Kang, Hang-Bong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.9
    • /
    • pp.229-236
    • /
    • 2016
  • In the past, crime prediction methods utilized previous records to accurately predict crime occurrences. Yet these crime prediction models had difficulty in updating immense data. To enhance the crime prediction methods, some approaches used social network service (SNS) data in crime prediction studies, but the relationship between SNS data and crime records has not been studied thoroughly. Hence, in this paper, we analyze the relationship between SNS data and criminal occurrences in the perspective of crime prediction. Using Latent Dirichlet Allocation (LDA), we extract tweets that included any words regarding criminal occurrences and analyze the changes in tweet frequency according to the crime records. We then calculate the number of tweets including crime related words and investigate accordingly depending on crime occurrences. Our experimental results demonstrate that there is a difference in crime related tweet occurrences when criminal activity occurs. Moreover, our results show that SNS data analysis will be helpful in crime prediction model as there are certain patterns in tweet occurrences before and after the crime.