• Title/Summary/Keyword: latent dirichlet allocation (LDA)

A Study on Analysis of Topic Modeling using Customer Reviews based on Sharing Economy: Focusing on Sharing Parking (공유경제 기반의 고객리뷰를 이용한 토픽모델링 분석: 공유주차를 중심으로)

  • Lee, Taewon
    • Journal of Korea Society of Industrial Information Systems / v.25 no.3 / pp.39-51 / 2020
  • This study examines the social issues and consumer awareness surrounding shared parking through text mining. Topics were extracted by keyword and analyzed using TF-IDF (term frequency-inverse document frequency) and LDA (latent Dirichlet allocation). Categorization by topic showed that citizens' complaints concerning local government agreements, parking space negotiations, parking culture improvement, citizen participation, and similar matters played an important role in implementing shared parking services. This study differs markedly from previous exploratory studies based on corporate and regional cases, and can therefore be said to make a substantial academic contribution. As a practical contribution, the results of the LDA analysis can be applied in establishing sharing economy policies for revitalizing the local economy.
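The TF-IDF weighting named in this abstract can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant or tokenizer the study used, so the formula below (relative term frequency times the natural log of N/df) and the toy reviews are assumptions chosen for illustration only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy reviews, not data from the study.
reviews = [
    "parking space shared parking".split(),
    "parking fee complaint".split(),
    "resident participation parking culture".split(),
]
scores = tf_idf(reviews)
```

Because "parking" appears in every toy review, its inverse document frequency is ln(3/3) = 0, so it receives zero weight everywhere, while terms confined to one review are weighted up. Weights of this kind are commonly used to select keywords before fitting a topic model.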

A Study on Science Technology Trend and Prediction Using Topic Modeling (토픽모델링을 활용한 과학기술동향 및 예측에 관한 연구)

  • Park, Ju Seop;Hong, Soon-Goo;Kim, Jong-Weon
    • Journal of Korea Society of Industrial Information Systems / v.22 no.4 / pp.19-28 / 2017
  • Companies and governments have mainly used the Delphi technique to understand research or technology trends. Because this technique consumes a large amount of time and money, this study attempted to understand and predict science and technology trends using the topic modeling technique latent Dirichlet allocation (LDA). To this end, 20 specific artificial intelligence (AI) technologies were extracted from the abstracts of US patent documents on AI. Among the extracted technologies, core technologies were identified and then divided into hot and cold technologies through a trend analysis of their annual proportions. Text/word searching, computer management, programming syntax, network administration, multimedia, and wireless network technology were identified as hot technologies. These are key technologies that have been actively studied in the field of AI in recent years. The methodology suggested in this study may be used to analyze trends, derive policies, or predict technical demands in various fields such as social issues, regional innovation, and management.
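One common way to split topics into hot and cold is to regress each topic's annual proportion on the year and look at the sign of the slope. The abstract does not state the exact criterion used, so the sketch below rests on that assumption; the topic names and proportions are hypothetical, and a significance test on the slope is omitted.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def label_trend(years, proportions):
    """Assumed rule: rising annual share is 'hot', otherwise 'cold'."""
    return "hot" if slope(years, proportions) > 0 else "cold"

# Hypothetical annual topic proportions, not values from the study.
years = [2013, 2014, 2015, 2016, 2017]
topic_share = {
    "topic_a": [0.04, 0.05, 0.06, 0.07, 0.08],
    "topic_b": [0.09, 0.08, 0.06, 0.05, 0.04],
}
labels = {name: label_trend(years, share) for name, share in topic_share.items()}
```

Here `topic_a` gains about one percentage point of share per year and is labeled hot, while `topic_b` declines and is labeled cold. In practice the slope would usually be tested for significance before a topic is labeled.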

Performance Improvement of Topic Modeling using BART based Document Summarization (BART 기반 문서 요약을 통한 토픽 모델링 성능 향상)

  • Eun Su Kim;Hyun Yoo;Kyungyong Chung
    • Journal of Internet Computing and Services / v.25 no.3 / pp.27-33 / 2024
  • The academic research environment is continuously changing as the volume of information grows, which raises the need for an effective way to analyze and organize large numbers of documents. In this paper, we propose a method for improving topic modeling performance using document summarization based on BART (Bidirectional and Auto-Regressive Transformers). The proposed method uses a BART-based document summarization model to extract the core content and then applies the LDA (Latent Dirichlet Allocation) algorithm to improve topic modeling performance. We validate through experiments that document summarization improves the performance and efficiency of LDA topic modeling. The experimental results show that the BART-based model for summarizing article data captures the important information of the original articles, with F1 scores of 0.5819, 0.4384, and 0.5038 on ROUGE-1, ROUGE-2, and ROUGE-L, respectively. In addition, topic modeling using summarized documents performs about 8.08% better than topic modeling using full text when compared on the perplexity metric. This contributes to reducing data throughput and improving efficiency in the topic modeling process.
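For readers unfamiliar with the perplexity metric, the sketch below shows the standard definition (the exponential of the negative mean per-token log-likelihood, where lower is better) and one way a relative improvement could be computed. The abstract does not say how the 8.08% figure was derived, so `relative_improvement` and the toy log-likelihoods are assumptions for illustration.

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean per-token natural-log likelihood."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def relative_improvement(baseline, candidate):
    """Fractional drop in perplexity relative to the baseline (lower is better)."""
    return (baseline - candidate) / baseline

# Hypothetical per-token log-likelihoods, not values from the paper.
full_text = [math.log(0.01)] * 4
summarized = [math.log(0.0125)] * 4

gain = relative_improvement(perplexity(full_text), perplexity(summarized))
```

With these toy numbers the full-text perplexity is 100 and the summarized-text perplexity is 80, a relative improvement of 20%.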

A Study on the Research Topics and Trends in Korean Journal of Remote Sensing: Focusing on Natural & Environmental Disasters (토픽모델링을 이용한 대한원격탐사학회지의 연구주제 분류 및 연구동향 분석: 자연·환경재해 분야를 중심으로)

  • Kim, Taeyong;Park, Hyemin;Heo, Junyong;Yang, Minjune
    • Korean Journal of Remote Sensing / v.37 no.6_2 / pp.1869-1880 / 2021
  • The Korean Journal of Remote Sensing (KJRS), which has led the field of remote sensing and GIS in South Korea for over 37 years, has published interdisciplinary research papers. In this study, we performed topic modeling based on Latent Dirichlet Allocation (LDA), a probabilistic generative model, to identify research topics and trends by analyzing the titles, keywords, and abstracts of 1) all articles and 2) articles related to natural and environmental disasters published in KJRS. The LDA results showed that four topics ('Polar', 'Hydrosphere', 'Geosphere', and 'Atmosphere') were identified across all articles, and that 'Polar' was the dominant topic over time (linear slope = 3.51 × 10⁻³, p < 0.05). For the articles related to natural and environmental disasters, the optimal number of topics was seven ('Marine pollution', 'Air pollution', 'Volcano', 'Wildfire', 'Flood', 'Drought', and 'Heavy rain'), and 'Air pollution' was the dominant topic over time (linear slope = 2.61 × 10⁻³, p < 0.05). The results of this study provide multidisciplinary researchers with a history of, and insight into, research on natural and environmental disasters in KJRS.

Extraction of Network Threat Signatures Using Latent Dirichlet Allocation (LDA를 활용한 네트워크 위협 시그니처 추출기법)

  • Lee, Sungil;Lee, Suchul;Lee, Jun-Rak;Youm, Heung-youl
    • Journal of Internet Computing and Services / v.19 no.1 / pp.1-10 / 2018
  • Network threats such as Internet worms and computer viruses have been increasing significantly. In particular, APTs (Advanced Persistent Threats) and ransomware have become more sophisticated and complex. IDSes (Intrusion Detection Systems) have played a key role as information security solutions over the last few decades. To use an IDS effectively, IDS rules must be written properly. An IDS rule includes a key signature and is incorporated into an IDS, which can then detect a network threat containing that signature as it passes through. However, it is challenging to find a key signature for a specific network threat. We first need to analyze the threat rigorously and then write a proper IDS rule based on the analysis result. If we use a signature that is also common in benign or normal network traffic, we will observe many false alarms. In this paper, we propose a scheme that analyzes a network threat and extracts key signatures corresponding to the threat. Specifically, the proposed scheme quantifies the degree of correspondence between a network threat and a signature using the LDA (Latent Dirichlet Allocation) algorithm. A signature that corresponds strongly to the network threat can then be used in an IDS rule for detecting the threat.

A Technology Landscape of Artificial Intelligence: Technological Structure and Firms' Competitive Advantages (인공지능 기술 랜드스케이프 : 기술 구조와 기업별 경쟁우위)

  • Lee, Wangjae;Lee, Hakyeon
    • Journal of Korea Technology Innovation Society / v.22 no.3 / pp.340-361 / 2019
  • This study analyzes the technological structure of artificial intelligence (AI) and the technological capabilities of AI companies based on patent information. A total of 2,589 AI patents registered with the USPTO from 2007 to 2017 were collected and analyzed using Latent Dirichlet Allocation (LDA) to derive 20 AI technology topics. Analysis of development trends by technology reveals that visual understanding, data analysis, motion control, and machine learning are growing, while language understanding and speech technology are sluggish. We also investigated the leading companies in each sub-field of AI as well as the core competencies of global IT companies. The findings of this study are expected to be useful for formulating and implementing the technology strategies of AI companies.

Data Analysis of Dropouts of University Students Using Topic Modeling (토픽모델링을 활용한 대학생의 중도탈락 데이터 분석)

  • Jeong, Do-Heon;Park, Ju-Yeon
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.1 / pp.88-95 / 2021
  • This study aims to provide implications for establishing student support policies by empirically analyzing data on university student dropouts. To this end, data on students enrolled in D University after 2017 were sampled and collected. The collected data were analyzed using topic modeling (LDA: Latent Dirichlet Allocation), a probabilistic model based on text mining. The analysis identified topics characteristic of dropout students, and classification between groups based on these topics also performed well. Based on these results, a specific educational support system was proposed to prevent university students from dropping out. This study is meaningful in that it demonstrates the use of text mining techniques in the education field and suggests an education policy based on data analysis.

Technology Development Strategy of Piggyback Transportation System Using Topic Modeling Based on LDA Algorithm

  • Jun, Sung-Chan;Han, Seong-Ho;Kim, Sang-Baek
    • Journal of the Korea Society of Computer and Information / v.25 no.12 / pp.261-270 / 2020
  • In this study, we identify promising technologies for the Piggyback transportation system by analyzing relevant patent information. To this end, we first develop a patent database by extracting relevant technology keywords from pioneering research papers on the Piggyback flatcar system. We then employ text mining to identify frequently occurring words in the patent database and, using these words, apply the LDA (Latent Dirichlet Allocation) algorithm to identify "topics" that correspond to "key" technologies for the Piggyback system. Finally, we employ the ARIMA model to forecast the trends of these "key" technologies and identify the promising technologies for the Piggyback system, combining the keyword search method with the patent analysis. The results show that data-driven integrated management systems, operation planning systems, and special cargo (especially fluid and gas) handling/storage technologies are the "key" promising technologies for the future of the Piggyback system, and that data reception/analysis techniques must be developed to improve system performance. The proposed procedure and analysis method provide useful insights for developing the R&D strategy and technology roadmap for the Piggyback system.

Application of a Topic Model on the Korea Expressway Corporation's VOC Data (한국도로공사 VOC 데이터를 이용한 토픽 모형 적용 방안)

  • Kim, Ji Won;Park, Sang Min;Park, Sungho;Jeong, Harim;Yun, Ilsoo
    • Journal of Information Technology Services / v.19 no.6 / pp.1-13 / 2020
  • Today, 80% of big data consists of unstructured text data. In particular, documents of many kinds are stored as large-scale unstructured text through social network services (SNS), blogs, news, and other channels, which highlights the importance of unstructured data. As the possibilities for using unstructured data have increased, various analysis techniques such as text mining have emerged. In this study, a topic modeling technique was applied to the Korea Expressway Corporation's voice of customer (VOC) data, which includes customer opinions and complaints. Currently, VOC data is divided according to the business areas of the Korea Expressway Corporation. However, the assigned categories are often inaccurate, and ambiguous items are classified as "other". To use VOC data for efficient service improvement, a more systematic and efficient classification method is therefore required. To this end, this study proposed two approaches: a method using only Latent Dirichlet Allocation (LDA), the most representative topic modeling technique, and a new method combining LDA with the word embedding technique Word2vec. The results confirmed that the categories of VOC data are relatively well classified when the new method is used. These results are expected to yield implications for the Korea Expressway Corporation that can be used for service improvement.

Reviews Analysis of Korean Clinics Using LDA Topic Modeling (토픽 모델링을 활용한 한의원 리뷰 분석과 마케팅 제언)

  • Kim, Cho-Myong;Jo, A-Ram;Kim, Yang-Kyun
    • The Journal of Korean Medicine / v.43 no.1 / pp.73-86 / 2022
  • Objectives: In the health care industry, the influence of online reviews is growing. As medical services are provided mainly by providers, those services have been managed by hospitals and clinics. However, direct promotion of medical services by providers is legally forbidden. For this reason, consumers, such as patients and clients, search through many reviews on the Internet to get information about hospitals, treatments, prices, etc. Online reviews can be regarded as indicators of hospital quality, and they should be analyzed for sustainable hospital marketing. Method: Using a Python-based crawler, we collected more than 14,000 reviews written by real patients who had experienced Korean medicine. To extract the most representative words, the reviews were divided into positive and negative; they were then pre-processed to keep only nouns and adjectives, and TF (Term Frequency), DF (Document Frequency), and TF-IDF (Term Frequency - Inverse Document Frequency) were computed. Finally, to obtain topics from the reviews, the extracted words were analyzed using LDA (Latent Dirichlet Allocation). To avoid overlap, the number of topics was set by Davis visualization. Results and Conclusions: Six and three topics were extracted from the positive and negative reviews, respectively, using the LDA topic model. The main factors making up the topics were 1) response to patients and customers, 2) customized treatment (consultation) and management, and 3) the hospital or clinic environment.