• Title/Summary/Keyword: LDA model

Search Result 92, Processing Time 0.027 seconds

Text Data Analysis Model Based on Web Application (웹 애플리케이션 기반의 텍스트 데이터 분석 모델)

  • Jin, Go-Whan
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.785-792 / 2021
  • Since the Fourth Industrial Revolution, advances in technologies such as artificial intelligence and big data have brought changes across society as a whole, and the amount of data that can be collected while applying these technologies is increasing rapidly. In academia in particular, existing literature is analyzed to grasp research trends. Such analysis organizes the flow of research, its methodologies, and its themes, and by identifying the subjects currently under discussion it contributes substantially to setting the direction of future research. However, the data collection required for analyzing document data is hard to approach without programming expertise. This paper proposes a text-mining-based topic modeling web application model. With the proposed model, even users who lack specialized knowledge of data analysis methods can collect, store, and text-analyze research papers, and researchers can analyze previous studies and research trends. The proposed model is expected to reduce the time and effort that data analysis requires.

A Content-based TV Program Recommendation System Using Age and Plots (연령 및 프로그램 줄거리를 활용한 콘텐츠 기반 TV 프로그램 추천 시스템)

  • Bang, Hanbyul;Lee, HyeWoo;Lee, Jee-Hyong
    • Proceedings of the Korean Society of Computer Information Conference / 2015.01a / pp.51-54 / 2015
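  • Content-based recommendation, one of the representative lines of research on recommender systems, uses content metadata such as the plot, genre, and reviews of TV programs or movies. However, these studies rely only on content-related information and do not consider the viewer's profile together with the content information. This paper proposes a TV program recommendation system that uses the viewer's age, taken from the viewer profile, and the program plot, taken from the content information. The system first groups viewers by age and then uses the LDA algorithm to build a plot topic model for each age group from the plots of the TV programs its viewers watched. On this basis, it compares the plot topic vectors of the programs broadcast in the time slot the viewer wants with the viewer's preference topic vector and recommends the TV program with the highest similarity. To verify the effectiveness of the approach, we ran comparative experiments using plots alone and using plots together with age. The experiments confirmed that the recommendation system performs better when age is used together with the plot than when only the plot is used.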
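
The comparison step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, and the program names and topic vectors are hypothetical placeholders for the output of an LDA model trained per age group.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(viewer_vector, candidates):
    """Return the candidate program whose plot topic vector is most similar."""
    return max(candidates, key=lambda title: cosine(viewer_vector, candidates[title]))


# Hypothetical 3-topic distributions; real ones would come from an LDA model
# built from the plots of programs watched by the viewer's age group.
viewer = [0.7, 0.2, 0.1]
programs = {
    "Program A": [0.1, 0.8, 0.1],
    "Program B": [0.6, 0.3, 0.1],
    "Program C": [0.2, 0.2, 0.6],
}
print(recommend(viewer, programs))
```

In this sketch the candidate set would be restricted to programs airing in the viewer's chosen time slot before `recommend` is called.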

Unsupervised Motion Learning for Abnormal Behavior Detection in Visual Surveillance (영상감시시스템에서 움직임의 비교사학습을 통한 비정상행동탐지)

  • Jeong, Ha-Wook;Chang, Hyung-Jin;Choi, Jin-Young
    • Journal of the Institute of Electronics Engineers of Korea SC / v.48 no.5 / pp.45-51 / 2011
  • In this paper, we propose an unsupervised learning method for modeling motion trajectory patterns effectively. In our approach, observations of an object along a trajectory are treated as words in a document for the latent Dirichlet allocation (LDA) algorithm, which is used in natural language processing to cluster words by topic. This allows topics (e.g., go straight, turn left, turn right) to be clustered effectively in complex scenes such as crossroads. We then learn the patterns of word sequences in each cluster using the Baum-Welch algorithm, which finds the unknown parameters of a hidden Markov model. Abnormality can be evaluated with the forward algorithm by comparing an input sequence against the learned sequences. Experimental results show that the modeling of semantic regions is robust against noise in various scenes.
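
The forward-algorithm evaluation mentioned in this abstract can be sketched as follows. This is a generic scaled forward pass for a discrete hidden Markov model, not the authors' implementation. The two-state parameters and the two-symbol quantisation of trajectory observations are hypothetical stand-ins for a model trained with Baum-Welch.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a non-empty observation sequence under a discrete HMM.

    Uses the scaled forward algorithm so long sequences do not underflow.
    """
    n = len(start)
    # Initialisation: alpha_0(s) = start[s] * emit[s][obs[0]]
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Recursion: sum over previous states, then emit the current symbol.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


# Hypothetical parameters standing in for a trained "go straight" cluster model.
# Symbols: 0 = forward step, 1 = backward step (an assumed quantisation).
START = [0.9, 0.1]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.95, 0.05], [0.5, 0.5]]

usual = forward_log_likelihood([0, 0, 0, 0], START, TRANS, EMIT)
odd = forward_log_likelihood([1, 1, 1, 1], START, TRANS, EMIT)
print(usual > odd)  # compares the all-forward and all-backward trajectories
```

A sequence whose log-likelihood falls below some threshold could then be flagged as abnormal. The threshold itself is not given in the abstract and would have to be chosen from data.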

A study on trends and predictions through analysis of linkage analysis based on big data between autonomous driving and spatial information (자율주행과 공간정보의 빅데이터 기반 연계성 분석을 통한 동향 및 예측에 관한 연구)

  • Cho, Kuk;Lee, Jong-Min;Kim, Jong Seo;Min, Guy Sik
    • Journal of Cadastre & Land InformatiX / v.50 no.2 / pp.101-115 / 2020
  • In this paper, big data analysis was used to identify global trends in autonomous driving and to derive measures for activating spatial information services. News articles and patent documents were used together as the data in order to analyze trends related to spatial information. Key words were extracted from major news on autonomous driving using LDA (Latent Dirichlet Allocation), a topic model. In addition, the connectivity with spatial information, global technology trends, and trends and predictions in the spatial information field were analyzed by applying WordNet to key words from patent information. This paper proposes a big data analysis method for predicting trends and the future through analysis of the connection between the autonomous driving field and spatial information. As future global trends of spatial information in autonomous driving, the analysis derived platform alliances, business partnerships, mergers and acquisitions, joint venture establishment, standardization, and technology development.

Analysis of Changes in Restaurant Attributes According to the Spread of Infectious Diseases: Application of Text Mining Techniques (감염병 확산에 따른 레스토랑 선택속성 변화 분석: 텍스트마이닝 기법 적용)

  • Joonil Yoo;Eunji Lee;Chulmo Koo
    • Information Systems Review / v.25 no.4 / pp.89-112 / 2023
  • When COVID-19 was declared a pandemic in March 2020, various quarantine measures were taken, and many changes followed in the tourism and hospitality industries. In particular, quarantine guidelines such as non-face-to-face services and social distancing were implemented in the restaurant industry. For decades, research on restaurant attributes has emphasized the importance of three attributes: atmosphere, service quality, and food quality. Nevertheless, to the best of our knowledge, research on restaurant attributes that considers the COVID-19 situation is insufficient. To fill this gap, this study took an exploratory approach to classifying new restaurant attributes based on an understanding of environmental changes. The unit of analysis was 31,115 online reviews registered on Naverplace for 475 general restaurants located in Euljiro, Seoul. We classified restaurant attributes by clustering the words in the online reviews using TF-IDF and LDA topic modeling. The analysis derived "prevention of infectious diseases" as a new restaurant attribute in the context of COVID-19, alongside atmosphere, service quality, and food quality. The study has academic significance in that it extends the literature on restaurant attributes by categorizing the three attributes presented in prior research and presenting a new one. Moreover, the results lead to practical recommendations that consider both restaurant operations and policy implications.

Topic Modeling Insomnia Social Media Corpus using BERTopic and Building Automatic Deep Learning Classification Model (BERTopic을 활용한 불면증 소셜 데이터 토픽 모델링 및 불면증 경향 문헌 딥러닝 자동분류 모델 구축)

  • Ko, Young Soo;Lee, Soobin;Cha, Minjung;Kim, Seongdeok;Lee, Juhee;Han, Ji Yeong;Song, Min
    • Journal of the Korean Society for information Management / v.39 no.2 / pp.111-129 / 2022
  • Insomnia is a chronic disease in modern society, and the number of new patients has increased by more than 20% in the last 5 years. It is a serious disease that requires diagnosis and treatment, because lack of sleep causes serious individual and social problems and the triggers of insomnia are complex. This study collected 5,699 documents from 'insomnia', a community on the social media platform 'Reddit', where users freely express opinions. Following the International Classification of Sleep Disorders (ICSD-3) standard and guidelines prepared with the help of experts, we constructed an insomnia corpus by tagging the documents as insomnia-tendency or non-insomnia-tendency documents. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained on the constructed corpus. In the performance evaluation, RoBERTa performed best, with an accuracy of 81.33%. For in-depth analysis of the insomnia social data, topic modeling was performed with BERTopic, a recent method that supplements the weaknesses of the widely used LDA. The analysis identified 8 subject groups ('Negative emotions', 'Advice and help and gratitude', 'Insomnia-related diseases', 'Sleeping pills', 'Exercise and eating habits', 'Physical characteristics', 'Activity characteristics', 'Environmental characteristics'). Users expressed negative emotions and sought help and advice from the Reddit insomnia community. In addition, they mentioned diseases related to insomnia, shared discourse on the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and heart, activity characteristics such as zombies, hypnic jerk, and groggy, and environmental characteristics such as sunlight, blankets, temperature, and naps.

A Numerical Study of the Flow Field in the Combustion Chamber of the I.C Engine with Offset Valve (편심 밸브를 갖는 내연기관의 연소실 내부 유동장에 대한 수치적 연구)

  • 양희천;최영기;유홍선;고상근;허선무
    • Transactions of the Korean Society of Mechanical Engineers / v.16 no.8 / pp.1552-1565 / 1992
  • Three-dimensional numerical calculations were carried out for two different combustion chambers with an offset valve in order to investigate the effects of swirl and squish on the flow fields. A modified k-ε turbulence model that accounts for the density change under rapid compression and expansion by the piston was used. During the compression process, squish flow, which controls the subsequent combustion process, was produced by the piston bowl in the bowl-piston type combustion chambers but not in the flat-piston type. A swirl velocity close to solid body rotation was maintained in the flat-piston type combustion chambers, whereas in the bowl-piston type a vortex resulting from the change of the solid body rotation was generated in the radial-circumferential plane. As the swirl ratio increased, a large and strong vortex was generated in the radial-circumferential plane of the bowl-piston type combustion chambers because of the strong inward flows from the combustion chamber wall. These computational results were compared with the results of LDA measurements.

An Analysis of the International Trends of Research on Artificial Intelligence in Education Using Topic Modeling (인공지능 활용 교육의 토픽모델링 분석을 통한 수학교육 연구 방향의 함의)

  • Noh, Jihwa;Ko, Ho Kyoung;Kim, Byeongsoo;Huh, Nan
    • Journal of the Korean School Mathematics Society / v.26 no.1 / pp.1-19 / 2023
  • This study analyzed international trends in research on artificial intelligence in education by applying topic modeling to 352 papers recently published in the International Journal of Artificial Intelligence in Education (IJAIED). The IJAIED is the official, SCOPUS-indexed journal of the International AIED Society. The analysis revealed that international AIED research could be categorized into eight topics. Research on topics such as analyzing student behavior models in learning systems and designing feedback on student solutions increased over time, whereas research focusing on data handling methods decreased. Based on the findings, implications and suggestions for the research and development of AIED applications are provided.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues that modern society urgently needs to solve, such as unemployment, economic crisis, and social welfare, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, such a method is not always effective. Because of the expense, a large number of survey replies is seldom gathered, and in some cases it is hard to find experts who deal with a specific social issue. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about the same social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles published by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of the matching algorithm is to find the paragraphs that best match each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. Specifically, given a set of text documents, we segment each document into paragraphs. Meanwhile, using LDA, we extract a set of topics from the text documents. Through our matching process, each paragraph is assigned to the topic it best matches, so that each topic finally has several best-matched paragraphs. For example, suppose there are a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company at Seoul"). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly.
Through this prototype system, we have detected various social issues appearing in our society and have shown the effectiveness of our proposed methods in our experimental results. Note that our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
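
The paragraph-to-topic matching idea in this abstract can be sketched as below. The abstract does not give the actual scoring formula, so this sketch assumes a simple unigram log-likelihood over topic terms with a small floor probability for unlisted terms. The second topic, the floor value, and the sample paragraph are hypothetical; only the Topic1 terms and weights come from the abstract's example.

```python
import math

# Topic term distributions. "Unemployment Problem" follows the Topic1 example
# in the abstract; "Housing Problem" is a hypothetical second topic.
TOPICS = {
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    "Housing Problem": {"rent": 0.5, "housing": 0.3, "loan": 0.2},
}

FLOOR = 1e-6  # assumed smoothing probability for terms a topic does not list


def log_score(tokens, topic_terms):
    """Log-likelihood of the paragraph tokens under a topic's term distribution."""
    return sum(math.log(topic_terms.get(tok, FLOOR)) for tok in tokens)


def best_topic(paragraph, topics=TOPICS):
    """Return the label of the topic under which the paragraph scores highest."""
    tokens = paragraph.lower().split()
    return max(topics, key=lambda label: log_score(tokens, topics[label]))


paragraph = "layoff wave raises unemployment as business slows"
print(best_topic(paragraph))
```

Whitespace tokenization is used only to keep the sketch short. The system described in the abstract analyzes Korean lexical items first, which would need a morphological analyzer in place of `split()`.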

Natural Convection in a Water Tank with a Heated Horizontal Plate Facing Downward (아래로 향한 수평가열판이 있는 수조에서의 자연대류)

  • Yang, Sun-Kyu;Chung, Moon-Ki;Helmut Hoffmann
    • Nuclear Engineering and Technology / v.27 no.3 / pp.301-316 / 1995
  • Experimental and computational studies were carried out to investigate the natural convection of single-phase flow in a tank with a heated horizontal plate facing downward. This is a simplified model for investigating the influence of a core melt at the bottom of a reactor vessel on the thermal-hydraulic behavior in a water-filled cavity surrounding the vessel. In this case the vessel is simulated by an insulated hexahedral box with a heated plate mounted horizontally at the bottom of the box. The box with the heated plate is installed in a water-filled hexahedral tank. Coolers are immersed in the U-type water volume between the box and the tank. Although multicomponent flows are more likely to exist below the heated plate in reality, the present study concentrates on single-phase flow as a first step prior to investigating the complicated multicomponent thermal-hydraulic phenomena. To better understand the natural convection characteristics below the heated plate, the velocity and temperature are measured by LDA (Laser Doppler Anemometry) and thermocouples, respectively, and the flow fields are visualized by taking pictures of the flow region with suspended particles. The results show that a very effective circulation of the fluid occurs in the whole flow area once the heater and coolers are put into operation. In the remote region below the heated plate the flow is nearly stagnant, and a remarkable temperature stratification can be observed with a very thin thermal boundary. Analytical predictions using the FLUTAN code show reasonable agreement with the measured velocity fields.
