• Title/Summary/Keyword: LDA technique (LDA기법)

An Exploratory Study of Generative AI Service Quality using LDA Topic Modeling and Comparison with Existing Dimensions (LDA토픽 모델링을 활용한 생성형 AI 챗봇의 탐색적 연구 : 기존 AI 챗봇 서비스 품질 요인과의 비교)

  • YaeEun Ahn;Jungsuk Oh
    • Journal of Service Research and Studies
    • /
    • v.13 no.4
    • /
    • pp.191-205
    • /
    • 2023
  • Artificial Intelligence (AI), especially in the domain of text-generative services, has seen a significant surge, with forecasts indicating that the AI-as-a-Service (AIaaS) market will reach a valuation of $55.0 billion by 2028. This research set out to explore the quality dimensions characterizing synthetic text media software, with a focus on four key players in the industry: ChatGPT, Writesonic, Jasper, and Anyword. Drawing on a dataset of over 4,000 reviews sourced from a software evaluation platform, the study employed the Latent Dirichlet Allocation (LDA) topic modeling technique using the Gensim library. This process grouped the data into 11 distinct topics. Subsequent analysis compared these topics against established AI service quality dimensions, specifically AICSQ and AISAQUAL. Notably, the reviews predominantly emphasized dimensions such as availability and efficiency, while others, such as anthropomorphism, which have been underscored in prior literature, were absent. This observation is attributed to the nature of the reviews of the AI services examined, which lean more towards semantic understanding than direct user interaction. The study acknowledges inherent limitations, mainly potential biases stemming from the single review source and the specific nature of the reviewer demographic. Possible future research includes gauging the real-world implications of these quality dimensions for user satisfaction and examining more closely how individual dimensions might affect overall ratings.

Multimodal biometrics system using PDA under ubiquitous environments (유비쿼터스 환경에서 PDA를 이용한 다중생체인식 시스템 구현)

  • Kim Yong-Sam;Lee Dae-Jong;Gwon Man-Jun;Chun Myung-Geun
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2006.05a
    • /
    • pp.261-264
    • /
    • 2006
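  • This paper proposes a multimodal biometric system that uses face and signature in a ubiquitous computing environment. Face and signature images are acquired with a PDA and transmitted over a wireless LAN to an authentication server, which returns the authentication result. The implemented system consists of two parts. On the client side, the PDA performs user registration and authentication through a user interface program written in Embedded Visual C++. On the server side, face recognition uses the PCA and LDA algorithms, which perform well for face recognition, while signature recognition divides the signature into segments by segment-partition matching and then applies Kernel PCA and LDA to the X-axis and Y-axis projection values. Evaluation on face and signature images confirmed that the proposed algorithm outperforms existing single-biometric methods.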

Comparison of Classification rate of PD Sources (부분방전원 분류기법의 패턴분류율 비교)

  • Park, Seong-Hee;Lim, Kee-Joe;Kang, Seong-Hwa
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 2005.07a
    • /
    • pp.566-567
    • /
    • 2005
  • Various pattern classification methods have been introduced to date, and a range of them have been applied to PD (partial discharge) source classification. The NN (neural network) has been the most widely used scheme for PD source classification. In recent years, however, other methods have been developed that are reported to outperform NNs in classification tasks in image and signal processing. This paper presents PD source classification results for three methods: BP (back-propagation), ANFIS (adaptive neuro-fuzzy inference system), and PCA-LDA (principal component analysis-linear discriminant analysis).

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.183-203
    • /
    • 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has given rise to many kinds of online news media, the volume of news about events in society has increased greatly. Automatically summarizing key events from massive amounts of news data will therefore help users survey many events at a glance. In addition, building and providing an event network based on the relevance of events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, kept only meaningful words through preprocessing with NPMI and Word2Vec, and merged synonyms. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, and events were detected by finding the peaks of that distribution. A total of 32 topics were extracted, and the time of occurrence of each event was inferred from the point at which each topic distribution surged. As a result, a total of 85 events were detected, of which 16 were retained and presented after filtering with the Gaussian smoothing technique. We also calculated a relevance score between detected events to construct the event network: using the cosine coefficient between co-occurring events, we calculated the relevance between events and connected them. Finally, we built the event network by assigning each event to a vertex and the relevance score between events to the edge connecting the corresponding vertices. 
The event network constructed with our method helped us sort out, in chronological order, the major events in the political and social fields in Korea over the last year and, at the same time, identify which events are related to one another. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to analyze large amounts of data easily and to identify relevance between events that was difficult to detect with existing methods. We applied various text mining techniques and Word2vec in text preprocessing to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in the analysis of Korean texts. The event detection and network construction techniques of this study have the following advantages in practical application. First, LDA topic modeling, being unsupervised, can easily extract topics, topic words, and their distributions from huge amounts of data. Also, by using the date information of the collected news articles, the distribution of each topic can be expressed as a time series. Second, by calculating relevance scores and constructing the event network from the co-occurrence of topics, we can present the connections between events in summarized form, which is difficult with existing event detection. This can be seen from the fact that the relevance-based event network proposed in this study was in fact constructed in order of occurrence time. The event network also makes it possible to identify what happened as the starting point of a series of events. A limitation of this study is that LDA topic modeling yields different results depending on the initial parameters and the number of topics, and that the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher. 
Also, since each topic is assumed to be exclusive and independent, the model does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events that are not covered in this study or between events that belong to the same topic.
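
The pipeline described in this abstract (smooth each topic's daily share, treat surges as events, then link events by cosine relevance) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the `sigma`, `threshold`, and `min_score` parameters, and the choice of representing each event as a numeric vector are all assumptions made for the example.

```python
import math
from itertools import combinations


def gaussian_smooth(series, sigma=1.0):
    """Smooth a daily topic-share series with a truncated Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(series)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(series):  # renormalise near the borders
                num += w * series[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def find_peaks(series, threshold):
    """Indices of local maxima whose value exceeds the threshold."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i] > threshold
        and series[i] > series[i - 1]
        and series[i] >= series[i + 1]
    ]


def cosine(u, v):
    """Cosine coefficient between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_event_network(event_vectors, min_score=0.5):
    """Events are vertices; an edge carries the relevance score.

    event_vectors is a hypothetical mapping from event name to a vector
    (for example, per-document topic weights around the event date).
    """
    edges = {}
    for a, b in combinations(sorted(event_vectors), 2):
        score = cosine(event_vectors[a], event_vectors[b])
        if score >= min_score:
            edges[(a, b)] = score
    return edges
```

Calling `find_peaks(gaussian_smooth(daily_share), threshold)` on one topic's daily share returns candidate event days, and `build_event_network` turns the resulting event vectors into a weighted edge list. The smoothing step is what suppresses one-day spikes, which is consistent with the abstract's reduction from 85 detected events to 16.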

Examining Suicide Tendency Social Media Texts by Deep Learning and Topic Modeling Techniques (딥러닝 및 토픽모델링 기법을 활용한 소셜 미디어의 자살 경향 문헌 판별 및 분석)

  • Ko, Young Soo;Lee, Ju Hee;Song, Min
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.3
    • /
    • pp.247-264
    • /
    • 2021
  • This study aims to build a deep learning-based classification model that classifies suicide tendency using a suicide corpus constructed for the study. In addition, to analyze suicide factors, the study divided the suicide-tendency corpus into detailed topics using topic modeling, an analysis technique that automatically extracts topics. For this purpose, 2,011 suicide-related documents collected from the social media service Naver Knowledge iN were manually annotated as suicide-tendency or non-suicide-tendency documents based on the suicide prevention education manual issued by the Central Suicide Prevention Center, and the performance of deep learning classification models (LSTM, BERT, ELECTRA) was evaluated on the annotated corpus. In addition, LDA, one of the topic modeling techniques, was used to identify suicide factors by grouping the documents by topic, and co-word analysis and visualization were conducted to analyze the factors in depth.

Image Classification Model Using Sparse Coding and Topic Modeling (희소 부호화 기법과 토픽 모델링을 통한 이미지 분류 모델)

  • Jeon, Jin;Kim, Munchurl
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2015.07a
    • /
    • pp.49-50
    • /
    • 2015
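  • This paper proposes a model that combines the bag-of-visual-words (BoW) model, a technique that represents and analyzes images as visual words, with the latent Dirichlet allocation (LDA) model in order to capture the structure of visual words and classify images. First, sparse coding is applied to represent images as visual words more accurately than existing methods. Whereas the conventional BoW model represents one image patch as a single word, sparse coding allows one image patch to be represented by several words. To classify images with the proposed model, we use a multi-class SVM, which is widely used for measuring classification performance. The class classification performance of the proposed method was verified through experiments on the UIUC sports dataset.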

Analysis of Research Trends Related to drug Repositioning Based on Machine Learning (머신러닝 기반의 신약 재창출 관련 연구 동향 분석)

  • So Yeon Yoo;Gyoo Gun Lim
    • Information Systems Review
    • /
    • v.24 no.1
    • /
    • pp.21-37
    • /
    • 2022
  • Drug repositioning, one of the methods of developing new drugs, is a useful way to discover new indications by allowing drugs that have already been approved for use in people to be used for other purposes. Recently, with the development of machine learning technology, cases of analyzing vast amounts of biological information and using it to develop new drugs have been increasing. Applying machine learning technology to drug repositioning will help find effective treatments quickly. Currently, the world is having a difficult time due to a new disease caused by a coronavirus (COVID-19), a severe acute respiratory syndrome. Drug repositioning, which repurposes drugs that have already been clinically approved, could be an alternative source of therapeutics for COVID-19 patients. This study examines research trends in the field of drug repositioning using machine learning techniques. A total of 4,821 papers were collected from PubMed with the keyword 'Drug Repositioning' using web scraping. After data preprocessing, frequency analysis, LDA-based topic modeling, random forest classification analysis, and prediction performance evaluation were performed on 4,419 papers. Associated words were analyzed with a Word2vec model; after dimensionality reduction with PCA, K-Means clustering was used to generate labels, and the structure of the literature was then visualized using the t-SNE algorithm. Hierarchical clustering was applied to the LDA results and visualized as a heat map. This study identified the research topics related to drug repositioning and presented a method for deriving and visualizing meaningful topics from a large body of literature using machine learning algorithms. The results are expected to serve as basic data for establishing research or development strategies in the field of drug repositioning.

A Study for Improving the Performance of Data Mining Using Ensemble Techniques (앙상블기법을 이용한 다양한 데이터마이닝 성능향상 연구)

  • Jung, Yon-Hae;Eo, Soo-Heang;Moon, Ho-Seok;Cho, Hyung-Jun
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.4
    • /
    • pp.561-574
    • /
    • 2010
  • We studied the performance of 8 data mining algorithms, including decision trees, logistic regression, LDA, QDA, neural networks, and SVM, and their combinations with 2 ensemble techniques, bagging and boosting. In this study, we used 13 data sets with binary responses. Sensitivity, specificity, and misclassification error were used as criteria for comparison.
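
For reference, the three comparison criteria named in this abstract can be computed from the binary confusion counts as in the sketch below. The function name `binary_metrics` and the convention that label 1 marks the positive class are assumptions for illustration; the abstract does not say how the authors coded their responses.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and misclassification error for 0/1 labels.

    Assumes label 1 is the positive class (an illustrative convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        # Share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Share of all cases that were predicted incorrectly.
        "misclassification_error": (fp + fn) / len(pairs) if pairs else 0.0,
    }
```

Reporting all three together matters for binary responses with unbalanced classes, where a low misclassification error can hide poor sensitivity.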

A Comparative Analysis of Comments Before and After the Controversy Over the 'Back Advertising' of Influencers: Focused on LDA and Word2vec (인플루언서의 '뒷광고' 논란 전,후에 대한 댓글 비교 분석:LDA와 Word2vec을 중심으로)

  • Cha, Young-Ran
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.10
    • /
    • pp.119-133
    • /
    • 2020
  • Recently, as famous YouTubers have produced and broadcast videos backed by sponsorship and advertising such as indirect advertising (PPL), the so-called 'back advertising' controversy has continued, and not only famous YouTubers but also entertainers have been caught up in it, causing confusion among the public in Korea. This study examines the public's reaction before and after the YouTubers' 'back advertising' controversy through comment analysis. Specifically, using text analysis in R, we analyze the issue through methods such as word clouds, qgraph analysis, LDA, and word2vec, a deep learning technique. The analysis targeted the channels of three YouTubers who were involved in the 'back advertising' controversy and uploaded an 'apology video': SussTV's stylist Han Hye-yeon, who was at the center of the controversy; Muk-bang YouTuber Moon Bok-hee, whose content disposition is similar; and Yang Pang, a YouTuber who produces a variety of content. The 5 most recent videos (as of August 9, 2020) and the first 5 uploaded videos were reviewed. The results show that most comments expressed positive reactions before the controversy, whereas negative reactions accounted for most comments after it. This study thus examines, through various analyses in R, how far public attitudes toward influencers changed after the 'back advertising' controversy. It also devises measures to prevent back advertising by influencers in the future.

A Reply Graph-based Social Mining Method with Topic Modeling (토픽 모델링을 이용한 댓글 그래프 기반 소셜 마이닝 기법)

  • Lee, Sang Yeon;Lee, Keon Myung
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.6
    • /
    • pp.640-645
    • /
    • 2014
  • Many people use social network services to communicate, share information, and build social relationships with others on the Internet. Twitter is a representative service of this kind, where millions of tweets are posted each day and a huge amount of data has accumulated. Social mining, which extracts meaningful information from such massive data, has been studied intensively. On Twitter, content can easily be delivered and retweeted through following-follower relationships. Topic modeling of tweet data is a good tool for issue tracking in social media. However, the LDA topic model, a typical topic modeling method, is ineffective for short textual data. To overcome the restriction of short content in tweets, we introduce the notion of a reply graph, a graph whose nodes correspond to users and whose edges correspond to the existence of reply and retweet messages between users. This paper introduces a topic modeling method that uses the reply graph to reduce the number of short documents and to improve the quality of mining results. The proposed model uses the LDA model as the topic modeling framework for tweet issue tracking. Experimental results of the proposed method are presented for a collection of Twitter data covering 7 days.
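
One way to picture how a reply graph can reduce the number of short documents is to pool the tweets of users who are linked by reply or retweet edges into a single longer document before topic modeling. The sketch below does this with connected components found by union-find. The pooling rule, the function name `pool_tweets_by_reply_graph`, and the input shapes are assumptions for illustration; the abstract does not specify how the authors merge documents.

```python
from collections import defaultdict


def pool_tweets_by_reply_graph(tweets, interactions):
    """Merge tweets of users connected in the reply graph.

    tweets:       list of (user, text) pairs.
    interactions: list of (user_a, user_b) pairs, one per reply or
                  retweet edge between two users.
    Returns one pooled document per connected component of users.
    """
    parent = {}

    def find(user):
        # Union-find lookup with path halving.
        parent.setdefault(user, user)
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in interactions:
        union(a, b)

    pooled = defaultdict(list)
    for user, text in tweets:
        pooled[find(user)].append(text)
    return [" ".join(texts) for texts in pooled.values()]
```

The pooled documents can then be passed to any LDA implementation in place of the individual tweets. Pooling by whole connected components is the coarsest possible choice; on a real reply graph it could merge most users into one component, so a tighter rule (for example, pooling only direct reply threads) may be needed.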