• Title/Summary/Keyword: LDA 모형 (LDA model)


Face Recognition using LDA Mixture Model (LDA 혼합 모형을 이용한 얼굴 인식)

  • Kim Hyun-Chul;Kim Daijin;Bang Sung-Yang
    • Journal of KIISE: Software and Applications / v.32 no.8 / pp.789-794 / 2005
  • LDA (Linear Discriminant Analysis) provides a projection that discriminates the data well and performs very well in face recognition. However, because LDA provides only one transformation matrix over the whole data set, it is not sufficient to discriminate complex data consisting of many classes, such as human faces. To overcome this weakness, we propose a new face recognition method, called the LDA mixture model, in which the set of all classes is partitioned into several clusters and a transformation matrix is obtained for each cluster. This detailed representation is expected to improve the classification performance greatly. In face recognition simulations, the LDA mixture model outperforms PCA, LDA, and the PCA mixture model in terms of classification performance.

Extensions of LDA by PCA Mixture Model and Class-wise Features (PCA 혼합 모형과 클래스 기반 특징에 의한 LDA의 확장)

  • Kim Hyun-Chul;Kim Daijin;Bang Sung-Yang
    • Journal of KIISE: Software and Applications / v.32 no.8 / pp.781-788 / 2005
  • LDA (Linear Discriminant Analysis) is a data discrimination technique that seeks a transformation maximizing the ratio of the between-class scatter to the within-class scatter. While it has been successfully applied in several applications, it has two limitations, both concerning the underfitting problem. First, it fails to discriminate data with complex distributions, since all data in each class are assumed to follow a Gaussian distribution. Second, it can lose class-wise information, since it produces only one transformation over the entire range of classes. We propose three extensions of LDA to overcome these problems. The first extension addresses the first problem by modeling the within-class scatter using a PCA mixture model that can represent more complex distributions. The second extension addresses the second problem by using a different transformation for each class in order to provide class-wise features. The third extension combines these two modifications by representing each class in terms of the PCA mixture model and using a different transformation for each mixture component. All of the proposed extensions are shown to outperform LDA in terms of classification error for handwritten digit recognition and alphabet recognition.
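
The scatter ratio mentioned in this abstract is commonly written as the Fisher criterion. The block below is a sketch in standard textbook notation, not the paper's own: the symbols $C$ (number of classes), $N_c$ (samples in class $c$), $\mu_c$ (class mean), $\mu$ (overall mean), $\mathcal{X}_c$ (samples of class $c$), and $W$ (transformation matrix) are assumptions introduced here for illustration.

```latex
\begin{align}
S_B &= \sum_{c=1}^{C} N_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}, \\
S_W &= \sum_{c=1}^{C} \sum_{x \in \mathcal{X}_c} (x - \mu_c)(x - \mu_c)^{\top}, \\
W^{*} &= \arg\max_{W} \frac{\lvert W^{\top} S_B W \rvert}{\lvert W^{\top} S_W W \rvert}.
\end{align}
```

A single $W^{*}$ is shared by all classes here, which is the "only one transformation" limitation the abstract refers to.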

A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS (LDA, Top2Vec, BERTopic 모형의 토픽모델링 비교 연구 - 국외 문헌정보학 분야를 중심으로 -)

  • Yong-Gu Lee;SeonWook Kim
    • Journal of the Korean Society for Library and Information Science / v.58 no.1 / pp.5-30 / 2024
  • The purpose of this study is to extract topics from experimental data using three topic modeling methods (LDA, Top2Vec, and BERTopic) and to compare the characteristics of and differences between these models. The experimental data consist of 55,442 papers published in 85 academic journals in the field of library and information science that are indexed in the Web of Science (WoS). The experimental process was as follows: the first topic modeling results were obtained using the default parameters for each model, and the second topic modeling results were obtained by setting the same optimal number of topics for each model. In the first stage of topic modeling, the LDA, Top2Vec, and BERTopic models generated significantly different numbers of topics (100, 350, and 550, respectively). The Top2Vec and BERTopic models seemed to divide the topics approximately three to five times more finely than the LDA model. There were substantial differences among the models in terms of the average and standard deviation of documents per topic. The LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage of topic modeling, in which the same 25 topics were generated for all models, the Top2Vec model tended to assign more documents on average per topic and showed small deviations between topics, resulting in an even distribution across the 25 topics. When comparing the creation of similar topics between models, the LDA and Top2Vec models generated 18 similar topics (72%) out of 25. This high percentage suggests that the Top2Vec model is more similar to the LDA model. For a more comprehensive comparison, expert evaluation is necessary to determine whether the documents assigned to each topic in the topic modeling results are thematically accurate.
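
The two summary statistics this abstract relies on (documents per topic, and the share of similar topics) can be sketched as follows. This is only an illustration: the function names and the toy assignment list are hypothetical, and the abstract does not say how "similar" topics were identified, so the code merely reproduces the reported percentage from the reported counts.

```python
from statistics import mean, pstdev


def docs_per_topic_stats(assignments):
    """Return (mean, population std dev) of the number of documents per topic.

    `assignments` holds one topic label per document (hypothetical input format).
    """
    counts = {}
    for topic in assignments:
        counts[topic] = counts.get(topic, 0) + 1
    sizes = list(counts.values())
    return mean(sizes), pstdev(sizes)


def similar_topic_share(n_similar, n_topics):
    """Percentage of topics that have a similar counterpart in another model."""
    return 100.0 * n_similar / n_topics


# Toy data (not from the study): 8 documents spread over 3 topics.
toy_assignments = [0, 0, 0, 0, 1, 1, 2, 2]
avg_docs, spread = docs_per_topic_stats(toy_assignments)

# Counts reported in the abstract: 18 similar topics out of 25.
share = similar_topic_share(18, 25)
print(avg_docs, spread, share)
```

A large spread relative to the mean corresponds to the uneven allocation described for LDA; a small spread corresponds to the even distribution described for Top2Vec.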

Establishment of ITS Policy Issues Investigation Method in the Road Section applied Textmining (텍스트마이닝을 활용한 도로분야 ITS 정책이슈 탐색기법 정립)

  • Oh, Chang-Seok;Lee, Yong-taeck;Ko, Minsu
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.6 / pp.10-23 / 2016
  • Given the need for careful scrutiny using big data, this study attempts to develop and apply a search method for audit issues relating to ITS policies and programs. To this end, the auditing process of the Board of Audit and Inspection was combined with the theoretical framework of boundary analysis proposed by William Dunn as an analysis tool for audit issues. Moreover, we apply text mining to computerize the analysis tool, since text mining resembles boundary analysis in its approach to meta-problems. Specifically, for the text mining analysis we applied an antisymmetry-symmetry compound lexeme-based LDA model, based on the Latent Dirichlet Allocation (LDA) methodology proposed by David Blei. A case analysis identified several prime issues: insufficient collection of traffic information by the Urban Traffic Information System operated by the National Police Agency, overlapping problems between the Ministry of Land, Infrastructure and Transport and the Advanced Traffic Management System, and fabrication of mileage on digital tachographs.

Prosodic Break Index Estimation using LDA and Tri-tone Model (LDA와 tri-tone 모델을 이용한 운율경계강도 예측)

  • 강평수;엄기완;김진영
    • The Journal of the Acoustical Society of Korea / v.18 no.7 / pp.17-22 / 1999
  • In this paper, we propose a new method that combines LDA and a tri-tone model to predict Korean prosodic break indices (PBI) for a given utterance. PBI can be used as an important cue to syntactic discontinuity in continuous speech recognition (CSR). The model consists of three steps. In the first step, PBI is predicted from syllable and pause duration information using linear discriminant analysis (LDA). In the second step, syllable tone information is used to estimate PBI: vector quantization (VQ) is used to code the syllable tones, and PBI is estimated by the tri-tone model. In the last step, the two PBI predictors are integrated by a weight factor. The proposed method was tested on 200 literal style spoken sentences, and the experimental results showed 72% accuracy.
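
The final integration step can be sketched as a weighted combination of the two predictors' scores. The abstract does not give the combination rule, so the linear interpolation below, the function name `combine_predictors`, the three break-index levels, and all score values are assumptions made purely for illustration.

```python
def combine_predictors(p_lda, p_tritone, weight):
    """Blend two per-class score lists and return (best index, blended scores).

    Assumed rule (not stated in the abstract):
        combined = weight * p_lda + (1 - weight) * p_tritone
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(p_lda) != len(p_tritone):
        raise ValueError("both predictors must score the same break indices")
    combined = [weight * a + (1.0 - weight) * b for a, b in zip(p_lda, p_tritone)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


# Hypothetical scores for three break-index levels (0, 1, 2).
lda_scores = [0.6, 0.3, 0.1]      # duration-based predictor
tritone_scores = [0.1, 0.7, 0.2]  # tone-based predictor

balanced_index, _ = combine_predictors(lda_scores, tritone_scores, 0.5)
lda_heavy_index, _ = combine_predictors(lda_scores, tritone_scores, 0.9)
print(balanced_index, lda_heavy_index)
```

With these toy scores the predicted index changes as the weight shifts toward the duration-based predictor, which is the kind of trade-off a weight factor controls.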


Estimation of Genetic Parameter for Carcass Traits According to MTDFREML and Gibbs Sampling in Hanwoo(Korean Cattle) (MTDFREML 방법과 Gibbs Sampling 방법에 의한 한우의 육질형질 유전모수 추정)

  • 김내수;이중재;주종철
    • Journal of Animal Science and Technology / v.48 no.3 / pp.337-344 / 2006
  • The objective of this study was to compare genetic parameter estimates for carcass traits of Hanwoo (Korean Cattle) obtained with the Gibbs sampler and with MTDFREML. The data set consisted of 1,941 cattle records with 23,058 animals in pedigree files at the Hanwoo Improvement Center. The variances and covariances among carcass traits were estimated via the Gibbs sampler and MTDFREML algorithms. The carcass traits considered in this study were longissimus dorsi area (LDA), backfat thickness (BF), and marbling score (MS). Genetic parameter estimates from single-trait analysis using the Gibbs sampler and MTDFREML were similar to those from multiple-trait analysis. The estimated heritabilities using the Gibbs sampler were .52~.54, .54~.59, and .42~.44 for the carcass traits. The estimated heritabilities using MTDFREML were .41, .52~.53, and .31~.32 for the carcass traits. The genetic correlations of LDA with BF and MS estimated using the Gibbs sampler and MTDFREML were negative (.34~.36, .23~.37), whereas the genetic correlation between BF and MS was positive (.36~.44). The correlations between breeding values estimated via MTDFREML and via the Gibbs sampler were 0.989, 0.996, and 0.985 for LDA, BF, and MS, respectively.
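
For readers unfamiliar with the quantities being estimated, heritability and genetic correlation are conventionally defined from variance components as sketched below. This assumes a model with only additive genetic and residual variance; the abstract does not state the model, and the symbols $\sigma_a^2$ (additive genetic variance), $\sigma_e^2$ (residual variance), and $\sigma_{a_{12}}$ (additive genetic covariance between traits 1 and 2) are notation introduced here.

```latex
\begin{align}
h^2 &= \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}, \\
r_g &= \frac{\sigma_{a_{12}}}{\sqrt{\sigma_{a_1}^2 \, \sigma_{a_2}^2}}.
\end{align}
```

Both the Gibbs sampler and MTDFREML estimate the same variance components; they differ in how the estimates are obtained, which is why the abstract compares the resulting values.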

Automatic TV Program Recommendation using LDA based Latent Topic Inference (LDA 기반 은닉 토픽 추론을 이용한 TV 프로그램 자동 추천)

  • Kim, Eun-Hui;Pyo, Shin-Jee;Kim, Mun-Churl
    • Journal of Broadcast Engineering / v.17 no.2 / pp.270-283 / 2012
  • With the advent of multi-channel TV, IPTV, and smart TV services, an excessive amount of TV program content has become available to users, which makes it very difficult for TV viewers to find and consume their preferred TV programs. Automatic TV program recommendation is therefore an important issue for future intelligent TV services, since it improves access to preferred TV content. In this paper, we present a recommendation model based on statistical machine learning that uses a collaborative filtering concept, taking into account both public and personal preferences for TV program content. Users' preferences for TV programs are modeled as a latent topic variable using LDA (Latent Dirichlet Allocation), which has recently been applied in various application domains. To apply LDA to TV recommendation appropriately, the topics that interest TV viewers are regarded as latent topics in LDA, and an asymmetric Dirichlet distribution is applied to the LDA so that it can reveal the diversity of viewers' topic interests based on analysis of real TV usage history data. The experimental results show that, on the TV usage history data of user groups with similar tastes, the proposed LDA-based TV recommendation method yields a precision of 66.5% on average for the top 5 ranked TV programs in weekly recommendation and 77.9% on average in bimonthly recommendation.
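
The "top 5" precision figures can be read as precision at k with k = 5. The sketch below shows one common way to compute it; the function name, the program titles, and the viewing history are hypothetical, and the paper's exact evaluation protocol is not described in the abstract.

```python
def precision_at_k(recommended, watched, k=5):
    """Fraction of the top-k recommended programs that the user actually watched."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for program in top_k if program in watched)
    return hits / k


# Hypothetical ranked recommendations and viewing history for one user.
ranked_programs = ["news", "drama", "sports", "movie", "quiz", "documentary"]
watched_programs = {"drama", "movie", "quiz", "cooking"}

p_at_5 = precision_at_k(ranked_programs, watched_programs, k=5)
print(p_at_5)
```

Averaging this value over users and over recommendation periods (weekly or bimonthly) would give a figure comparable in kind to the averages reported in the abstract.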

Analysis of English abstracts in Journal of the Korean Data & Information Science Society using topic models and social network analysis (토픽 모형 및 사회연결망 분석을 이용한 한국데이터정보과학회지 영문초록 분석)

  • Kim, Gyuha;Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.151-159 / 2015
  • This article analyzes English abstracts of the articles published in the Journal of the Korean Data & Information Science Society using text mining techniques. First, term-document matrices are formed by various methods and then visualized by social network analysis. LDA (latent Dirichlet allocation) and CTM (correlated topic model) are also employed to extract topics from the abstracts. The performance of the topic models is compared via entropy across several numbers of topics and several weighting methods used to form the term-document matrices.
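
The abstract does not say which distribution the entropy is computed over. As one plausible reading, the sketch below computes the Shannon entropy of a single document's topic distribution, where lower entropy means the document is assigned more decisively to a few topics. The function name and the toy distributions are assumptions for illustration only.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Toy topic distributions for one document over four topics (not from the paper).
uniform_doc = [0.25, 0.25, 0.25, 0.25]  # no topic preferred
peaked_doc = [0.97, 0.01, 0.01, 0.01]   # one dominant topic

uniform_entropy = shannon_entropy(uniform_doc)
peaked_entropy = shannon_entropy(peaked_doc)
print(uniform_entropy, peaked_entropy)
```

Averaging such values over all documents gives one number per model configuration, which can then be compared across numbers of topics and weighting schemes.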

Topic Modeling on Fine Dust Issues Using LDA Analysis (LDA 기법을 이용한 미세먼지 이슈의 토픽모델링 분석)

  • Yoon, soonuk;Kim, Minchul
    • Journal of Energy Engineering / v.29 no.2 / pp.23-29 / 2020
  • In this study, news data on fine dust from the last 10 years were collected, and 80 topics were selected through LDA analysis. Weather-related terms made up the main words of the topics, and fine dust becomes a major issue when the temperature is below 10 degrees Celsius. The frequency of media exposure and the maximum concentration of fine dust are positively correlated. The major topics were fine dust reduction measures and the government's comprehensive measures over the past decade, fine dust-related products such as air purifiers, policies protecting vulnerable people from fine dust, and fine dust reduction through R&D. Measures against fine dust as a social issue appear to be closely related to government policy.

Analyzing the Factors of Gentrification After Gradual Everyday Recovery

  • Yoon-Ah Song;Jeongeun Song;ZoonKy Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.8 / pp.175-186 / 2023
  • In this paper, we aim to build a gentrification analysis model and examine its characteristics, focusing on the point at which rents rose sharply alongside the recovery of commercial districts after the gradual resumption of daily life. Recently, in Korea, the influence of social distancing measures after the pandemic has led to the formation of small-scale commercial districts, known as 'hot places', rather than large-scale ones. These hot places have gained popularity by leveraging various media and social networking services to attract customers effectively. As a result, with an increase in the floating population, commercial districts have become active, leading to a rapid surge in rents. However, for small business owners, coping with the sudden rise in rent even with increased sales can lead to gentrification, where they might be forced to leave the area. Therefore, in this study, we seek to analyze the periods before and after by identifying points where rents rise sharply as commercial districts experience revitalization. Firstly, we collect text data to explore topics related to gentrification, utilizing LDA topic modeling. Based on this, we gather data at the commercial district level and build a gentrification analysis model to examine its characteristics. We hope that the analysis of gentrification through this model during a time when commercial districts are being revitalized after facing challenges due to the pandemic can contribute to policies supporting small businesses.