• Title/Abstract/Keywords: LDA model

161 search results (processing time: 0.053 s)

Design of pRBFNNs Pattern Classifiers Model Using a Synthesis of PCA & LDA Algorithm

  • 김나현;유성훈;오성권
    • The Korean Institute of Electrical Engineers (KIEE): Conference Proceedings / KIEE 42nd Summer Conference, 2011 / pp.1960-1961 / 2011
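  • PCA (Principal Component Analysis), the most widely used method in face recognition, has the advantage of representing high-dimensional face data in a lower dimension. LDA (Linear Discriminant Analysis) separates data from different classes well and performs strongly in face recognition. This study combines the strengths of both by applying PCA and LDA together: high-dimensional face data are first reduced in dimension with PCA, and LDA is then applied for more effective classification, which improves the face recognition rate. As the recognition module, a pRBFNN (Polynomial Based Radial Basis Function Neural Networks) model is built to offer a solution to high-dimensional pattern recognition problems. The performance of the proposed pattern classifier is verified on face data.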
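
The discriminant step described in this abstract can be illustrated with a minimal sketch of two-class Fisher LDA. This is not the authors' implementation: the 2-D toy points stand in for features that PCA would already have reduced, the PCA step and the pRBFNN classifier are omitted, and all names (`fisher_direction`, `class0`, `class1`, `predict`) are hypothetical.

```python
# Two-class Fisher LDA on 2-D toy data (assumed stand-in for PCA-reduced features).

def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, m):
    # 2x2 scatter matrix: sum over points of (x - m)(x - m)^T
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter S_w = S_0 + S_1
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Fisher direction w = S_w^{-1} (m1 - m0)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    return w, m0, m1

def project(w, p):
    return w[0] * p[0] + w[1] * p[1]

# Hypothetical toy data for two classes.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w, m0, m1 = fisher_direction(class0, class1)
# Decision threshold: midpoint of the projected class means.
threshold = (project(w, m0) + project(w, m1)) / 2.0

def predict(p):
    return 1 if project(w, p) > threshold else 0

print([predict(p) for p in class0 + class1])
```

Projecting onto a single direction is enough for two classes; a multi-class face recognition task would need up to (number of classes - 1) discriminant directions.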


A Study on the Insolvency Prediction Model for Korean Shipping Companies

  • Myoung-Hee Kim
    • Journal of Navigation and Port Research / Vol. 48, No. 2 / pp.109-115 / 2024
  • To develop an insolvency prediction model for shipping companies, we sampled shipping companies that closed between 2005 and 2023. Each closed company was paired with a normal company of similar asset size. Data were obtained for a total of 84 companies: 42 closed companies and 42 normal companies. These data were randomly divided into a training set (2/3 of the data) and a test set (1/3 of the data). The training data were used to develop the model, and the test data were used to measure its accuracy. The prediction model for Korean shipping insolvency was developed using financial ratio variables frequently used in previous studies. First, the LASSO technique reduced the 24 independent variables to 9 main variables. Next, we coded insolvent companies as 1 and normal companies as 0 and fitted logistic regression, LDA, and QDA models. The resulting prediction accuracy was 82.14% for the QDA model, 78.57% for the logistic regression model, and 75.00% for the LDA model. In addition, the variables 'Current ratio', 'Interest expenses to sales', 'Total assets turnover', and 'Operating income to sales' were identified as major variables affecting corporate insolvency.
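
The reported accuracies can be checked with a short piece of arithmetic. The sketch below assumes the sample is the 42 + 42 = 84 paired companies and that exactly one third of them (28) formed the test set; the counts of correct predictions (23, 22, 21) are not stated in the abstract and are inferred here only because they reproduce the published percentages.

```python
# Assumption: 84 paired companies, one third held out for testing.
n_total = 42 + 42
n_test = n_total // 3        # companies in the test set
n_train = n_total - n_test   # companies in the training set

def accuracy_pct(correct, total):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)

# Hypothetical correct-prediction counts that match the reported figures.
inferred_correct = {"QDA": 23, "logistic regression": 22, "LDA": 21}

for model, correct in inferred_correct.items():
    print(model, accuracy_pct(correct, n_test))
```

Under these assumptions the three models differ by a single test company each, so the ranking between them rests on a very small margin.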

Topic Extraction and Classification Method Based on Comment Sets

  • Tan, Xiaodong
    • Journal of Information Processing Systems / Vol. 16, No. 2 / pp.329-342 / 2020
  • In recent years, emotional text classification has become one of the essential research topics in natural language processing. It has been widely used in sentiment analysis of reviews of commodities such as hotels and of other commentary corpora. This paper proposes an improved W-LDA (weighted latent Dirichlet allocation) topic model to address the shortcomings of traditional LDA topic models. During Gibbs sampling of word topics and the calculation of the expected word distributions in the W-LDA topic model, an average weighted value is adopted so that topic-related words are not submerged by high-frequency words, which improves the distinction between topics. The method further integrates a support vector machine classification algorithm based on the extracted high-quality document-topic distribution and topic-word vectors. Finally, an efficient integrated method is constructed for the analysis and extraction of emotional words, topic distribution calculation, and sentiment classification. Tests on real teaching evaluation data and on a public comment test set show that the proposed method has distinct advantages over two other typical algorithms in terms of topic differentiation, classification precision, and F1-measure.
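
For orientation, the Gibbs sampling step that W-LDA modifies can be sketched as follows. This is the standard collapsed Gibbs sampler for plain LDA, not the paper's weighted variant: the abstract does not specify the weighting, so none is applied. The toy corpus, the hyperparameters `alpha` and `beta`, and the function name `lda_gibbs` are assumptions made for illustration.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Standard collapsed Gibbs sampling for LDA over documents of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional p(z = t | rest), up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

# Hypothetical corpus: word ids 0-2 and 3-5 come from two different themes.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 4, 3, 5]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6)
print(doc_topic)
```

A weighted variant would presumably rescale the word counts inside `weights` so that very frequent words contribute less, but the exact form is left to the paper.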

A standardization model based on image recognition for performance evaluation of an oral scanner

  • Seo, Sang-Wan;Lee, Wan-Sun;Byun, Jae-Young;Lee, Kyu-Bok
    • The Journal of Advanced Prosthodontics / Vol. 9, No. 6 / pp.409-415 / 2017
  • PURPOSE. Accurate information is essential in dentistry. The image information of missing teeth is used in optically based medical equipment in prosthodontic treatment. To evaluate oral scanners, the standardized model was examined from cases of image recognition errors of linear discriminant analysis (LDA), and a model that combines the variables with reference to ISO 12836:2015 was designed. MATERIALS AND METHODS. The basic model was fabricated by applying 4 factors to the tooth profile (chamfer, groove, curve, and square) and the bottom surface. Photo-type and video-type scanners were used to analyze 3D images after image capture. The scans were performed several times according to the prescribed sequence to distinguish the model from the one that did not form, and the results confirmed it to be the best. RESULTS. In the case of the initial basic model, a 3D shape could not be obtained by scanning even when several shots were taken. Subsequently, the recognition rate of the image improved with every variable factor, and the difference depended on the tooth profile and the pattern of the floor surface. CONCLUSION. Based on the recognition error of the LDA, the recognition rate decreases when the model has a similar pattern. Therefore, to obtain accurate 3D data, the difference between classes needs to be provided when developing a standardized model.

Product Evaluation Criteria Extraction through Online Review Analysis: Using LDA and k-Nearest Neighbor Approach

  • 이지현;정상형;김준호;민은주;여운영;김종우
    • Journal of Intelligence and Information Systems / Vol. 26, No. 1 / pp.97-117 / 2020
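  • Product evaluation criteria are indicators that express a product's attributes and value, allowing users and companies to measure and understand the product. For a company to evaluate and compare its own products objectively, selecting appropriate criteria is essential. The criteria should reflect the product features that consumers consider when they evaluate a product after actually purchasing and using it. However, the evaluation criteria used so far fail to reflect consumer opinions, which differ from product to product. Previous studies extracted product features and topics from online reviews, which reflect consumer opinions, and used them as evaluation criteria. Even so, limitations remain: criteria with low relevance to the product are extracted, and inappropriate words are not filtered out. To overcome this, this study developed and validated a model that extracts candidate evaluation criteria from reviews with Latent Dirichlet Allocation (LDA) and refines them with the k-Nearest Neighbor approach (k-NN). The proposed method consists of a preparation phase and an extraction phase. In the preparation phase, a word embedding model and a k-NN classifier for refining the candidate criteria are built. In the extraction phase, the candidates are refined using the k-NN classifier and mention ratios, and the final result is derived. To evaluate the proposed model, three comparison models were selected: a noun-frequency extraction model, an LDA-frequency extraction model, and the evaluation criteria provided by an actual e-commerce site. A survey was conducted and scored to compare the proposed model against the three models, and the results were tested statistically. In 26 of 30 tests, the proposed model was confirmed to be superior. The proposed model can be used on e-commerce sites to derive dimensions for each product group that reflect review characteristics, and on that basis it will contribute substantially to review analysis and use for insight discovery.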
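
The refinement step of the extraction phase can be sketched as a k-NN vote over word embeddings. This is an illustration under stated assumptions, not the authors' model: the 3-D vectors, the labels `"criterion"` and `"noise"`, the candidate words, and the choice of cosine similarity with k = 3 are all hypothetical, and the mention-ratio filter is omitted.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_label(vec, labeled, k=3):
    """Majority label among the k labeled vectors most similar to vec."""
    ranked = sorted(labeled, key=lambda item: cosine(vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled embeddings (real ones would come from a word embedding model).
labeled = [
    ((0.9, 0.1, 0.0), "criterion"),
    ((0.8, 0.2, 0.1), "criterion"),
    ((0.7, 0.3, 0.0), "criterion"),
    ((0.1, 0.9, 0.2), "noise"),
    ((0.0, 0.8, 0.3), "noise"),
    ((0.2, 0.7, 0.1), "noise"),
]

# Hypothetical candidate words, as if extracted by LDA from reviews.
candidates = {
    "screen": (0.85, 0.15, 0.05),
    "yesterday": (0.05, 0.90, 0.20),
}

# Keep only candidates that the k-NN vote labels as evaluation criteria.
refined = [word for word, vec in candidates.items()
           if knn_label(vec, labeled) == "criterion"]
print(refined)
```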

Learning Probabilistic Kernel from Latent Dirichlet Allocation

  • Lv, Qi;Pang, Lin;Li, Xiong
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 6 / pp.2527-2545 / 2016
  • Measuring the similarity of given samples is a key problem in recognition, clustering, retrieval, and related applications. A number of works, e.g. kernel methods and metric learning, have addressed this problem. The challenge of similarity learning is to find a similarity that is robust to intra-class variance and at the same time selective to inter-class characteristics. We observe that the similarity measure can be improved if the data distribution and hidden semantic information are exploited in a more sophisticated way. In this paper, we propose a similarity learning approach for retrieval and recognition. The approach, termed LDA-FEK, derives a free energy kernel (FEK) from Latent Dirichlet Allocation (LDA). First, it trains LDA and constructs the kernel using the parameters and variables of the trained model. Then, the unknown kernel parameters are learned by a discriminative learning approach. The main contributions of the proposed method are twofold: (1) the method is computationally efficient and scalable since the kernel parameters are determined in a staged way; (2) the method exploits the data distribution and semantic-level hidden information by means of LDA. To evaluate the performance of LDA-FEK, we apply it to image retrieval on two data sets and to text categorization on four popular data sets. The results show the competitive performance of our method.

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

  • Shin, Hyun-Kyung
    • Journal of Korea Multimedia Society / Vol. 13, No. 12 / pp.1786-1797 / 2010
  • Automatic understanding of the contents of a document image is a very hard problem because it involves mathematically challenging problems that originate mainly from the over-determined system induced by the document segmentation process. In both academia and industry, there have been continuing and varied efforts to improve the core parts of content retrieval technologies by separating out segmentation-related issues using semi-structured documents, e.g., invoices. In this paper we propose classification models for text lines in invoice documents, in which text lines are clustered into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrated on the performance of machine learning based models in terms of linear-discriminant-analysis (LDA) and non-LDA (logic based) approaches. In the LDA group, naïve Bayesian, k-nearest neighbor, and SVM were used; in the non-LDA group, decision tree, random forest, and boost were used. We describe the details of feature vector construction and the selection processes for the model and the parameters, including training and validation. We also present experimental results comparing the training/classification error levels of the models employed.

A Development of LDA Topic Association Systems Based on Spark-Hadoop Framework

  • Park, Kiejin;Peng, Limei
    • Journal of Information Processing Systems / Vol. 14, No. 1 / pp.140-149 / 2018
  • Social data such as users' comments are unstructured in nature, and current technologies for analyzing such data are constrained by the available storage space and processing time when fast storage and processing are required. Moreover, it is difficult to use a huge amount of dynamically generated social data to analyze user features at high speed. To solve this problem, we design and implement a topic association analysis system based on the latent Dirichlet allocation (LDA) model. LDA does not require a training process and thus can analyze social users' hourly interests in different topics in an easy way. The proposed system is built on the Spark framework, which runs on top of a Hadoop cluster. It offers high-speed processing because hard-disk access is minimized and all intermediate data are processed in main memory. In the performance evaluation, the system required about 5 hours to analyze the topics of about 1 TB of test social data (SNS comments). Moreover, by analyzing the association among topics, we can track the hourly change in social users' interests in different topics.

A Spatial Pyramid Matching LDA Model using Sparse Coding for Classification of Sports Scene Images

  • 전진;김문철
    • The Korean Institute of Broadcast and Media Engineers (KIBME): Conference Proceedings / KIBME 2016 Summer Conference / pp.35-36 / 2016
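  • This paper proposes an image classification model that combines the Spatial Pyramid Matching (SPM) technique with a Latent Dirichlet Allocation (LDA) model in order to exploit the spatial information of images that the conventional Bag-of-Visual words (BoW) approach does not capture. The BoW approach converts image patches into visual words and represents an image as a distribution of visual words; studies have introduced SPM to overcome its inability to use the positions of image patches. In addition, to represent image patches accurately, sparse coding was used instead of vector quantization to convert image patches into visual words. Building on the BoW approach, the proposed model applies SPM, which uses positional information, to the LDA model to infer the topics of visual words while classifying images with a multi-class SVM classifier. The classification performance of the proposed model was verified on the UIUC sports dataset.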
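
How a spatial pyramid adds position information to a BoW histogram can be sketched as follows. This covers only the pyramid step: it assumes each patch has already been mapped to a single visual-word id (hard assignment rather than the sparse coding used in the paper), uses normalized coordinates in [0, 1], and omits the per-level weights of the SPM kernel, the LDA topic inference, and the SVM. The function name and the toy patches are hypothetical.

```python
def spatial_pyramid_histogram(patches, vocab_size, levels=2):
    """Concatenate visual-word histograms over 2^l x 2^l grids for l = 0..levels.

    patches: iterable of (x, y, word_id) with x, y normalized to [0, 1].
    """
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        # One histogram per grid cell at this pyramid level.
        hist = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, word in patches:
            col = min(int(x * cells), cells - 1)  # clamp x == 1.0 into the last cell
            row = min(int(y * cells), cells - 1)
            hist[row * cells + col][word] += 1
        for cell_hist in hist:
            feature.extend(cell_hist)
    return feature

# Hypothetical image with four patches, one near each corner, and 3 visual words.
patches = [(0.1, 0.1, 0), (0.9, 0.1, 1), (0.1, 0.9, 1), (0.9, 0.9, 2)]
feature = spatial_pyramid_histogram(patches, vocab_size=3, levels=1)
print(feature)
```

Level 0 is the plain BoW histogram; the finer levels record which region of the image each visual word came from, which is the information the abstract says BoW alone lacks.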


Performance Improvement of Korean Connected Digit Recognition Using Various Discriminant Analyses

  • 송화전;김형순
    • Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 44 / pp.105-113 / 2002
  • In Korean, each digit is monosyllabic and some pairs are known to be highly confusable, which degrades the performance of connected digit recognition systems. To improve performance, in this paper we employ various discriminant analyses (DA), including Linear DA (LDA), Weighted Pairwise Scatter LDA (WPS-LDA), Heteroscedastic Discriminant Analysis (HDA), and Maximum Likelihood Linear Transformation (MLLT). We also examine several combinations of these DA methods for additional performance improvement. Experimental results show that applying any of the DA methods mentioned above improves the string accuracy, but the amount of improvement from each method varies with the model complexity, i.e., the number of mixtures per state. In particular, a string error reduction of more than 20% relative to the baseline system is achieved by applying MLLT after WPS-LDA, when the class level of DA is defined as a tied state and 1 mixture per state is used.
