• Title/Summary/Keyword: latent vector

Search Result 51, Processing Time 0.026 seconds

Ranking Decision Method of Retrieved Documents Using User Profile from Searching Engine (검색 엔진에서 사용자 프로파일을 이용한 문서 순위결정 방법)

  • Kim Yong-Ho;Kim Hyeong-Gyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.9
    • /
    • pp.1590-1595
    • /
    • 2006
  • This paper proposes a technique of user oriented document ranking using user refile to provide more satisfied results which reflect preference of specific users. User profile is constructed to represent his or her preference. User pfofile consists of 'term array' and 'preference vector' according to the interest field of one. And the User profile for a particular person is updated by 'user access', 'latent relaeon', 'User Profile' proposed in this paper. The latent structures of documents in same domain are analysed by singular value decomposition(SVD). Then, the rank of documents is determined by comparison of user profile with analyzed document on the basis of relevance.

Bag of Visual Words Method based on PLSA and Chi-Square Model for Object Category

  • Zhao, Yongwei;Peng, Tianqiang;Li, Bicheng;Ke, Shengcai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.7
    • /
    • pp.2633-2648
    • /
    • 2015
  • The problem of visual words' synonymy and ambiguity always exist in the conventional bag of visual words (BoVW) model based object category methods. Besides, the noisy visual words, so-called "visual stop-words" will degrade the semantic resolution of visual dictionary. In view of this, a novel bag of visual words method based on PLSA and chi-square model for object category is proposed. Firstly, Probabilistic Latent Semantic Analysis (PLSA) is used to analyze the semantic co-occurrence probability of visual words, infer the latent semantic topics in images, and get the latent topic distributions induced by the words. Secondly, the KL divergence is adopt to measure the semantic distance between visual words, which can get semantically related homoionym. Then, adaptive soft-assignment strategy is combined to realize the soft mapping between SIFT features and some homoionym. Finally, the chi-square model is introduced to eliminate the "visual stop-words" and reconstruct the visual vocabulary histograms. Moreover, SVM (Support Vector Machine) is applied to accomplish object classification. Experimental results indicated that the synonymy and ambiguity problems of visual words can be overcome effectively. The distinguish ability of visual semantic resolution as well as the object classification performance are substantially boosted compared with the traditional methods.

Label Embedding for Improving Classification Accuracy UsingAutoEncoderwithSkip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis is being actively conducted, and it is showing remarkable results in various fields such as classification, summary, and generation. Among various text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary class classification with one label among two classes, multi-class classification with one label among several classes, and multi-label classification with multiple labels among several classes. In particular, multi-label classification requires a different training method from binary class classification and multi-class classification because of the characteristic of having multiple labels. In addition, since the number of labels to be predicted increases as the number of labels and classes increases, there is a limitation in that performance improvement is difficult due to an increase in prediction difficulty. To overcome these limitations, (i) compressing the initially given high-dimensional label space into a low-dimensional latent label space, (ii) after performing training to predict the compressed label, (iii) restoring the predicted label to the high-dimensional original label space, research on label embedding is being actively conducted. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only the linear relationship between labels or compress the labels by random transformation, it is difficult to understand the non-linear relationship between labels, so there is a limitation in that it is not possible to create a latent label space sufficiently containing the information of the original label. Recently, there have been increasing attempts to improve performance by applying deep learning technology to label embedding. Label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration, is representative. However, the traditional autoencoder-based label embedding has a limitation in that a large amount of information loss occurs when compressing a high-dimensional label space having a myriad of classes into a low-dimensional latent label space. This can be found in the gradient loss problem that occurs in the backpropagation process of learning. To solve this problem, skip connection was devised, and by adding the input of the layer to the output to prevent gradient loss during backpropagation, efficient learning is possible even when the layer is deep. Skip connection is mainly used for image feature extraction in convolutional neural networks, but studies using skip connection in autoencoder or label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to each of the encoder and decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. In addition, the proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space. Using this, we conducted an experiment to predict the compressed keyword vector existing in the latent label space from the paper abstract and to evaluate the multi-label classification by restoring the predicted keyword vector back to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators showed far superior performance in multi-label classification based on the proposed methodology compared to traditional multi-label classification methods. This can be seen that the low-dimensional latent label space derived through the proposed methodology well reflected the information of the high-dimensional label space, which ultimately led to the improvement of the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was identified by comparing the performance of the proposed methodology according to the domain characteristics and the number of dimensions of the latent label space.

Extended Support Vector Machines for Object Detection and Localization

  • Feyereisl, Jan;Han, Bo-Hyung
    • The Magazine of the IEIE
    • /
    • v.39 no.2
    • /
    • pp.45-54
    • /
    • 2012
  • Object detection is a fundamental task for many high-level computer vision applications such as image retrieval, scene understanding, activity recognition, visual surveillance and many others. Although object detection is one of the most popular problems in computer vision and various algorithms have been proposed thus far, it is also notoriously difficult, mainly due to lack of proper models for object representation, that handle large variations of object structure and appearance. In this article, we review a branch of object detection algorithms based on Support Vector Machines (SVMs), a well-known max-margin technique to minimize classification error. We introduce a few variations of SVMs-Structural SVMs and Latent SVMs-and discuss their applications to object detection and localization.

  • PDF

Online anomaly detection algorithm based on deep support vector data description using incremental centroid update (점진적 중심 갱신을 이용한 deep support vector data description 기반의 온라인 비정상 탐지 알고리즘)

  • Lee, Kibae;Ko, Guhn Hyeok;Lee, Chong Hyun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.2
    • /
    • pp.199-209
    • /
    • 2022
  • Typical anomaly detection algorithms are trained by using prior data. Thus the batch learning based algorithms cause inevitable performance degradation when characteristics of newly incoming normal data change over time. We propose an online anomaly detection algorithm which can consider the gradual characteristic changes of incoming normal data. The proposed algorithm based on one-class classification model includes both offline and online learning procedures. In offline learning procedure, the algorithm learns the prior data to be close to centroid of the latent space and then updates the centroid of the latent space incrementally by new incoming data. In the online learning, the algorithm continues learning by using the updated centroid. Through experiments using public underwater acoustic data, the proposed online anomaly detection algorithm takes only approximately 2 % additional learning time for the incremental centroid update and learning. Nevertheless, the proposed algorithm shows 19.10 % improvement in Area Under the receiver operating characteristic Curve (AUC) performance compared to the offline learning model when new incoming normal data comes.

Procedure for monitoring autocorrelated processes using LSTM Autoencoder (LSTM Autoencoder를 이용한 자기상관 공정의 모니터링 절차)

  • Pyoungjin Ji;Jaeheon Lee
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.2
    • /
    • pp.191-207
    • /
    • 2024
  • Many studies have been conducted to quickly detect out-of-control situations in autocorrelated processes. The most traditionally used method is a residual control chart, which uses residuals calculated from a fitted time series model. However, many procedures for monitoring autocorrelated processes using statistical learning methods have recently been proposed. In this paper, we propose a monitoring procedure using the latent vector of LSTM Autoencoder, a deep learning-based unsupervised learning method. We compare the performance of this procedure with the LSTM Autoencoder procedure based on the reconstruction error, the RNN classification procedure, and the residual charting procedure through simulation studies. Simulation results show that the performance of the proposed procedure and the RNN classification procedure are similar, but the proposed procedure has the advantage of being useful in processes where sufficient out-of-control data cannot be obtained, because it does not require out-of-control data for training.

A comparative study of Entity-Grid and LSA models on Korean sentence ordering (한국어 텍스트 문장정렬을 위한 개체격자 접근법과 LSA 기반 접근법의 활용연구)

  • Kim, Youngsam;Kim, Hong-Gee;Shin, Hyopil
    • Korean Journal of Cognitive Science
    • /
    • v.24 no.4
    • /
    • pp.301-321
    • /
    • 2013
  • For the task of sentence ordering, this paper attempts to utilize the Entity-Grid model, a type of entity-based modeling approach, as well as Latent Semantic analysis, which is based on vector space modeling, The task is well known as one of the fundamental tools used to measure text coherence and to enhance text generation processes. For the implementation of the Entity-Grid model, we attempt to use the syntactic roles of the nouns in the Korean text for the ordering task, and measure its impact on the result, since its contribution has been discussed in previous research. Contrary to the case of German, it shows a positive result. In order to obtain the information on the syntactic roles, we use a strategy of using Korean case-markers for the nouns. As a result, it is revealed that the cues can be helpful to measure text coherence. In addition, we compare the results with the ones of the LSA-based model, discussing the advantages and disadvantages of the models, and options for future studies.

  • PDF

Method to Improve Data Sparsity Problem of Collaborative Filtering Using Latent Attribute Preference (잠재적 속성 선호도를 이용한 협업 필터링의 데이터 희소성 문제 개선 방법)

  • Kwon, Hyeong-Joon;Hong, Kwang-Seok
    • Journal of Internet Computing and Services
    • /
    • v.14 no.5
    • /
    • pp.59-67
    • /
    • 2013
  • In this paper, we propose the LAR_CF, latent attribute rating-based collaborative filtering, that is robust to data sparsity problem which is one of traditional problems caused of decreasing rating prediction accuracy. As compared with that existing collaborative filtering method uses a preference rating rated by users as feature vector to calculate similarity between objects, the proposed method improves data sparsity problem using unique attributes of two target objects with existing explicit preference. We consider MovieLens 100k dataset and its item attributes to evaluate the LAR_CF. As a result of artificial data sparsity and full-rating experiments, we confirmed that rating prediction accuracy can be improved rating prediction accuracy in data sparsity condition by the LAR_CF.

An Intelligent Marking System based on Semantic Kernel and Korean WordNet (의미커널과 한글 워드넷에 기반한 지능형 채점 시스템)

  • Cho Woojin;Oh Jungseok;Lee Jaeyoung;Kim Yu-Seop
    • The KIPS Transactions:PartA
    • /
    • v.12A no.6 s.96
    • /
    • pp.539-546
    • /
    • 2005
  • Recently, as the number of Internet users are growing explosively, e-learning has been applied spread, as well as remote evaluation of intellectual capacity However, only the multiple choice and/or the objective tests have been applied to the e-learning, because of difficulty of natural language processing. For the intelligent marking of short-essay typed answer papers with rapidness and fairness, this work utilize heterogenous linguistic knowledges. Firstly, we construct the semantic kernel from un tagged corpus. Then the answer papers of students and instructors are transformed into the vector form. Finally, we evaluate the similarity between the papers by using the semantic kernel and decide whether the answer paper is correct or not, based on the similarity values. For the construction of the semantic kernel, we used latent semantic analysis based on the vector space model. Further we try to reduce the problem of information shortage, by integrating Korean Word Net. For the construction of the semantic kernel we collected 38,727 newspaper articles and extracted 75,175 indexed terms. In the experiment, about 0.894 correlation coefficient value, between the marking results from this system and the human instructors, was acquired.

Segmentation of Bacterial Cells Based on a Hybrid Feature Generation and Deep Learning (하이브리드 피처 생성 및 딥 러닝 기반 박테리아 세포의 세분화)

  • Lim, Seon-Ja;Vununu, Caleb;Kwon, Ki-Ryong;Youn, Sung-Dae
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.8
    • /
    • pp.965-976
    • /
    • 2020
  • We present in this work a segmentation method of E. coli bacterial images generated via phase contrast microscopy using a deep learning based hybrid feature generation. Unlike conventional machine learning methods that use the hand-crafted features, we adopt the denoising autoencoder in order to generate a precise and accurate representation of the pixels. We first construct a hybrid vector that combines original image, difference of Gaussians and image gradients. The created hybrid features are then given to a deep autoencoder that learns the pixels' internal dependencies and the cells' shape and boundary information. The latent representations learned by the autoencoder are used as the inputs of a softmax classification layer and the direct outputs from the classifier represent the coarse segmentation mask. Finally, the classifier's outputs are used as prior information for a graph partitioning based fine segmentation. We demonstrate that the proposed hybrid vector representation manages to preserve the global shape and boundary information of the cells, allowing to retrieve the majority of the cellular patterns without the need of any post-processing.