• Title/Summary/Keyword: Topic vector

A Method on Associated Document Recommendation with Word Correlation Weights (단어 연관성 가중치를 적용한 연관 문서 추천 방법)

  • Kim, Seonmi;Na, InSeop;Shin, Juhyun
    • Journal of Korea Multimedia Society / v.22 no.2 / pp.250-259 / 2019
  • Big data processing technology and artificial intelligence (AI) are attracting increasing attention, and natural language processing is an important research area within AI. In this paper, we use Korean news articles to extract the topic distribution of each document and the word distribution vector of each topic through LDA-based topic modeling. We then use Word2vec to vectorize the words and generate a weight matrix from which we derive a relevance score that considers the semantic relationships between words. We propose a method that recommends documents in descending order of this score.
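
The abstract does not give the exact scoring formula, so the following is only a minimal sketch under stated assumptions: each topic is summarized by the probability-weighted mean of its word embeddings, the weight matrix holds cosine similarities between those topic summaries, and the relevance score is the bilinear form of two documents' topic distributions. All function names and the toy inputs are hypothetical; in practice the topic distributions and word vectors would come from trained LDA and Word2vec models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def topic_vector(topic_words, word_vecs):
    """Probability-weighted mean embedding of a topic's words (assumed summary)."""
    known = {w: p for w, p in topic_words.items() if w in word_vecs}
    total = sum(known.values())
    dim = len(next(iter(word_vecs.values())))
    vec = [0.0] * dim
    for w, p in known.items():
        for i, x in enumerate(word_vecs[w]):
            vec[i] += (p / total) * x
    return vec

def weight_matrix(topics, word_vecs):
    """Topic-by-topic semantic weights from embedding similarity (assumption)."""
    summaries = [topic_vector(t, word_vecs) for t in topics]
    return [[cosine(a, b) for b in summaries] for a in summaries]

def relevance(theta_q, theta_d, W):
    """Assumed score: theta_q^T W theta_d over the two topic distributions."""
    return sum(theta_q[i] * W[i][j] * theta_d[j]
               for i in range(len(theta_q))
               for j in range(len(theta_d)))

def recommend(query_theta, doc_thetas, W):
    """Return document ids sorted by descending relevance to the query."""
    scores = {d: relevance(query_theta, th, W) for d, th in doc_thetas.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With an identity weight matrix this reduces to a plain dot product of topic distributions; the off-diagonal weights are what let semantically close topics contribute to the score.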

Combining genetic algorithms and support vector machines for bankruptcy prediction

  • Min, Sung-Hwan;Lee, Ju-Min;Han, In-Goo
    • Proceedings of the Korea Intelligent Information System Society Conference / 2004.11a / pp.179-188 / 2004
  • Bankruptcy prediction is an important and widely studied topic because it can have a significant impact on bank lending decisions and profitability. Recently, the support vector machine (SVM) has been applied to the problem of bankruptcy prediction. The SVM-based method has been compared with other methods, such as neural networks and logistic regression, and has shown good results. Genetic algorithms (GA) have been increasingly applied in conjunction with other AI techniques such as neural networks and CBR. However, few studies have dealt with the integration of GA and SVM, though there is great potential for useful applications in this area. This study proposes methods for improving SVM performance in two aspects: feature subset selection and parameter optimization. GA is used to optimize both the feature subset and the parameters of the SVM simultaneously for bankruptcy prediction.
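
One way to picture the simultaneous optimization is a single bit-string chromosome that carries both the feature mask and the SVM parameters. The sketch below is an assumption-laden illustration, not the authors' implementation: the 8-bit parameter encoding, the log-scale ranges for `C` and `gamma`, the truncation selection, and the one-point crossover are all choices made here. The `fitness` callable is supplied by the caller; in a real setting it would decode the chromosome, train an SVM on the selected features, and return a validation accuracy.

```python
import random

N_PARAM_BITS = 8  # bits per SVM parameter (an assumption of this sketch)

def decode(chrom, n_features):
    """Split a bit list into a feature mask and (C, gamma) on log scales."""
    def to_unit(bits):
        return int("".join(map(str, bits)), 2) / (2 ** len(bits) - 1)
    mask = chrom[:n_features]
    c_bits = chrom[n_features:n_features + N_PARAM_BITS]
    g_bits = chrom[n_features + N_PARAM_BITS:]
    C = 10 ** (-2 + 5 * to_unit(c_bits))      # assumed range: 1e-2 .. 1e3
    gamma = 10 ** (-4 + 5 * to_unit(g_bits))  # assumed range: 1e-4 .. 1e1
    return mask, C, gamma

def evolve(fitness, n_features, pop_size=20, generations=30, p_mut=0.02, seed=0):
    """Elitist GA over chromosomes = feature mask bits + parameter bits."""
    rng = random.Random(seed)
    length = n_features + 2 * N_PARAM_BITS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated crossovers.
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, length)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

Because the mask and the parameters share one chromosome, a feature subset is always evaluated together with the parameter values it was evolved with, which is the point of optimizing both at once.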

A New Hybrid "Park's Vector - Time Synchronous Averaging" Approach to the Induction Motor-fault Monitoring and Diagnosis

  • Ngote, Nabil;Guedira, Said;Cherkaoui, Mohamed;Ouassaid, Mohammed
    • Journal of Electrical Engineering and Technology / v.9 no.2 / pp.559-568 / 2014
  • Induction motors are critical components in industrial processes, since their failure usually leads to an unexpected interruption of the industrial plant. Studying induction motor behavior under abnormal conditions and diagnosing different types of faults have been challenging topics for many electrical machine researchers. In this regard, a new and efficient way to detect induction motor faults may be to apply Time Synchronous Averaging (TSA) to the stator current Park's Vector. The aim of this paper is to present a methodology by which defects in a three-phase wound rotor induction motor can be diagnosed. By exploiting the cyclostationarity characteristics of electrical signals, the TSA method is applied to the stator current Park's Vector, allowing the operation of the induction motor to be monitored. Simulation and experimental results are presented to show the effectiveness of the proposed method. The results obtained are largely satisfactory, indicating a promising industrial application of the hybrid Park's Vector-TSA approach.
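
As a rough illustration of the two building blocks named in the abstract, the sketch below computes the Park's Vector components from three phase currents using the commonly used transformation, and applies a basic time synchronous average. It assumes the signal is already sampled with an integer number of samples per period; the function names are hypothetical and the paper's actual processing chain is not reproduced here.

```python
import math

def parks_vector(ia, ib, ic):
    """Park's Vector components (i_d, i_q) from three-phase stator currents."""
    i_d = math.sqrt(2 / 3) * ia - ib / math.sqrt(6) - ic / math.sqrt(6)
    i_q = (ib - ic) / math.sqrt(2)
    return i_d, i_q

def parks_modulus(ia, ib, ic):
    """Modulus of the Park's Vector for one sample."""
    return math.hypot(*parks_vector(ia, ib, ic))

def time_synchronous_average(signal, period):
    """Average a signal over consecutive windows of `period` samples.

    Components synchronous with the period are kept, while noise and
    non-synchronous components are attenuated. Trailing samples that do
    not fill a whole period are ignored.
    """
    n_periods = len(signal) // period
    if n_periods == 0:
        raise ValueError("signal is shorter than one period")
    return [
        sum(signal[k * period + n] for k in range(n_periods)) / n_periods
        for n in range(period)
    ]
```

For a healthy motor with balanced sinusoidal currents of unit amplitude, the Park's Vector modulus is constant at sqrt(6)/2, so deviations from a constant modulus after averaging are what a monitoring scheme would look at.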

Many-objective Evolutionary Algorithm with Knee point-based Reference Vector Adaptive Adjustment Strategy

  • Zhu, Zhuanghua
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.9 / pp.2976-2990 / 2022
  • The adaptive adjustment of reference or weight vectors in decomposition-based methods has been a hot research topic in the evolutionary computation community over the past few years. Although various methods have been proposed, most of them aim to diversify solutions in the objective space so as to cover the true Pareto fronts as much as possible. In contrast, this paper proposes a knee point-based reference vector adaptive adjustment strategy that balances convergence and diversity concurrently. Specifically, the strategy first uses knee points to construct the adaptive reference vectors. A new fitness function is then defined mathematically. Building on this strategy, the paper designs a many-objective evolutionary algorithm in which the mating operation and environmental selection are designed accordingly. The proposed method is extensively tested on the WFG test suite with 8, 10 and 12 objectives and on MPDMP, and compared with state-of-the-art optimizers. Extensive experimental results demonstrate the superiority of the proposed method over these optimizers and its practicability in tackling practical many-objective optimization problems.

Similar Question Search System for online Q&A for the Korean Language Based on Topic Classification (온라인가나다를 위한 주제 분류 기반 유사 질문 검색 시스템)

  • Mun, Jung-Min;Song, Yeong-Ho;Jin, Ji-Hwan;Lee, Hyun-Seob;Lee, Hyun Ah
    • Korean Journal of Cognitive Science / v.26 no.3 / pp.263-278 / 2015
  • The Online Q&A service of the National Institute of the Korean Language provides experts' answers to questions about the Korean language, and, as on other Q&A boards, many similar questions are posted repeatedly. If a system automatically finds questions similar to a user's question, it can immediately provide the user with recommended answers and save experts from repeatedly answering similar questions. In this paper, we define five classes of frequently asked questions based on their topics and propose classifying questions into those classes. Our system searches for similar questions by combining topic similarity, vector similarity, and sequence similarity. Experiments show that topic classification improves search correctness: the Mean Reciprocal Rank (MRR) of our system is 0.756, precision for the first result is 68.31%, and precision for the top five results is 87.32%.
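
For readers unfamiliar with the reported metric, the sketch below shows how Mean Reciprocal Rank is conventionally computed, together with one plausible way of combining the three similarities. The weighted sum and its default weights are assumptions made for illustration; the abstract does not state how the similarities are actually combined.

```python
def combined_similarity(topic_sim, vector_sim, sequence_sim,
                        weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three similarity scores (combination rule is assumed)."""
    w_topic, w_vector, w_sequence = weights
    return w_topic * topic_sim + w_vector * vector_sim + w_sequence * sequence_sim

def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Mean over queries of 1 / rank of the first relevant result.

    A query with no relevant result in its list contributes 0.
    """
    total = 0.0
    for results, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(results, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

As a usage example, three queries whose first relevant results appear at rank 1, rank 2, and not at all give an MRR of (1 + 0.5 + 0) / 3 = 0.5.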

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Woosol, Jang;Ye Eun, Kim;Won, Son
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.739-754 / 2022
  • Text data usually consists of many variables, some of which are closely correlated. Such multicollinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between target variables and explanatory variables. For unsupervised learning, on the other hand, target variables are absent, so such a feature selection procedure cannot be used. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select terms that show high relevance for each topic. Applying the procedure to real data, we found that it gives clear topic interpretation by removing high-frequency words prevalent in various topics. In addition, we observed that, when the selected variables are applied to classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gives results comparable to those obtained by using class label information.

The Learning Preference based Self-Directed Learning System using Topic Map (토픽 맵을 이용한 학습 선호도 기반의 자기주도적 학습 시스템)

  • Jeong, Hwa-Young;Kim, Yun-Ho
    • Journal of Advanced Navigation Technology / v.13 no.2 / pp.296-301 / 2009
  • In self-directed learning, the learner can construct a learning course. However, it is very difficult for the learner to construct a learning course while understanding the characteristics of the various learning contents. This research proposes a method that calculates the learner's learning preference and, when the learner constructs a learning course, provides information on the types of learning content that fit the learner. The learning preference is calculated using the preference vector values of a topic map. To apply this method, we tested it on a learning sample group of 20 and showed that it helps learners construct learning courses, yielding a high average degree of learning satisfaction.

Topic Classification for Suicidology

  • Read, Jonathon;Velldal, Erik;Ovrelid, Lilja
    • Journal of Computing Science and Engineering / v.6 no.2 / pp.143-150 / 2012
  • Computational techniques for topic classification can support qualitative research by automatically applying labels in preparation for qualitative analyses. This paper presents an evaluation of supervised learning techniques applied to one such use case, namely, that of labeling emotions, instructions and information in suicide notes. We train a collection of one-versus-all binary support vector machine classifiers, using cost-sensitive learning to deal with class imbalance. The features investigated range from a simple bag-of-words and n-grams over stems, to information drawn from syntactic dependency analysis and WordNet synonym sets. The experimental results are complemented by an analysis of systematic errors in both the output of our system and the gold-standard annotations.

A GraphML-based Visualization Framework for Workflow-Performers' Closeness Centrality Measurements

  • Kim, Min-Joon;Ahn, Hyun;Park, Minjae
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.8 / pp.3216-3230 / 2015
  • A hot research topic in the workflow intelligence arena is the emerging topic of "workflow-supported organizational social networks." These specialized social networks have been proposed primarily to represent the process-driven work-sharing and work-collaborating relationships among the workflow-performers who fulfill a series of workflow-related operations in a workflow-supported organization. We can discover those organizational social networks and visualize their analysis results as organizational knowledge. In this paper, we are particularly interested in how to visualize the degrees of closeness centrality among workflow-performers, and we propose a graphical representation schema based on the Graph Markup Language, named ccWSSN-GraphML. Additionally, we expatiate on the functional expansion of the closeness centralization formulas so that the visualization framework can handle a group of workflow procedures (or a workflow package) with organizational workflow-performers.
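
As background for the measure being visualized, the following sketch computes the textbook closeness centrality of a node in an unweighted, undirected graph using breadth-first search. It is a generic definition, not the paper's extended formulas for workflow packages; the adjacency-dict representation and the function name are assumptions of this example.

```python
from collections import deque

def closeness_centrality(graph, source):
    """Closeness of `source`: reachable nodes / sum of shortest-path lengths.

    `graph` maps each node to a list of its neighbours. For a connected
    graph with n nodes this equals (n - 1) / (sum of distances).
    """
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    total = sum(distances.values())
    reachable = len(distances) - 1
    return reachable / total if total else 0.0
```

In a three-performer chain a, b, c, the middle performer b has closeness 1.0 while the two end performers have 2/3, which matches the intuition that b is the closest to everyone else.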

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining studies have focused on applications of the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, unstructured text must first undergo a structuring task that transforms the original document into a form the computer can understand. Mapping arbitrary objects into a space of a specific dimension while maintaining algebraic properties, in order to structure the text data, is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding analysis has increased rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using all the text included in the document. As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single corresponding vector, so it is difficult to represent a complex document with multiple subjects accurately. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods. However, since this is not the core subject of the proposed method, we describe the process of applying it to documents that predefine keywords in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Next, to overcome the influence of miscellaneous words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for each document. Clustering is then conducted on each document's set of keyword vectors to identify the multiple subjects included in the document. Finally, a multi-vector is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the traditional single-vector approach cannot properly map complex documents because of interference among subjects in each vector. With the proposed multi-vector method, we confirmed that complex documents can be vectorized more accurately by eliminating this interference among subjects.
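
Steps (3) to (5) of the method can be sketched as follows, under several assumptions that the abstract leaves open: the clustering algorithm is taken to be plain k-means with a fixed number of clusters, each cluster is summarized by the mean of its keyword vectors, and word embeddings are supplied as a precomputed dictionary. The function names are hypothetical.

```python
import random

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, n_iter=20, seed=0):
    """Plain k-means; returns the clusters of the final assignment step."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, centers[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

def multi_vector(keywords, word_vecs, k):
    """One vector per keyword cluster instead of one vector per document."""
    keyword_vecs = [word_vecs[w] for w in keywords if w in word_vecs]
    return [mean(cluster) for cluster in kmeans(keyword_vecs, k) if cluster]
```

A document whose keywords fall into two unrelated groups then yields two vectors, one near each group, rather than a single averaged vector that sits between the two subjects.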