• Title/Summary/Keyword: Topic Label

Search Result 23, Processing Time 0.027 seconds

A Method of Calculating Topic Keywords for Topic Labeling (토픽 레이블링을 위한 토픽 키워드 산출 방법)

  • Kim, Eunhoe;Suh, Yuhwa
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.16 no.3
    • /
    • pp.25-36
    • /
    • 2020
  • Topics calculated using LDA topic modeling have to be labeled separately. When labeling a topic, we look at the words that represent the topic, and label the topic. Therefore, it is important to first make a good set of words that represent the topic. This paper proposes a method of calculating a set of words representing a topic using TextRank, which extracts the keywords of a document. The proposed method uses Relevance to select words related to the topic with discrimination. It extracts topic keywords using the TextRank algorithm and connects keywords with a high frequency of simultaneous occurrence to express the topic with a higher coverage.

Keyword Reorganization Techniques for Improving the Identifiability of Topics (토픽 식별성 향상을 위한 키워드 재구성 기법)

  • Yun, Yeoil;Kim, Namgyu
    • Journal of Information Technology Services
    • /
    • v.18 no.4
    • /
    • pp.135-149
    • /
    • 2019
  • Recently, there are many researches for extracting meaningful information from large amount of text data. Among various applications to extract information from text, topic modeling which express latent topics as a group of keywords is mainly used. Topic modeling presents several topic keywords by term/topic weight and the quality of those keywords are usually evaluated through coherence which implies the similarity of those keywords. However, the topic quality evaluation method based only on the similarity of keywords has its limitations because it is difficult to describe the content of a topic accurately enough with just a set of similar words. In this research, therefore, we propose topic keywords reorganizing method to improve the identifiability of topics. To reorganize topic keywords, each document first needs to be labeled with one representative topic which can be extracted from traditional topic modeling. After that, classification rules for classifying each document into a corresponding label are generated, and new topic keywords are extracted based on the classification rules. To evaluated the performance our method, we performed an experiment on 1,000 news articles. From the experiment, we confirmed that the keywords extracted from our proposed method have better identifiability than traditional topic keywords.

Hot Topic Discovery across Social Networks Based on Improved LDA Model

  • Liu, Chang;Hu, RuiLin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.11
    • /
    • pp.3935-3949
    • /
    • 2021
  • With the rapid development of Internet and big data technology, various online social network platforms have been established, producing massive information every day. Hot topic discovery aims to dig out meaningful content that users commonly concern about from the massive information on the Internet. Most of the existing hot topic discovery methods focus on a single network data source, and can hardly grasp hot spots as a whole, nor meet the challenges of text sparsity and topic hotness evaluation in cross-network scenarios. This paper proposes a novel hot topic discovery method across social network based on an im-proved LDA model, which first integrates the text information from multiple social network platforms into a unified data set, then obtains the potential topic distribution in the text through the improved LDA model. Finally, it adopts a heat evaluation method based on the word frequency of topic label words to take the latent topic with the highest heat value as a hot topic. This paper obtains data from the online social networks and constructs a cross-network topic discovery data set. The experimental results demonstrate the superiority of the proposed method compared to baseline methods.

Collaborative Similarity Metric Learning for Semantic Image Annotation and Retrieval

  • Wang, Bin;Liu, Yuncai
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.5
    • /
    • pp.1252-1271
    • /
    • 2013
  • Automatic image annotation has become an increasingly important research topic owing to its key role in image retrieval. Simultaneously, it is highly challenging when facing to large-scale dataset with large variance. Practical approaches generally rely on similarity measures defined over images and multi-label prediction methods. More specifically, those approaches usually 1) leverage similarity measures predefined or learned by optimizing for ranking or annotation, which might be not adaptive enough to datasets; and 2) predict labels separately without taking the correlation of labels into account. In this paper, we propose a method for image annotation through collaborative similarity metric learning from dataset and modeling the label correlation of the dataset. The similarity metric is learned by simultaneously optimizing the 1) image ranking using structural SVM (SSVM), and 2) image annotation using correlated label propagation, with respect to the similarity metric. The learned similarity metric, fully exploiting the available information of datasets, would improve the two collaborative components, ranking and annotation, and sequentially the retrieval system itself. We evaluated the proposed method on Corel5k, Corel30k and EspGame databases. The results for annotation and retrieval show the competitive performance of the proposed method.

Research on Community Knowledge Modeling of Readers Based on Interest Labels

  • Kai, Wang;Wei, Pan;Xingzhi, Chen
    • Journal of Information Processing Systems
    • /
    • v.19 no.1
    • /
    • pp.55-66
    • /
    • 2023
  • Community portraits can deeply explore the characteristics of community structures and describe the personalized knowledge needs of community users, which is of great practical significance for improving community recommendation services, as well as the accuracy of resource push. The current community portraits generally have the problems of weak perception of interest characteristics and low degree of integration of topic information. To resolve this problem, the reader community portrait method based on the thematic and timeliness characteristics of interest labels (UIT) is proposed. First, community opinion leaders are identified based on multi-feature calculations, and then the topic features of their texts are identified based on the LDA topic model. On this basis, a semantic mapping including "reader community-opinion leader-text content" was established. Second, the readers' interest similarity of the labels was dynamically updated, and two kinds of tag parameters were integrated, namely, the intensity of interest labels and the stability of interest labels. Finally, the similarity distance between the opinion leader and the topic of interest was calculated to obtain the dynamic interest set of the opinion leaders. Experimental analysis was conducted on real data from the Douban reading community. The experimental results show that the UIT has the highest average F value (0.551) compared to the state-of-the-art approaches, which indicates that the UIT has better performance in the smooth time dimension.

Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants (텍스트 마이닝을 이용한 공군 부사관 지원자 자기소개서의 차별적 특성 분석)

  • Kwon, Hyeok;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.75-94
    • /
    • 2021
  • The low birth rate and shortened military service period are causing concerns about selecting excellent military officers. The Republic of Korea entered a low birth rate society in 1984 and an aged society in 2018 respectively, and is expected to be in a super-aged society in 2025. In addition, the troop-oriented military is changed as a state-of-the-art weapons-oriented military, and the reduction of the military service period was implemented in 2018 to ease the burden of military service for young people and play a role in the society early. Some observe that the application rate for military officers is falling due to a decrease of manpower resources and a preference for shortened mandatory military service over military officers. This requires further consideration of the policy of securing excellent military officers. Most of the related studies have used social scientists' methodologies, but this study applies the methodology of text mining suitable for large-scale documents analysis. This study extracts words of discriminative characteristics from the Republic of Korea Air Force Non-Commissioned Officer Applicant cover letters and analyzes the polarity of pass and fail. It consists of three steps in total. First, the application is divided into general and technical fields, and the words characterized in the cover letter are ordered according to the difference in the frequency ratio of each field. The greater the difference in the proportion of each application field, the field character is defined as 'more discriminative'. Based on this, we extract the top 50 words representing discriminative characteristics in general fields and the top 50 words representing discriminative characteristics in technology fields. Second, the number of appropriate topics in the overall cover letter is calculated through the LDA. It uses perplexity score and coherence score. Based on the appropriate number of topics, we then use LDA to generate topic and probability, and estimate which topic words of discriminative characteristic belong to. Subsequently, the keyword indicators of questions used to set the labeling candidate index, and the most appropriate index indicator is set as the label for the topic when considering the topic-specific word distribution. Third, using L-LDA, which sets the cover letter and label as pass and fail, we generate topics and probabilities for each field of pass and fail labels. Furthermore, we extract only words of discriminative characteristics that give labeled topics among generated topics and probabilities by pass and fail labels. Next, we extract the difference between the probability on the pass label and the probability on the fail label by word of the labeled discriminative characteristic. A positive figure can be seen as having the polarity of pass, and a negative figure can be seen as having the polarity of fail. This study is the first research to reflect the characteristics of cover letters of Republic of Korea Air Force non-commissioned officer applicants, not in the private sector. Moreover, these methodologies can apply text mining techniques for multiple documents, rather survey or interview methods, to reduce analysis time and increase reliability for the entire population. For this reason, the methodology proposed in the study is also applicable to other forms of multiple documents in the field of military personnel. This study shows that L-LDA is more suitable than LDA to extract discriminative characteristics of Republic of Korea Air Force Noncommissioned cover letters. Furthermore, this study proposes a methodology that uses a combination of LDA and L-LDA. Therefore, through the analysis of the results of the acquisition of non-commissioned Republic of Korea Air Force officers, we would like to provide information available for acquisition and promotional policies and propose a methodology available for research in the field of military manpower acquisition.

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Woosol, Jang;Ye Eun, Kim;Won, Son
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.6
    • /
    • pp.739-754
    • /
    • 2022
  • Usually, text data consists of many variables, and some of them are closely correlated. Such multi-collinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between target variables and explanatory variables. On the other hand, for unsupervised learning, since target variables are absent, one cannot use such a feature selection procedure as in supervised learning. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select terms which show high relevance for each topic. Applying the procedure to real data, we found that the proposed word selection procedure can give clear topic interpretation by removing high-frequency words prevalent in various topics. In addition, we observed that, by applying the selected variables to the classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gives results comparable to those obtained by using class label information.

An Extended Generative Feature Learning Algorithm for Image Recognition

  • Wang, Bin;Li, Chuanjiang;Zhang, Qian;Huang, Jifeng
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.8
    • /
    • pp.3984-4005
    • /
    • 2017
  • Image recognition has become an increasingly important topic for its wide application. It is highly challenging when facing to large-scale database with large variance. The recognition systems rely on a key component, i.e. the low-level feature or the learned mid-level feature. The recognition performance can be potentially improved if the data distribution information is exploited using a more sophisticated way, which usually a function over hidden variable, model parameter and observed data. These methods are called generative score space. In this paper, we propose a discriminative extension for the existing generative score space methods, which exploits class label when deriving score functions for image recognition task. Specifically, we first extend the regular generative models to class conditional models over both observed variable and class label. Then, we derive the mid-level feature mapping from the extended models. At last, the derived feature mapping is embedded into a discriminative classifier for image recognition. The advantages of our proposed approach are two folds. First, the resulted methods take simple and intuitive forms which are weighted versions of existing methods, benefitting from the Bayesian inference of class label. Second, the probabilistic generative modeling allows us to exploit hidden information and is well adapt to data distribution. To validate the effectiveness of the proposed method, we cooperate our discriminative extension with three generative models for image recognition task. The experimental results validate the effectiveness of our proposed approach.

Exploring Service Improvement Opportunities through Analysis of OTT App Reviews (OTT 앱 리뷰 분석을 통한 서비스 개선 기회 발굴 방안 연구)

  • Joongmin Lee;Chie Hoon Song
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.27 no.2_2
    • /
    • pp.445-456
    • /
    • 2024
  • This study aims to suggest service improvement opportunities by analyzing user review data of the top three OTT service apps(Netflix, Coupang Play, and TVING) on Google Play Store. To achieve this objective, we proposed a framework for uncovering service opportunities through the analysis of negative user reviews from OTT service providers. The framework involves automating the labeling of identified topics and generating service improvement opportunities using topic modeling and prompt engineering, leveraging GPT-4, a generative AI model. Consequently, we pinpointed five dissatisfaction topics for Netflix and TVING, and nine for Coupang Play. Common issues include "video playback errors", "app installation and update errors", "subscription and payment" problems, and concerns regarding "content quality". The commonly identified service enhancement opportunities include "enhancing and diversifying content quality". "optimizing video quality and data usage", "ensuring compatibility with external devices", and "streamlining payment and cancellation processes". In contrast to prior research, this study introduces a novel research framework leveraging generative AI to label topics and propose improvement strategies based on the derived topics. This is noteworthy as it identifies actionable service opportunities aimed at enhancing service competitiveness and satisfaction, instead of merely outlining topics.

Warning Classification Method Based On Artificial Neural Network Using Topics of Source Code (소스코드 주제를 이용한 인공신경망 기반 경고 분류 방법)

  • Lee, Jung-Been
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.9 no.11
    • /
    • pp.273-280
    • /
    • 2020
  • Automatic Static Analysis Tools help developers to quickly find potential defects in source code with less effort. However, the tools reports a large number of false positive warnings which do not have to fix. In our study, we proposed an artificial neural network-based warning classification method using topic models of source code blocks. We collect revisions for fixing bugs from software change management (SCM) system and extract code blocks modified by developers. In deep learning stage, topic distribution values of the code blocks and the binary data that present the warning removal in the blocks are used as input and target data in an simple artificial neural network, respectively. In our experimental results, our warning classification model based on neural network shows very high performance to predict label of warnings such as true or false positive.