• Title/Abstract/Keywords: Topic Information

Hot Topic Discovery across Social Networks Based on Improved LDA Model

  • Liu, Chang;Hu, RuiLin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 11 / pp. 3935-3949 / 2021
  • With the rapid development of Internet and big data technology, various online social network platforms have been established, producing massive amounts of information every day. Hot topic discovery aims to extract meaningful content that users are commonly concerned about from this mass of information. Most existing hot topic discovery methods focus on a single network data source, so they can hardly capture hot spots as a whole or meet the challenges of text sparsity and topic hotness evaluation in cross-network scenarios. This paper proposes a novel method for hot topic discovery across social networks based on an improved LDA model. The method first integrates text from multiple social network platforms into a unified data set, then obtains the latent topic distribution of the text through the improved LDA model. Finally, it applies a heat evaluation method based on the word frequency of topic label words and takes the latent topic with the highest heat value as the hot topic. The paper collects data from online social networks and constructs a cross-network topic discovery data set. The experimental results demonstrate the superiority of the proposed method over baseline methods.
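
The heat evaluation step described in this abstract can be illustrated with a minimal sketch. It assumes the heat of a topic is simply the summed corpus frequency of its label words. The abstract does not give the exact formula, and the function names and toy data below are hypothetical.

```python
from collections import Counter


def topic_heat(documents, topic_label_words):
    """Score each topic by the total corpus frequency of its label words.

    Assumption: heat = sum of label-word counts; the paper's actual
    weighting is not stated in the abstract.
    """
    # Count every token across the merged, already-tokenized corpus.
    word_counts = Counter(token for doc in documents for token in doc)
    return {
        topic: sum(word_counts[word] for word in words)
        for topic, words in topic_label_words.items()
    }


def hottest_topic(documents, topic_label_words):
    """Return the topic whose label words occur most often."""
    heat = topic_heat(documents, topic_label_words)
    return max(heat, key=heat.get)


# Toy corpus standing in for text merged from several platforms.
documents = [
    ["vaccine", "rollout", "city"],
    ["vaccine", "side", "effects"],
    ["football", "final", "city"],
]
# Hypothetical label words, as a topic model might produce them.
topic_label_words = {
    "health": ["vaccine", "effects"],
    "sports": ["football", "final"],
}

print(topic_heat(documents, topic_label_words))
print(hottest_topic(documents, topic_label_words))
```

In practice the label words would come from the fitted topic model rather than being written by hand.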

영어 정보구조에서의 화제에 대한 억양 연구 (A Study on Intonation of the Topic in English Information Structure)

  • 이용재;김화영
    • 음성과학 / Vol. 13, No. 2 / pp. 87-105 / 2006
  • Many researchers have studied the relationship between information structure and intonation. The arguments made so far can be summarized as follows: the intonation of topic and focus in English information structure is represented as i) a pitch accent, ii) a tune (a pitch accent + an edge tone), or iii) a boundary tone. The purpose of this paper is to study various informational patterns of the topic in English information structure, using real TV discussion data. In this paper, topics are classified as contrastive or non-contrastive, based on contrastiveness. The results show that the intonation of the topic in English information structure is implemented as a pitch accent rather than as a tune or a boundary tone. Among non-contrastive topics, anaphoric determinative NP topics (Lnc, Lncd) are mainly represented as an H* pitch accent, whereas the pronoun topic (Lp) does not carry a pitch accent. Among contrastive topics, the semantically focused topic (Lci) is mainly represented as an H* pitch accent, whereas the contrastively focused topic (Lcc) is represented as both H* and L+H* pitch accents. This shows that a topic or focus with a contrastive meaning is not always represented as an L+H* pitch accent, contrary to what previous research has argued.

영어 화제와 초점의 억양 실현 양상 (Tonal Implementation of English Topic and Focus)

  • 강선미;옥유롬;김기호
    • 음성과학 / Vol. 10, No. 4 / pp. 41-55 / 2003
  • This paper investigates the tonal patterns of English information structure composed of topic and focus. Previous theories have argued that there is a significant relationship between English topic-focus structure and intonation: the topic is marked with an L+H* pitch accent and the focus with an H* pitch accent. These theories, however, are oversimplified, since they do not consider the contextual differences of topic and focus. To examine the tonal patterns of English topic and focus more concretely, we classified topic into two subcategories, reminding topic and old-information topic, and focus into three: information focus, contrastive focus, and reference focus. The overall results show that native English speakers are inclined to use both the L+H* and H* pitch accents for the topic and focus of an utterance. We also observe a tendency to deaccent topics given as old information and to mark topics given as noun phrases with an H* pitch accent. As for the intonation of focus, the H* pitch accent is the most frequent type of accent, but L+H* also occurs frequently, especially in contexts of correction or contrast.

Too Much Information - Trying to Help or Deceive? An Analysis of Yelp Reviews

  • Hyuk Shin;Hong Joo Lee;Ruth Angelie Cruz
    • Asia pacific journal of information systems / Vol. 33, No. 2 / pp. 261-281 / 2023
  • The proliferation of online customer reviews has completely changed how consumers purchase. Consumers now depend heavily on authentic experiences shared by previous customers. However, deceptive reviews that aim to manipulate customer decision-making in order to promote or defame a product or service pose a risk to businesses and buyers. Studies investigating consumer perception of deceptive reviews have found that one of the important cues is review content. This study investigates the impact of the amount of information in a review on its truthfulness. It adopts Information Manipulation Theory (IMT) as an overarching theory, which asserts that violations of one or more of the Gricean maxims are deceptive behaviors. Delivering less information than required, or more, is regarded as a quantity violation, that is, an attempt at deception. A topic modeling algorithm is implemented to reveal the distribution of each topic embedded in a text. This study measures information amount as topic diversity based on the results of topic modeling, where topic diversity shows how heterogeneous a review text is. Two datasets of restaurant reviews on Yelp.com, which contain Filtered (deceptive) and Unfiltered (genuine) reviews, were used to test the hypotheses. Reviews that contain more diverse topics tend to be truthful. However, excessive topic diversity produces an inverted U-shaped relationship with truthfulness. Moreover, we find an interaction effect between topic diversity and review ratings. This result suggests that the impact of topic diversity is strengthened when deceptive reviews have lower ratings. This study contributes to the existing literature on IMT by building the connection between topic diversity in a review and its truthfulness. In addition, the empirical results show that topic diversity is a reliable measure for gauging the information amount of reviews.
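
One way to make "topic diversity" concrete is sketched below. It assumes diversity is measured as the Shannon entropy of a review's topic distribution. The abstract does not name the measure the authors used, so both the entropy choice and the `topic_diversity` helper are illustrative assumptions.

```python
import math


def topic_diversity(topic_probs):
    """Shannon entropy (in nats) of a document's topic distribution.

    Assumption: entropy stands in for the paper's diversity measure,
    which the abstract does not specify.
    """
    total = sum(topic_probs)
    if total <= 0:
        raise ValueError("topic_probs must contain positive mass")
    entropy = 0.0
    for p in topic_probs:
        q = p / total  # normalize in case the weights do not sum to 1
        if q > 0:
            entropy -= q * math.log(q)
    return entropy


# Hypothetical per-review topic distributions from a topic model.
focused_review = [0.97, 0.01, 0.01, 0.01]
varied_review = [0.25, 0.25, 0.25, 0.25]

print(topic_diversity(focused_review))
print(topic_diversity(varied_review))
```

A review concentrated on one topic scores near zero, and a review spread evenly over all topics reaches the maximum, log of the number of topics.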

운율과 정보구조: 한국어 초점과 주제의 음성적 실현 (Prosody and Information Structure: Phonetic Realizations of Focus and Topic in Korean)

  • 오미라
    • 음성과학 / Vol. 15, No. 2 / pp. 7-19 / 2008
  • Information structure can be conveyed by prosodic structure (Poser 1984 for Japanese; Inkelas and Leben 1990 for Hausa; Cho 1990 for Korean; Hayes and Lahiri 1991 for Bengali; Selkirk and Shen 1990 for Shanghai Chinese). Different subfields of linguistics and different theoretical perspectives suggest many distinct types of information structure: topic vs. comment, focus vs. background, old vs. new information, etc. The purpose of this paper is to investigate the phonetic realizations of focus and topic, among these information structures, in Korean. For this purpose, we conduct a phonetic experiment in which we examine duration, pitch, and dephrasing in focus and topic structures. The study yields four findings. First, the duration of 'nun' varies depending on the information structure of the following constituent. Second, the degree of accentual phrase-initial rising is larger in contrastive topic and focused phrases than in neutral phrases. Third, a contrastive topic phrase always constitutes an Intonation Phrase on its own. Fourth, the occurrence of dephrasing varies with gender and the number of syllables within a phrase.

토픽 식별성 향상을 위한 키워드 재구성 기법 (Keyword Reorganization Techniques for Improving the Identifiability of Topics)

  • 윤여일;김남규
    • 한국IT서비스학회지 / Vol. 18, No. 4 / pp. 135-149 / 2019
  • Recently, many studies have sought to extract meaningful information from large amounts of text data. Among the various approaches to extracting information from text, topic modeling, which expresses latent topics as groups of keywords, is widely used. Topic modeling presents several topic keywords ranked by term/topic weight, and the quality of those keywords is usually evaluated through coherence, which reflects the similarity of the keywords. However, a topic quality evaluation method based only on keyword similarity has its limitations, because it is difficult to describe the content of a topic accurately enough with just a set of similar words. In this research, therefore, we propose a method of reorganizing topic keywords to improve the identifiability of topics. To reorganize topic keywords, each document is first labeled with one representative topic, which can be extracted through traditional topic modeling. After that, classification rules for classifying each document into its corresponding label are generated, and new topic keywords are extracted based on the classification rules. To evaluate the performance of our method, we performed an experiment on 1,000 news articles. The experiment confirmed that the keywords extracted by the proposed method have better identifiability than traditional topic keywords.

An Ontology-Based Labeling of Influential Topics Using Topic Network Analysis

  • Kim, Hyon Hee;Rhee, Hey Young
    • Journal of Information Processing Systems / Vol. 15, No. 5 / pp. 1096-1107 / 2019
  • In this paper, we present an ontology-based approach to labeling influential topics of scientific articles. First, to find influential topics in scientific articles, topic modeling is performed, and then social network analysis is applied to the selected topic models. Abstracts of research papers related to data mining published over the 20 years from 1995 to 2015 are collected and analyzed in this research. Second, to interpret and explain the selected influential topics, the UniDM ontology is constructed from Wikipedia and serves as the concept hierarchy for the topic models. Our experimental results show that the subjects of data management and queries are identified in the most interrelated topic, followed by the subjects of recommender systems and text mining. Also, the subjects of recommender systems and context-aware systems belong to the most influential topic, and the subject of the k-nearest neighbor classifier belongs to the topic closest to other topics. The proposed framework provides a general model for interpreting topics in topic models, which plays an important role in overcoming ambiguous and arbitrary interpretation of topics in topic modeling.

Topic Analysis of Scholarly Communication Research

  • Ji, Hyun;Cha, Mikyeong
    • Journal of Information Science Theory and Practice / Vol. 9, No. 2 / pp. 47-65 / 2021
  • This study aims to identify specific topics, trends, and structural characteristics of scholarly communication research, based on 1,435 articles published from 1970 to 2018 in the Scopus database. Latent Dirichlet Allocation topic modeling, time series analysis, and network analysis were used to analyze specific topics, trends, and structures, respectively. The results are summarized in three sets as follows. First, nineteen specific topics of scholarly communication research were identified, including research resource management and research data, and their research proportion is even. Second, the time series analysis shows three upward trending topics (Topic 6: Open Access Publishing, Topic 7: Green Open Access, and Topic 19: Informal Communication) and two downward trending topics (Topic 11: Researcher Network and Topic 12: Electronic Journal). Third, the network analysis results indicated that topics with high mean profile association were related to the institution, and topics with high triangle betweenness centrality, such as Topic 14: Research Resource Management, shared the citation context. Also, through cluster analysis using parallel nearest neighbor clustering, six clusters connected with different concepts were identified.

Topic Masks for Image Segmentation

  • Jeong, Young-Seob;Lim, Chae-Gyun;Jeong, Byeong-Soo;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 7, No. 12 / pp. 3274-3292 / 2013
  • Unsupervised methods for image segmentation have recently been drawing attention because most images do not have labels or tags. A topic model is one such unsupervised probabilistic method: it captures latent aspects of data, where each latent aspect, or topic, is associated with one homogeneous region. The results of topic models, however, usually contain noise, which decreases the overall segmentation performance. In this paper, to improve the performance of image segmentation using topic models, we propose two topic masks applicable to the topic assignments of homogeneous regions obtained from topic models. The topic masks capture the noise among the topic assignments, or topic labels, and remove it by replacement, just as image masks do for pixels. However, because the nature of topic assignments differs from that of image pixels, the topic masks have properties that differ from those of existing image masks for pixels. This paper makes two contributions. First, the topic masks can be used to reduce the noise in topic assignments obtained from topic models for image segmentation tasks. Second, we test the effectiveness of the topic masks by applying them to segmented images obtained from the Latent Dirichlet Allocation model and the Spatial Latent Dirichlet Allocation model on the MSRC image dataset. The empirical results show that one of the masks successfully reduces the topic noise.
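
The idea of removing noisy topic assignments "by replacement, just like image masks for pixels" can be illustrated with a simple majority filter over a grid of region-level topic labels. This is a stand-in for illustration only. The abstract does not define the paper's two masks, and the grid layout, the 3x3 neighborhood, and the strict-majority rule are all assumptions.

```python
from collections import Counter


def majority_mask(grid):
    """Replace a cell's topic when a strict majority of its neighbors
    agree on a different topic (a generic mode filter, not the paper's mask)."""
    rows, cols = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]  # read from grid, write to the copy
    for r in range(rows):
        for c in range(cols):
            # Collect the up-to-8 surrounding topic assignments.
            neighbors = [
                grid[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            topic, count = Counter(neighbors).most_common(1)[0]
            if topic != grid[r][c] and count * 2 > len(neighbors):
                cleaned[r][c] = topic
    return cleaned


# One isolated topic-1 assignment inside a topic-0 region.
noisy = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
# Two clean regions side by side; the boundary should survive.
two_regions = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

print(majority_mask(noisy))
print(majority_mask(two_regions))
```

The strict-majority condition is what keeps genuine region boundaries intact while isolated assignments are overwritten.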

토픽 레이블링을 위한 토픽 키워드 산출 방법 (A Method of Calculating Topic Keywords for Topic Labeling)

  • 김은회;서유화
    • 디지털산업정보학회논문지 / Vol. 16, No. 3 / pp. 25-36 / 2020
  • Topics calculated using LDA topic modeling have to be labeled separately. A topic is labeled by looking at the words that represent it, so it is important first to build a good set of representative words. This paper proposes a method of calculating a set of words representing a topic using TextRank, which extracts the keywords of a document. The proposed method uses Relevance to select words that are both related to the topic and discriminative. It extracts topic keywords using the TextRank algorithm and connects keywords with a high co-occurrence frequency to express the topic with higher coverage.
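
The Relevance-based word selection can be sketched as follows, assuming that "Relevance" refers to the relevance score popularized by LDAvis (Sievert and Shirley): a weighted sum of log p(w|t) and log(p(w|t) / p(w)). The abstract does not confirm this definition, and the weight of 0.6, the helper names, and the toy probabilities are illustrative assumptions.

```python
import math


def relevance(p_word_given_topic, p_word, lam=0.6):
    """LDAvis-style relevance: lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w)).

    Assumption: this is the 'Relevance' the abstract refers to.
    """
    lift = p_word_given_topic / p_word
    return lam * math.log(p_word_given_topic) + (1 - lam) * math.log(lift)


def top_relevant_words(topic_word_probs, corpus_word_probs, lam=0.6, k=2):
    """Rank a topic's words by relevance and keep the k best."""
    scores = {
        word: relevance(prob, corpus_word_probs[word], lam)
        for word, prob in topic_word_probs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical probabilities: "the" is frequent everywhere, so it is
# probable under the topic but not discriminative.
topic_word_probs = {"the": 0.30, "goal": 0.20, "match": 0.15}
corpus_word_probs = {"the": 0.30, "goal": 0.02, "match": 0.03}

print(top_relevant_words(topic_word_probs, corpus_word_probs, lam=0.6))
print(top_relevant_words(topic_word_probs, corpus_word_probs, lam=1.0))
```

With a weight of 1.0 the ranking reduces to raw topic probability and the common word ranks first. Lowering the weight favors words that are distinctive to the topic.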