• Title/Summary/Keyword: graph mining

A Reply Graph-based Social Mining Method with Topic Modeling (토픽 모델링을 이용한 댓글 그래프 기반 소셜 마이닝 기법)

  • Lee, Sang Yeon;Lee, Keon Myung
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.640-645 / 2014
  • Many people use social network services to communicate, share information, and build social relationships with others on the Internet. Twitter is a representative service of this kind, where millions of tweets are posted each day and a huge amount of data has accumulated. Social mining, which extracts meaningful information from such massive data, has been intensively studied. On Twitter, content is easily delivered and retweeted through following-follower relationships. Topic modeling of tweet data is a good tool for issue tracking in social media. To overcome the restrictions of short tweet content, we introduce the notion of a reply graph, whose nodes correspond to users and whose edges correspond to the existence of reply and retweet messages between those users. The LDA topic model, a typical topic modeling method, is ineffective for short textual data. This paper introduces a topic modeling method that uses the reply graph to reduce the number of short documents and to improve the quality of mining results. The proposed model uses LDA as the topic modeling framework for tweet issue tracking. Experimental results of the proposed method are presented for a collection of Twitter data spanning 7 days.
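
The abstract does not say how the reply graph is used to reduce the number of short documents. As a rough sketch only, one plausible reading is to pool the tweets of users who are linked by reply or retweet edges into a single longer document before running LDA. The function name `pool_tweets_by_reply_graph`, the input shapes, and the choice of pooling by connected component are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def pool_tweets_by_reply_graph(tweets, interactions):
    """Pool tweets into longer documents, one per reply-graph component.

    tweets:       list of (user, text) pairs
    interactions: list of (user_a, user_b) pairs, one per reply or retweet
                  (hypothetical input format)
    """
    parent = {}

    def find(user):
        # Union-find lookup with path halving.
        parent.setdefault(user, user)
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    # Each reply/retweet edge merges the two users' components.
    for a, b in interactions:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Concatenate the tweets of every user in the same component.
    pooled = defaultdict(list)
    for user, text in tweets:
        pooled[find(user)].append(text)
    return [" ".join(texts) for texts in pooled.values()]


tweets = [
    ("alice", "new phone released today"),
    ("bob", "battery life looks great"),
    ("carol", "price is too high"),
    ("dave", "rain expected this weekend"),
]
interactions = [("alice", "bob"), ("bob", "carol")]
documents = pool_tweets_by_reply_graph(tweets, interactions)
```

The pooled documents could then be passed to any LDA implementation. On real data, pooling by whole components may merge too many users into one document, so a finer rule (for example, pooling only direct neighbors) might be needed.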

An Approach for Generating Story-Plot Using Association Analysis of Narrative Patterns (서사 패턴의 연관분석을 통한 이야기 장면 생성 방법)

  • Kim, Jung-Il;Lee, Eun-Joo
    • Journal of Information Technology Services / v.12 no.1 / pp.247-257 / 2013
  • A narrative structure is essential for a story generator to create a story plot. In a digital storytelling system, the narrative structure is generally designed as a tree or a graph, and the story generator creates continuous story plots based on it. When a narrative structure is designed as a tree or a graph, it is hard for the story generator to create varied story plots because of the inflexible nature of these structures. Providing similar story plots to different kinds of users may degrade the quality of the story plots. In this paper, we propose an approach that creates a story plot based on association analysis, a data mining technique, to overcome this disadvantage. Specifically, we define a narrative structure that consists of narrative patterns and implement a story generator that creates a story plot using the proposed narrative structure. In a case study, we confirmed that the implemented story generator was able to create a story plot suited to the user's level of understanding.

The effect investigation of the delirium by Bayesian network and radial graph (베이지안 네트워크와 방사형 그래프를 이용한 섬망의 효과 규명)

  • Lee, Jea-Young;Bae, Jae-Young
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.911-919 / 2011
  • In recent medical analysis, it has become more important to look for risk factors related to mental illness. If the relevant characteristics of the risk factors can be found and identified, the disease can be prevented in advance. Moreover, such studies can contribute to medical development. Studies of risk factors for mental illness have mainly used the logistic regression model. In this paper, however, data mining techniques such as CART, C5.0, logistic regression, neural networks, and Bayesian networks were used to search for the risk factors. Among these methods, the Bayesian network was selected as the best model when applied to delirium data. Bayesian network analysis was then used to find risk factors, and the relationships between the risk factors were identified through a radial graph.

Contribution to Improve Database Classification Algorithms for Multi-Database Mining

  • Miloudi, Salim;Rahal, Sid Ahmed;Khiat, Salim
    • Journal of Information Processing Systems / v.14 no.3 / pp.709-726 / 2018
  • Database classification is an important preprocessing step for multi-database mining (MDM). When a multi-branch company needs to explore its distributed data for decision making, it is imperative to classify these multiple databases into clusters of similar databases before analyzing the data. To search for the best classification of a set of n databases, existing algorithms generate from 1 to $(n^2-n)/2$ candidate classifications. Although each candidate classification is included in the next one (i.e., clusters in the current classification are subsets of clusters in the next classification), existing algorithms generate each classification independently, that is, without reusing clusters from the previous classification. Consequently, existing algorithms are time consuming, especially when the number of candidate classifications increases. To overcome this problem, we propose an efficient approach that represents the problem of classifying multiple databases as the problem of identifying the connected components of an undirected weighted graph. Theoretical analysis and experiments on public databases confirm that our algorithm is more efficient than existing work and avoids the growth in execution time.
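
The connected-components view described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: databases are nodes, pairwise similarities are edge weights, and a union-find structure carries clusters from one candidate classification to the next instead of rebuilding them. The function name `candidate_classifications` and the similarity values are hypothetical; the abstract gives neither the similarity measure nor the criterion for picking the best classification.

```python
def candidate_classifications(n, similarities):
    """Return nested candidate classifications of n databases.

    similarities: dict mapping (i, j) with i < j to a similarity score
                  (hypothetical input; the measure itself is not specified).
    Each result is (similarity_level, clusters), where clusters are the
    connected components of the graph keeping edges with weight >= level.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Process edges from most to least similar so clusters only ever merge.
    edges = sorted(similarities.items(), key=lambda item: -item[1])
    results = []
    i = 0
    while i < len(edges):
        level = edges[i][1]
        merged = False
        # Add every edge at this similarity level in one step.
        while i < len(edges) and edges[i][1] == level:
            (a, b), _ = edges[i]
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b
                merged = True
            i += 1
        # Record a classification only if the components changed.
        if merged:
            clusters = {}
            for x in range(n):
                clusters.setdefault(find(x), []).append(x)
            results.append((level, sorted(clusters.values())))
    return results


sims = {(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.5,
        (1, 3): 0.5, (0, 3): 0.2, (1, 2): 0.1}
for level, clusters in candidate_classifications(4, sims):
    print(level, clusters)
```

Because edges are added in decreasing order of similarity, each classification is obtained from the previous one by merging clusters, which is the nesting property the abstract points out.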

Stability Assessment of Underground Limestone Mine Openings by Stability Graph Method (Stability graph method에 의한 석회석 지하채굴 공동의 안정성 평가)

  • Sunwoo Choon;Jung Yong-Bok
    • Tunnel and Underground Space / v.15 no.5 s.58 / pp.369-377 / 2005
  • The stability of underground openings is a major concern for the safety and productivity of mining operations. Rock mass classification methods provide the basis of many empirical design methods as well as a basis for numerical analysis. Of the many factors that influence the stability of openings, the span of the opening for a given rock mass condition is an important design parameter. In this paper, the critical span curves proposed by Lang, the Mathews stability graph method, and the modified critical span curve suggested by the authors have been assessed. The modified critical span curve was derived using the Mathews stability graph method and has been used to assess the stability of underground openings in several limestone mines.

Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model (텍스트 마이닝 기반의 그래프 모델을 이용한 미발견 공공 지식 추론)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for Information Management / v.31 no.1 / pp.231-250 / 2014
  • Due to the recent development of Information and Communication Technologies (ICT), the number of research publications has increased exponentially. In response to this rapid growth, the demand for automated text processing methods that can deal with massive amounts of text data has risen. Biomedical text mining, which discovers hidden biological meanings and treatments in the biomedical literature, has become a pivotal methodology and helps medical disciplines reduce time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches require either an intensive manual process or a semi-automatic procedure to find and select biomedical entities. In addition, they are limited to showing one dimension, that is, the cause-and-effect relationship between two concepts. Thus, this study proposes a novel approach that discovers various relationships among source concepts, target concepts, and their intermediate concepts by expanding the intermediate concepts to multiple levels. This study provides distinct perspectives for literature-based discovery, not only by discovering meaningful relationships among concepts in the biomedical literature through graph-based path inference but also by generating feasible new hypotheses.

A Distributed Vertex Rearrangement Algorithm for Compressing and Mining Big Graphs (대용량 그래프 압축과 마이닝을 위한 그래프 정점 재배치 분산 알고리즘)

  • Park, Namyong;Park, Chiwan;Kang, U
    • Journal of KIISE / v.43 no.10 / pp.1131-1143 / 2016
  • How can we effectively compress big graphs composed of billions of edges? By concentrating the non-zeros in the adjacency matrix through vertex rearrangement, we can compress big graphs more efficiently. We can also boost the performance of several graph mining algorithms such as PageRank. SlashBurn is a state-of-the-art vertex rearrangement method. It processes real-world graphs effectively by exploiting the power-law characteristic of real-world networks. However, the original SlashBurn algorithm slows down noticeably on large-scale graphs and, because it is designed to run on a single machine, cannot be used at all when a graph is too large to fit on one machine. In this paper, we propose a distributed SlashBurn algorithm to overcome these limitations. Distributed SlashBurn processes big graphs much faster than the original SlashBurn algorithm does. In addition, it scales well by performing the large-scale vertex rearrangement process in a distributed fashion. In our experiments using real-world big graphs, the proposed distributed SlashBurn algorithm ran more than 45 times faster than its single-machine counterpart and processed graphs 16 times larger than the original method could handle.

Graph Processing on the Web Environment (웹 환경에서의 그래프 처리)

  • 박성헌;박지헌
    • The Journal of Society for e-Business Studies / v.5 no.2 / pp.113-125 / 2000
  • Many web-based applications need graphs and charts to be generated from data stored in a database. This paper presents a comparative study of graph processing techniques for web-based applications through a case study of building a stock information system. The results can be used to build effective web applications with graphs in areas such as EC (electronic commerce), EIS (executive information systems), and DM (data mining).

A Dependency Graph-Based Keyphrase Extraction Method Using Anti-patterns

  • Batsuren, Khuyagbaatar;Batbaatar, Erdenebileg;Munkhdalai, Tsendsuren;Li, Meijing;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.14 no.5 / pp.1254-1271 / 2018
  • Keyphrase extraction is one of the fundamental natural language processing (NLP) tools for improving many text-mining applications, such as document summarization and clustering. In this paper, we propose two novel techniques that build on state-of-the-art keyphrase extraction methods. The first is anti-patterns, which aim to recognize non-keyphrase candidates. State-of-the-art methods often use rich feature sets to identify keyphrases, but these feature sets cover only some keyphrases, because keyphrases share very few patterns and stylistic features, whereas non-keyphrase candidates often share many. The second is to use a dependency graph instead of a word co-occurrence graph: a co-occurrence graph cannot connect two words that are syntactically related but placed far apart in a sentence, whereas a dependency graph can. In experiments, we compared performance under different graph settings (co-occurrence and dependency) and against the results of existing methods. We found that the combination of the dependency graph and anti-patterns outperforms the state of the art.

Combining Local and Global Features to Reduce 2-Hop Label Size of Directed Acyclic Graphs

  • Ahn, Jinhyun;Im, Dong-Hyuk
    • Journal of Information Processing Systems / v.16 no.1 / pp.201-209 / 2020
  • The graph data structure is popular because it can intuitively represent real-world knowledge. Graph databases have attracted attention in academia and industry because they can be used to maintain graph data and allow users to mine knowledge. Mining reachability relationships between two nodes in a graph, termed reachability query processing, is an important functionality of graph databases. Online traversals, such as the breadth-first and depth-first search, are inefficient in processing reachability queries when dealing with large-scale graphs. Labeling schemes have been proposed to overcome these disadvantages. The state-of-the-art is the 2-hop labeling scheme: each node has in and out labels containing reachable node IDs as integers. Unfortunately, existing 2-hop labeling schemes generate huge 2-hop label sizes because they only consider local features, such as degrees. In this paper, we propose a more efficient 2-hop label size reduction approach. We consider the topological sort index, which is a global feature. A linear combination is suggested for utilizing both local and global features. We conduct experiments over real-world and synthetic directed acyclic graph datasets and show that the proposed approach generates smaller labels than existing approaches.
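
To make the 2-hop idea concrete, the sketch below builds in and out labels with a pruned breadth-first search and answers a reachability query by intersecting two label sets. It is an illustration under assumptions, not the paper's approach: the names `build_2hop_labels` and `reachable` are hypothetical, and the node order here uses degree only (a local feature). The abstract does not give the formula that combines degree with the topological sort index, so that ordering is left as a parameter.

```python
from collections import deque


def build_2hop_labels(nodes, edges, order):
    """Build 2-hop labels for a DAG by pruned BFS from each hub in `order`."""
    succ = {v: [] for v in nodes}
    pred = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    l_in = {v: set() for v in nodes}   # hubs that can reach v
    l_out = {v: set() for v in nodes}  # hubs that v can reach

    def pruned_bfs(hub, adjacency, labels, already_covered):
        queue, seen = deque([hub]), {hub}
        while queue:
            w = queue.popleft()
            # Skip w (and do not expand it) if earlier hubs already cover it.
            if w != hub and already_covered(w):
                continue
            labels[w].add(hub)
            for x in adjacency[w]:
                if x not in seen:
                    seen.add(x)
                    queue.append(x)

    for hub in order:
        # Forward pass: hub reaches w, so hub goes into l_in[w].
        pruned_bfs(hub, succ, l_in,
                   lambda w: not l_out[hub].isdisjoint(l_in[w]))
        # Backward pass: w reaches hub, so hub goes into l_out[w].
        pruned_bfs(hub, pred, l_out,
                   lambda w: not l_out[w].isdisjoint(l_in[hub]))
    return l_in, l_out


def reachable(u, v, l_in, l_out):
    """u reaches v iff some hub lies in both l_out[u] and l_in[v]."""
    return not l_out[u].isdisjoint(l_in[v])


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c"), ("c", "e")]
degree = {v: sum(v in edge for edge in edges) for v in nodes}
order = sorted(nodes, key=lambda v: -degree[v])  # degree-only order (assumption)
l_in, l_out = build_2hop_labels(nodes, edges, order)
label_size = sum(len(s) for s in l_in.values()) + sum(len(s) for s in l_out.values())
```

The total label size depends on the hub order, which is where a score mixing local and global features would plug in.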