• Title/Summary/Keyword: blog mining

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

  • Kim, So Hyeon;Kim, Han Joon
    • KIPS Transactions on Software and Data Engineering / v.6 no.5 / pp.279-284 / 2017
  • In the era of big data, text mining and opinion mining are used in many domains, and one of their most important research issues is extracting significant information from social media. In this paper, we propose a logistic regression ensemble method for finding the main body text in blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model, using logistic regression and an ensemble, that decides whether a given tag contains main body text. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using blog data on diverse topics collected from the web, our tag classification model achieved 99% accuracy, and it recalled 80.5% of the documents whose tags contain the main body text.
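The abstract above does not list the exact features used, so the following is only a minimal sketch of what a per-tag 'depth' feature could look like, using Python's standard `html.parser`. The names `TagDepthParser` and `tag_features`, and the choice of text length as a companion feature, are assumptions made for illustration and are not taken from the paper.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not increase the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagDepthParser(HTMLParser):
    """Collects [tag, depth, text_length] for every element in an HTML string."""

    def __init__(self):
        super().__init__()
        self.stack = []    # indices into self.records of currently open elements
        self.records = []  # one [tag, depth, text_length] entry per element

    def handle_starttag(self, tag, attrs):
        depth = len(self.stack) + 1
        self.records.append([tag, depth, 0])
        if tag not in VOID_TAGS:
            self.stack.append(len(self.records) - 1)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.records[self.stack[i]][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Credit visible text to the innermost open element only.
        if self.stack:
            self.records[self.stack[-1]][2] += len(data.strip())


def tag_features(html):
    """Return a list of (tag, depth, text_length) tuples in document order."""
    parser = TagDepthParser()
    parser.feed(html)
    parser.close()
    return [tuple(record) for record in parser.records]


sample = "<html><body><div><p>Hello blog</p><br></div></body></html>"
features = tag_features(sample)
for tag, depth, text_length in features:
    print(tag, depth, text_length)
```

Rows like these could then be fed to any classifier; the logistic regression ensemble itself is not reproduced here.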

Determining Diffusion Power Users in a Blog Network (블로그 연결망에서 파급력을 가진 파워 유저의 파악 기법)

  • Lim, Seung-Hwan;Kim, Sang-Wook;Park, Sun-Ju
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.377-382 / 2010
  • For business purposes, it is important to identify diffusion power users, a group of users who have a large influence on other users in diffusing content. In this paper, we use the independent cascade model to determine diffusion power users, which requires a method for calculating the assimilation probability between users. This paper proposes the concept of user delivery power and a way to quantify it. User delivery power is used together with user content power to compute the assimilation probability. We analyze the proposed method by comparing its performance with that of existing methods through experiments on real blog network data.
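For readers unfamiliar with the independent cascade model mentioned above, here is a minimal simulation sketch. It assumes the per-edge probabilities are already known; how the paper derives them from delivery power and content power is not shown. The graph layout and the function names are illustrative assumptions.

```python
import random


def independent_cascade(graph, seeds, rng):
    """Run one independent cascade simulation.

    graph: dict mapping user -> list of (neighbor, activation_probability)
    seeds: users who are active at step 0
    Returns the set of all users activated by the end of the cascade.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for user in frontier:
            for neighbor, prob in graph.get(user, []):
                # A newly active user gets exactly one attempt per neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def expected_spread(graph, seeds, runs=1000, seed=0):
    """Estimate the average number of activated users over repeated runs."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, rng)) for _ in range(runs))
    return total / runs


# Toy graph with probabilities of 1.0 and 0.0 so the outcome is deterministic.
toy_graph = {
    "a": [("b", 1.0), ("c", 0.0)],
    "b": [("d", 1.0)],
}
reached = independent_cascade(toy_graph, ["a"], random.Random(0))
print(sorted(reached))
```

Ranking users by `expected_spread` when each is used as the only seed is one simple way to compare candidate power users under this model.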

Construction of a Blog Network based on Information Diffusion (정보 파급 모델링을 위한 블로그 네트워크 구성)

  • Lim, Seung-Hwan;Kim, Sang-Wook;Kang, Kyu-Hwang;Do, Young-Joo
    • Journal of KIISE:Computing Practices and Letters / v.15 no.11 / pp.841-845 / 2009
  • The independent cascade model has been widely used to analyze information diffusion in the blog world. In this paper, we propose a new method of constructing a blog network so that the independent cascade model can be applied to the analysis of information diffusion in a blog world. To construct a blog network, the proposed method establishes an edge between two users and calculates the diffusion probability between them by analyzing the activities that occurred between the two users. To calculate diffusion probabilities, the method uses the ratio of the number of documents actually diffused to a specific user to the number of documents written for the purpose of being diffused to other blogs. Experimental results on real-world blog data demonstrate that our method reflects actual information diffusion in a blog world better than existing ones.
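The ratio described above can be sketched as follows. The input layout (a `posts` dict and a list of `(author, doc_id, receiver)` diffusion events) and the name `build_blog_network` are assumptions for illustration; the abstract does not say how the activity logs are structured.

```python
from collections import defaultdict


def build_blog_network(posts, diffusions):
    """Estimate per-edge diffusion probabilities.

    posts: dict mapping author -> set of doc ids written for diffusion
    diffusions: iterable of (author, doc_id, receiver) events
    Returns a dict mapping (author, receiver) -> probability, computed as
    (documents of author that reached receiver) / (documents author wrote).
    """
    diffused = defaultdict(set)
    for author, doc_id, receiver in diffusions:
        # Ignore events that refer to documents we have no record of.
        if doc_id in posts.get(author, set()):
            diffused[(author, receiver)].add(doc_id)  # a set removes repeats
    return {
        (author, receiver): len(docs) / len(posts[author])
        for (author, receiver), docs in diffused.items()
    }


posts = {"alice": {1, 2, 3, 4}}
diffusions = [
    ("alice", 1, "bob"),
    ("alice", 2, "bob"),
    ("alice", 1, "bob"),    # repeated event, counted once
    ("alice", 1, "carol"),
]
network = build_blog_network(posts, diffusions)
print(network)
```

An edge exists only where at least one document was actually diffused, which matches the idea of building the network from observed activity.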

An Information Diffusion Model Considering Non-explicit Relationships in the Blog World (블로그 월드에서 비명시적 관계를 고려한 정보 파급 모델)

  • Kwon, Yong-Suk;Kim, Sang-Wook;Park, Sun-Ju;Lim, Seung-Hwan;Lee, Jae-Bum
    • Journal of KIISE:Computing Practices and Letters / v.15 no.5 / pp.360-364 / 2009
  • Analyzing information diffusion in a blog world is a very useful research issue, with applications in predicting information diffusion, anomaly detection, marketing, and revitalizing the blog world. Existing studies on information diffusion in blog networks establish explicit relationships between blogs and analyze only the word-of-mouth effect through such explicit relationships. However, we observed that more than 85% of all information diffusion in a blog world occurs through non-explicit relationships. In this paper, we propose a new model that considers both explicit and non-explicit relationships between blogs in order to explain all information diffusion phenomena in a blog world. We verify the superiority of our proposed model through extensive experiments on information diffusion in a real blog network.

Application of Sentiment Analysis and Topic Modeling on Rural Solar PV Issues : Comparison of News Articles and Blog Posts (감성분석과 토픽모델링을 활용한 농촌태양광 관련 이슈 연구 : 언론 기사와 블로그 포스트 비교)

  • Ki, Jaehong;Ahn, Seunghyeok
    • Journal of Digital Convergence / v.18 no.9 / pp.17-27 / 2020
  • News articles and blog posts influence social agenda setting, and this study applied text mining to how rural solar PV appears in those media. Texts were collected by web scraping from online news articles and blog posts using rural solar PV as a keyword, and were analyzed with sentiment analysis and topic modeling. Sentiment analysis shows that the proportion of negative texts is significantly lower in blog posts than in news articles. Topic modeling shows that topics related to government policy have the largest loading in positive articles, whereas various topics are relatively evenly distributed in negative articles. For blog posts, topics related to rural area installation and environmental damage have the largest loading in positive and negative texts, respectively. This research reveals issues related to rural solar PV by combining sentiment analysis and topic modeling, which were applied separately in previous studies.

Determining Contents Power Users for Revitalizing Blog Networks (블로그 연결망 활성화를 위한 컨텐츠 파워 유저의 파악 방안)

  • Lim, Seung-Hwan;Kim, Sang-Wook;Park, Sun-Ju;Lee, Joon-Ho
    • Journal of KIISE:Databases / v.36 no.6 / pp.411-421 / 2009
  • In a blog network, there are special users who induce other users to actively utilize blog services. In this paper, users whose content exerts a large influence over other bloggers are defined as 'Content Power Users' (CPUs). It is important to accurately determine who the content power users are in a blog network in order to establish business policies that will stimulate usage of blog services. In this paper, we discuss a novel method of determining content power users. First, we propose a system for measuring the influence of the content of each post owned by individual users. Then, by adjusting the measured values based on the time of exposure and adding them up, we calculate the power of influence of the corresponding users. Finally, by applying the proposed method to actual blog networks and comparing the selected power users with those of a preexisting method, we analyze different methods of determining power users. The experimental results demonstrate that our method of determining power users reflects dynamic changes in a blog network well.

Box Office Hit Prediction Using Data mining and Text mining (데이터마이닝과 텍스트마이닝을 활용한 영화 흥행 예측)

  • Jo, Hyo-jung
    • Annual Conference of KIPS / 2021.05a / pp.316-318 / 2021
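  • Whether a film is a box-office hit has a major effect on its revenue. As the film industry has grown, the factors behind box-office success have become something many production companies and investors must consider, and many models for predicting box-office success have been studied. The purpose of this study is to build a box-office prediction model that uses online word-of-mouth variables alongside intrinsic attributes such as the number of screens, director name, and production company name, which prior studies have shown to significantly affect box-office success. Rather than using variables that represent the volume of online word of mouth, such as the number of articles or blog posts, we apply text mining to audience reviews from the first week after release, score each film by the proportion of positive reviews among all reviews, and use that score as an independent variable. We then feed these independent variables into models built with data mining techniques to predict box-office success. Finally, we ran a decision tree and logistic regression to identify the independent variables that affect box-office success and evaluated model performance. In the logistic regression, audience count and rating were selected as variables with a particularly significant effect on box-office success, and reviews were also selected as a significant variable. The resulting model showed a high accuracy of about 90%. In the decision tree, audience count was selected as the most important variable.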
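The review score described above, the share of positive reviews among all first-week reviews, can be sketched as below. The abstract does not say how a review is judged positive, so the tiny word lists and the `is_positive` rule here are purely hypothetical stand-ins for whatever text mining step the study used.

```python
# Hypothetical toy lexicons; a real study would use a proper sentiment model.
POSITIVE_WORDS = {"great", "fun", "moving"}
NEGATIVE_WORDS = {"boring", "bad", "dull"}


def is_positive(review):
    """Label a review positive when positive words outnumber negative ones."""
    words = review.lower().split()
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives > negatives


def positive_review_score(reviews):
    """Return the proportion of reviews labeled positive (0.0 if there are none)."""
    if not reviews:
        return 0.0
    return sum(is_positive(review) for review in reviews) / len(reviews)


first_week_reviews = ["great fun", "boring plot", "moving and great", "bad"]
score = positive_review_score(first_week_reviews)
print(score)
```

The resulting score would sit next to variables such as screen count as one more input column for the decision tree or logistic regression.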

Extracting Significant Information from Social Text using Machine Learning (기계학습을 활용한 소셜 텍스트의 주요 정보 추출 기법)

  • Kim, So-Hyeon;Kim, Han-joon
    • Annual Conference of KIPS / 2016.10a / pp.742-745 / 2016
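  • In the big data era, as text mining and opinion mining are increasingly used, extracting useful data from social network data is very important. This paper proposes a method that applies logistic regression and an ensemble technique to tag features extracted from blog HTML documents to build a model that classifies tags containing the main body text, and then uses the depth feature of the tags to find the main body text. In experiments using data we collected ourselves, tag classification accuracy was 0.990 and the proportion of documents for which the main body text was found was 80.5%.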

A Technique for Making Efficient Travel Routes using the Mining Method of Frequent Patterns-growth (FP-growth 마이닝을 이용한 효율적인 여행경로 수립 기법)

  • Yoo, Kibeom;Cho, Kyungsoo;Kim, Ung-Mo
    • Annual Conference of KIPS / 2010.11a / pp.10-13 / 2010
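  • As computers are used in more varied ways, many people now travel for a variety of reasons and afterwards store and publish information about their trips on blogs or the web. Although a large amount of travel-related data exists on the web, it is scattered and not systematically organized into databases, so searching for information and planning a travel itinerary still takes considerable time and effort. This paper therefore proposes a travel planning technique based on the FP-tree-based frequent pattern growth method. In the proposed technique, data are stored in FP-tree form, which dramatically reduces the time and effort needed for search, and the FP-growth mining technique helps select effective travel routes.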

The Analysis of the Visitors' Experiences in Yeonnam-dong before and after the Gyeongui Line Park Project - A Text Mining Approach - (경의선숲길 조성 전후의 연남동 방문자의 경험 분석 - 블로그 텍스트 분석을 중심으로 -)

  • Kim, Sae-Ryung;Choi, Yunwon;Yoon, Heeyeun
    • Journal of the Korean Institute of Landscape Architecture / v.47 no.4 / pp.33-49 / 2019
  • The purpose of this study was to investigate changes in the experiences of visitors to Yeonnam-dong during the period covering the development of a linear park, the Gyeongui Line Park. This study used a text mining technique to analyze Naver Blog postings by people who visited Yeonnam-dong from June 2013 to May 2017, divided into four periods: June 2013 to May 2014, June 2014 to May 2015, June 2015 to May 2016, and June 2016 to May 2017. The keywords used were 'Yeonnam-dong', 'Gyeongui Line' and 'Yeontral Park', and the data were further refined and resampled. A semantic network analysis was conducted on the basis of word co-occurrences. The results were as follows. Throughout the entire period, the main experience of visitors to Yeonnam-dong was consistently 'food culture', but activities related to 'market', 'browsing', and 'buy' increased. Activities in the park such as 'walk', 'play' and 'rest' also newly appeared after the park was constructed. Moreover, more diverse opinions about Yeonnam-dong were expressed on blogs, and Yeonnam-dong began to be recognized as a place where a variety of activities can be enjoyed. Lastly, when visitors wrote about the theme 'food culture', the scope of the keywords expanded from simple ones, such as 'eat', 'photograph' and 'chatting', to 'market', 'browsing', and 'walk'. The sub-themes that appeared with the park also expanded to various topics with the emergence of the Gyeongui Line Book Street. This study analyzed changes in visitors' experiences objectively with text mining, a quantitative methodology. Due to the nature of text mining, however, subjective judgment was inevitably involved in the refining process. Further research is also required to assess the direct relationship between these changes and the park's construction.
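The co-occurrence counting that underlies a semantic network like the one described above can be sketched as follows. This assumes each posting has already been tokenized into keywords; the `min_count` threshold and the sample tokens are illustrative assumptions, not values from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, min_count=1):
    """Count how many documents each unordered word pair appears in together.

    documents: iterable of token lists, one list per posting
    Returns a dict mapping (word_a, word_b) -> count, with word_a < word_b,
    keeping only pairs seen in at least min_count documents.
    """
    counts = Counter()
    for tokens in documents:
        # Deduplicate and sort so each pair is counted once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


postings = [
    ["park", "walk", "rest"],
    ["park", "walk", "eat"],
    ["eat", "market"],
]
edges = cooccurrence_edges(postings, min_count=2)
print(edges)
```

Running this separately on the postings from each period would give one edge list per period, which is what makes a before-and-after comparison of the network possible.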