Search | Korea Science

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

Kim, So Hyeon;Kim, Han Joon
- KIPS Transactions on Software and Data Engineering / v.6 no.5 / pp.279-284 / 2017
Currently, in the era of big data, text mining and opinion mining are used in many domains, and one of their most important research issues is extracting significant information from social media. In this paper, we propose a logistic regression ensemble method for finding the main body text in blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model, based on logistic regression and ensembling, that decides whether a given tag contains main body text. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using blog data on diverse topics collected from the web, our tag classification model achieved 99% accuracy, and it recalled 80.5% of the documents that have tags containing the main body text.
https://doi.org/10.3745/KTSDE.2017.6.5.279
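
The abstract above names 'depth' features extracted from HTML tags but does not define them. As a rough illustration only, the sketch below assumes that depth means the nesting level of an element and pairs it with a simple text-length feature; the class name `TagDepthExtractor`, the function `extract_tag_features`, and the feature definitions are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagDepthExtractor(HTMLParser):
    """Collect [tag, depth, text_length] for every element (assumed features)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # indices into self.rows for the currently open elements
        self.rows = []   # one [tag, depth, text_length] row per element

    def handle_starttag(self, tag, attrs):
        depth = len(self.stack) + 1  # the root element has depth 1
        self.rows.append([tag, depth, 0])
        if tag not in VOID_TAGS:
            self.stack.append(len(self.rows) - 1)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.rows[self.stack[i]][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Credit visible text to the innermost open element only.
        if self.stack:
            self.rows[self.stack[-1]][2] += len(data.strip())


def extract_tag_features(html):
    """Return a list of (tag, depth, text_length) tuples in document order."""
    parser = TagDepthExtractor()
    parser.feed(html)
    parser.close()
    return [tuple(row) for row in parser.rows]
```

Rows of this kind could serve as input to a tag-level classifier such as the logistic regression ensemble the abstract describes; the actual feature set used by the authors may differ.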

Rural Tourism Image and Major Activity Space in Gochang County Shown in Social Data - Focusing on the Keyword 'Gochang-gun Travel' - (소셜데이터에 나타난 고창군의 농촌관광 이미지와 주요 활동공간 - '고창군 여행' 키워드를 중심으로 -)

Kim, Young-Jin;Son, Gwangryul;Lee, Dongchae;Son, Yong-hoon
- Journal of Korean Society of Rural Planning / v.27 no.3 / pp.103-116 / 2021
In this study, the characteristics of the rural tourism image perceived by urban residents were analyzed through text analysis of blog data. To examine the images related to rural tourism, blog data written with the keyword "Gochang-gun travel" were used. LDA topic analysis, one of the text mining techniques, was used for the analysis. In the tourism image of Gochang-gun, 9 topics were derived, and 112 major places appeared. These were divided into 3 main activities and 5 object spaces through a review of the keywords and the original text of the blog data. The analysis showed that the traditional main resources of the region, Seonun mountain, Seonun temple, and Gochang-eup fortress, formed topics. On the other hand, world heritage sites such as the dolmens and Ungok wetland did not appear as topics. In particular, farms operated by the private sector formed individual topics, and theme farms can be seen as an important resource for tourism in Gochang-gun. Also, the distribution of place keywords made it possible to understand the characteristics of travel by region and the usage behavior of visitors. In Gochang-gun, visitors were concentrated in particular areas. This seems to be the result of Gochang-gun seeking to vitalize local tourism by focusing on natural, ecological, and scenic resources. It is necessary to establish a plan for balanced regional development and to develop other types of tourism resources. This study is distinctive in that it identified the types and characteristics of rural tourism images in the region as perceived by visitors, and the status of tourism at the regional level.
https://doi.org/10.7851/ksrp.2021.27.3.103

"Say Hello to Vietnam!": A Multimodal Analysis of British Travel Blogs

Thuy T.H. Tran
- SUVANNABHUMI / v.15 no.2 / pp.91-129 / 2023
This paper reports the findings of a multimodal study conducted on 10 travel blog posts about Vietnam by seven British professional travel bloggers. The study takes a sociolinguistic view of tourism by seeing travel blogs as a source of linguistic and other semiotic materials while considering language as situated practice for the social construction of fundamental categories such as "human," "society," and "nation." It borrows concepts from Halliday's Systemic Functional Linguistics for the interpersonal metafunction to develop an analytical framework for studying how the co-occurrence of text and still images in these travel blog posts formulated the portrayal of Vietnam as a tourism destination and indicated the main sociolinguistic features of the blogs. The analysis of appreciation values and interactive qualities encoded in evaluative adjectives and still images shows that Vietnam is generally portrayed as a country of identity and diversity. It provides tourists with positive experiences in terms of places of interest, food, and local lifestyles, and is cost-competitive. Strangerhood and authenticity are two outstanding sociolinguistic features exhibited in these travel blog posts. The findings of this study also underline the co-contribution of the linguistic sign, in this case evaluative adjectives, and the visual sign, in this case still images, as interpersonal meaning-making resources. To portray Vietnam, still images served as integral elements to evidence the credibility of verbal narrations. To unveil sociolinguistic characteristics of travel blogs, still images supported the linguistic realizations of authenticity and strangerhood in the posts, and in some cases delivered an even stronger message than words. Not only does the study present a source of feedback from international travelers on tourism practice in Vietnam, but it also provides insights into multimodal analysis of tourism discourse, which remains an under-researched area in Vietnam.
https://doi.org/10.22801/svn.2023.15.2.91

Biomass Productivity and its Vertical Allocation of Natural Pinus densiflora Forests by Stand Density (백두산 동북부지역 소나무 천연림에서 밀도에 따른 임분의 Biomass 생산성 및 수직 배분)

;Xianyu Meng
- Journal of Korea Forestry Energy / v.18 no.2 / pp.92-99 / 1999
This study was carried out to understand the primary production of biomass, the vertical distribution of biomass within the stand, and the differences in biomass production among tree parts by stand density for natural Pinus densiflora forests at Mt. Baekdoo, located in northeastern China. The primary production of biomass was estimated by layer (trees, shrubs, and herbs) for five density classes. To estimate the biomass of the stem, the stembark, and the above-ground tree of Pinus densiflora, the regression model $\log W = a + b\log(D^2H) + c(D^2H)$ was adopted for all density classes, where W is dry weight, D is diameter at breast height, and H is tree height. To estimate branch and needle biomass and needle area, the regression model $\log W = a + b\log D + cD$ was adopted for all density classes. With increasing stand density, the biomass of trees increased but that of shrubs and herbs decreased. Net primary production of biomass in the parts of the tree also increased with increasing stand density. However, the percentage of needle biomass in the total above-ground tree biomass decreased with increasing stand density. Consequently, the primary production rate of biomass in the above-ground tree increased. Regardless of stand density, the primary production of biomass for each part of the trees in natural Pinus densiflora forests ranked, in descending order, stem, needle, branch, and stembark.
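
The two regression models in this abstract share the form log W = a + b log x + c x, with x = D²H for the stem, stembark, and above-ground tree, and x = D for branch and needle. A minimal sketch of evaluating such a fitted model follows. The abstract gives neither the coefficients nor the base of the logarithm, so base 10 is an assumption here, and the function names and any coefficient values passed in are hypothetical.

```python
import math


def predict_dry_weight(x, a, b, c):
    """Back-transform log10(W) = a + b*log10(x) + c*x to dry weight W.

    x is the predictor (D^2*H or D); a, b, c are fitted coefficients.
    Base-10 logarithms are an assumption; the abstract does not state the base.
    """
    if x <= 0:
        raise ValueError("predictor must be positive")
    log_w = a + b * math.log10(x) + c * x
    return 10 ** log_w


def stem_dry_weight(d, h, a, b, c):
    """Stem-type model: the predictor is D^2 * H."""
    return predict_dry_weight(d * d * h, a, b, c)


def branch_dry_weight(d, a, b, c):
    """Branch/needle-type model: the predictor is D alone."""
    return predict_dry_weight(d, a, b, c)
```

With a = 0, b = 1, c = 0 the stem model reduces to W = D²H, which gives a quick sanity check on the back-transformation.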

A Study on Detecting Fake Reviews Using Machine Learning: Focusing on User Behavior Analysis (머신러닝을 활용한 가짜리뷰 탐지 연구: 사용자 행동 분석을 중심으로)

Lee, Min Cheol;Yoon, Hyun Shik
- Knowledge Management Research / v.21 no.3 / pp.177-195 / 2020
Growing public awareness of fake reviews has prompted researchers to propose countermeasures, either by analyzing the content of fake reviews or by detecting them through their structural characteristics. This study collected data from blog posts on Naver and applied a machine learning model to variables extracted from blogs and blog posts in order to detect the habitual patterns that users exhibit unconsciously, with the aim of using the technique to predict fake reviews. Data analysis showed a very strong relationship between the total number of posts registered on the writer's blog and the date on which the post was registered. Random Forest was found to be the most suitable model for detecting advertising reviews. If the model proposed in this research predicts that a review is an advertisement, the review is very likely to be fake and to violate the guidelines on investigation into markings and advertising regarding recommendation and guarantee in the Law of Marking and Advertising. Instead of morphological analysis of the text, this research analyzes the behavior of the writer, collects characteristic data on blogs and blog posts with an automated system rather than manually, and uses these data to determine whether a post is an advertisement. This approach is expected to improve the efficiency and effectiveness of fake review detection.
https://doi.org/10.15813/kmr.2020.21.3.010

A Wikipedia-based Feedback Method for In-depth Blog Distillation (주제를 깊이 있게 다루는 블로그 피드 검색을 위한 위키피디아 기반 질의 확장 방법)

Song, Woo-Sang;Lee, Ye-Ha;Lee, Jong-Hyeok
- Proceedings of the Korean Information Science Society Conference / 2010.06a / pp.92-93 / 2010

Intelligent Semantic Blog Modelling for Supporting Analysis Queries to Recommend Interest Communities (관심 커뮤니티 추천을 위한 분석 질의를 지원하는 지능형 시맨틱 블로그 모델링)

Yang, Kyung-Ah;Yang, Jae-Dong;Li, Ma-Jian
- Proceedings of the Korean Information Science Society Conference / 2007.06a / pp.89-90 / 2007

Study of factors influencing blog activity: Focused on Motivation-Hygiene theory (블로그 활동에 영향을 미치는 요인들에 관한 연구;Herzberg의 동기-위생 이론을 중심으로)

Kim, Tae-Won;Kim, Sang-Uk
- Proceedings of the Korea Society of Information Technology Applications Conference / 2007.11a / pp.89-89 / 2007

Blog evangelist Lee Ram, NHN community team leader: "I want to be a happy ready-to-wear tailor"

Sin, Seung-Cheol
- Digital Contents / no.3 s.130 / pp.22-23 / 2004
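These days one often hears that Naver's community service has changed beyond recognition. The talk began when community team leader Lee Ram moved to NHN. Rumor also has it that the position carries an annual salary in the 100-million-won range. With the blog taking hold among netizens as a new cultural code, the "one-person media," we met Lee Ram, NHN community team leader, to hear the details.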

Automatic Correction of Errors in Annotated Corpus Using Kernel Ripple-Down Rules (커널 Ripple-Down Rule을 이용한 태깅 말뭉치 오류 자동 수정)

Park, Tae-Ho;Cha, Jeong-Won
- Journal of KIISE / v.43 no.6 / pp.636-644 / 2016
Annotated corpora are important for understanding natural language with machine learning methods. In this paper, we propose a new method for automatically reducing errors in annotated corpora. We use Ripple-Down Rules (RDR) to reduce errors and a kernel to extend RDR for NLP. We applied our system to errors in Korean Wikipedia and blog corpora to identify the types of annotation errors. Experimental results from various perspectives on the Korean Wikipedia and blog corpora are reported to evaluate the effectiveness and efficiency of the proposed approach. The proposed approach can be used to reduce errors in large corpora.
https://doi.org/10.5626/JOK.2016.43.6.636

Search Result 444, Processing Time 0.029 seconds
