• Title/Summary/Keyword: web crawling

An Exploratory Analysis on the User Response Pattern and Quality Characteristics of Marketing Contents in the SNS of Regional Government (지역마케팅 콘텐츠의 사용자 반응패턴과 품질특성에 관한 탐색적 분석: 지방자치단체가 운영하는 SNS를 중심으로)

  • Jeong, Yeon-Su;Jeong, Dae-Yul
    • The Journal of Information Systems / v.26 no.4 / pp.419-442 / 2017
  • Purpose: This study explores the pattern and duration of user responses through an analysis of responses to social media content. We also analyze the characteristics of the content quality factors associated with the user response pattern. The results offer implications for operators of regional marketing on SNS who need to develop strategies and plans. Design/methodology/approach: This study used mixed methods to examine how social media content affects users interested in regional events such as local festivals, cultural events, and city tours, and how those users respond. Big data analysis was conducted on quantitative data from regional government SNSs. The data were collected through web crawling in order to analyze the social media content. We focused in particular on content duration and peak time. This study also analyzed the characteristics of content quality factors using expert evaluation data on the social media content. Finally, we verified the relationship between the content quality factors and user response types by cross-correlation analysis. Findings: The big data analysis revealed a content life cycle that can be described by an empirical distribution with a peak-time pattern and a left-skewed long tail. User response patterns depend on time and content quality. In addition, this study confirms that the quality level of social media content is closely related to user interaction and response patterns. The response pattern analysis indicates the need for a high-quality content design strategy as well as content posting and propagation tactics. SNS operators need to develop high-quality content using rich-media technology, along with active-response content that draws in opinion leaders on the SNS.

A Study on the Performance of Deep learning-based Automatic Classification of Forest Plants: A Comparison of Data Collection Methods (데이터 수집방법에 따른 딥러닝 기반 산림수종 자동분류 정확도 변화에 관한 연구)

  • Kim, Bomi;Woo, Heesung;Park, Joowon
    • Journal of Korean Society of Forest Science / v.109 no.1 / pp.23-30 / 2020
  • The use of increased computing power, machine learning, and deep learning techniques has increased dramatically in various sectors. In particular, image detection algorithms are broadly used in forestry and remote sensing to identify forest types and tree species. However, in South Korea, machine learning has rarely, if ever, been applied to forestry image detection, especially to classify tree species. This study integrates machine learning and forest image detection; specifically, we compared two data collection methods for machine learning, namely image data captured by forest experts (D1) and web crawling (D2), in automating the classification of five tree species. In addition, two methods of characterization to train/test the system were investigated. The results indicated a significant difference in classification accuracy between D1 and D2: the classification accuracy of D1 was higher than that of D2. To increase the classification accuracy of D2, additional data filtering techniques were required to reduce the noise of uncensored image data.

A Topic Modeling Analysis for Online News Article Comments on Nurses' Workplace Bullying (간호사의 직장 내 괴롭힘 관련 온라인 뉴스기사 댓글에 대한 토픽 모델링 분석)

  • Kang, Jiyeon;Kim, Soogyeong;Roh, Seungkook
    • Journal of Korean Academy of Nursing / v.49 no.6 / pp.736-747 / 2019
  • Purpose: This study aimed to explore public opinion on workplace bullying in the nursing field by analyzing the keywords and topics of online news comments. Methods: This was a text-mining study that collected, processed, and analyzed text data. A total of 89,951 comments on 650 online news articles, reported between January 1, 2013 and July 31, 2018, were collected via web crawling. The collected unstructured text data were preprocessed, and keyword analysis and topic modeling were performed using R programming. Results: The 10 most important keywords were "work" (37121.7), "hospital" (25286.0), "patients" (24600.8), "woman" (24015.6), "physician" (20840.6), "trouble" (18539.4), "time" (17896.3), "money" (16379.9), "new nurses" (14056.8), and "salary" (13084.1). The 22,572 preprocessed keywords were categorized into four topics: "poor working environment", "culture among women", "unfair oppression", and "society-level solutions". Conclusion: Public interest in workplace bullying among nurses has continued to increase. The public agreed that a negative work environment and a nursing shortage could cause workplace bullying. They also considered nurse bullying a problem that should be resolved at a societal level. Further research is needed on nurse workplace bullying from a gender discrimination perspective and on the social value of nursing work.

A Topic Analysis of SW Education Textdata Using R (R을 활용한 SW교육 텍스트데이터 토픽분석)

  • Park, Sunju
    • Journal of The Korean Association of Information Education / v.19 no.4 / pp.517-524 / 2015
  • In this paper, news data on SW education were gathered and their contents analyzed to identify the direction of interest in SW education. The topic analysis covered SW education news collected from July 23, 2013 to October 19, 2015. The relationships among the 20 most frequently mentioned words were analyzed using web crawling in R; in the co-occurrence matrix graph centered on the word 'SW education', the node sizes of the 20 words were balanced against each other, indicating that the words are closely related. Moreover, our analysis revealed that the data were mainly composed of topics about SW talent, SW support programs, the SW education mandate, SW camps, the SW industry, and job creation. These results could be used in big data analysis to identify people's thoughts and interests regarding SW education.
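
The co-occurrence analysis of the most frequent words mentioned in this abstract can be illustrated with a minimal Python sketch. The paper itself used R; the function name `cooccurrence`, the tokenized toy documents, and the choice to count co-occurrence per document are all assumptions made for illustration, not the author's procedure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs, top_n=20):
    """Count how often pairs of the top_n most frequent words share a document.

    docs is a list of token lists. Pairs are stored as alphabetically
    sorted tuples so that (a, b) and (b, a) are counted together.
    """
    # Overall word frequency decides which words enter the matrix.
    word_counts = Counter(word for doc in docs for word in doc)
    top_words = {word for word, _ in word_counts.most_common(top_n)}

    pair_counts = Counter()
    for doc in docs:
        # Each word counts at most once per document.
        present = sorted(set(doc) & top_words)
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical, already-tokenized news snippets (not data from the paper).
docs = [
    ["sw", "education", "camp"],
    ["sw", "education", "industry"],
    ["sw", "camp"],
]
pairs = cooccurrence(docs, top_n=3)
for pair, count in sorted(pairs.items()):
    print(pair, count)
```

The resulting pair counts are the edge weights of a co-occurrence graph; node size in such a graph is typically driven by each word's own frequency.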

Comparison of Online Shopping Mall BEST 100 using Exploratory Data Analysis (탐색적 자료 분석(EDA) 기법을 활용한 국내 11개 대표 온라인 쇼핑몰 BEST 100 비교)

  • Kang, Jicheon;Kang, Juyoung
    • The Journal of Bigdata / v.3 no.1 / pp.1-12 / 2018
  • Since the first online shopping malls appeared, BEST 100 lists have been a core feature of shopping mall websites. BEST 100 lists are important because they let consumers identify popular products at a glance. However, existing studies rely only on sales outcome indicators, and few prior studies have used BEST 100 data. Therefore, this study selected 11 online shopping malls and compared their main characteristics. As the research method, exploratory data analysis (EDA) was applied to data crawled from the BEST 100 components of each shopping mall website, such as product name, price, and free-shipping status. The overall average price across the 11 shopping malls was 72,891.41 won. Sales texts were classified into 8 categories by text mining. The most common category was fashion, but it is notable that the categories were derived from the marketing text, not from product attributes. This study has implications for understanding the current online market flow and suggesting future directions by using EDA.

Tax Judgment Analysis and Prediction using NLP and BiLSTM (NLP와 BiLSTM을 적용한 조세 결정문의 분석과 예측)

  • Lee, Yeong-Keun;Park, Koo-Rack;Lee, Hoo-Young
    • Journal of Digital Convergence / v.19 no.9 / pp.181-188 / 2021
  • Research on AI-based legal services, which aim to make difficult legal fields easier to understand and more predictable, is growing in volume and importance. In this study, a model was built from decisions of the Tax Tribunal in the field of tax law. The model was trained through self-learning on collected and processed information, its predictions were returned in response to user queries, and its accuracy was verified. The proposed model collects information on tax decisions and extracts useful data through web crawling, and generates word vectors by applying Word2Vec's Fast Text algorithm to the optimized output of NLP processing. A total of 11,103 cases from 2017 to 2019 were collected and classified, and the model was verified at 70% accuracy. The approach may be useful in various legal systems and may make the application of prior research more efficient.

Goal Gradient Effect in Reward-based Crowdfunding; Difference in Project Category (후원형 크라우드 펀딩에서의 목표 구배 효과; 프로젝트 카테고리 별 차이를 중심으로)

  • Hwang, Ji Hyeon;Choi, Kang Jun;Lee, Jae Young;Soh, Seung Bum
    • Knowledge Management Research / v.20 no.3 / pp.173-193 / 2019
  • Reward-based crowdfunding is a funding platform that allows early-stage operators who lack funds to raise them, and it is seen as an outstanding infrastructure for leading the fourth industrial revolution because it is where start-ups realize new technologies and creative ideas. Reward-based crowdfunding has grown in line with the fourth industrial revolution, and funding successes are occurring in industries ranging from culture/art to technology/IT, including as a new means of knowledge management in a rapidly changing industrial environment. This study focuses on the possibility that consumers' donation purposes vary with the category of a reward-based crowdfunding project. Because consumers' payment decisions and purchase motivations are classified according to the purpose of purchase, the goal gradient effect, which previous papers introduced as the main motivation for consumer donation in reward-based crowdfunding, may vary between utilitarian and hedonic project categories. In this study, consumers' daily donation data were collected by web crawling from Indiegogo, a leading reward-based crowdfunding company, and modeled using propensity score matching (PSM) and a random effects model. The results showed that the goal gradient effect occurred in the utilitarian project category but not in the hedonic project category. This paper extends research on consumer donation motivation and contributes a theoretical foundation by showing that consumer donation may vary with project category; it also has practical implications for project creators, suggesting that an effective marketing strategy depends on the project category.

Coin Classification using CNN (CNN 을 이용한 동전 분류)

  • Lee, Jaehyun;Shin, Donggyu;Park, Leejun;Song, Hyunjoo;Gu, Bongen
    • Journal of Platform Technology / v.9 no.3 / pp.63-69 / 2021
  • Because countries have a limited range of materials for making coins and favor designs that are easy to carry, coins are similar in shape, size, and color. This similarity makes it difficult for visitors to identify each country's coins. To solve this problem, we propose a coin classification method using a CNN, which is effective for image processing. In our method, we collect the training data by web crawling and use OpenCV for preprocessing. After preprocessing, we extract features from an image using three CNN layers and classify coins using two fully connected layers. To show that the model is effective for coin classification, we evaluate it on eight different coin types. In our experiments, the coin classification accuracy is about 99.5%.

Does Rain Really Cause Toothache? Statistical Analysis Based on Google Trends

  • Jeon, Se-Jeong
    • Journal of dental hygiene science / v.21 no.2 / pp.104-110 / 2021
  • Background: In many countries, the belief that rain makes the body ache has been expressed in various forms, and a number of studies have investigated it. However, these studies, which depended on patients' experience or memory, had obvious limitations. Google Trends is a big data analysis service provided by Google LLC, based on search terms and video views, and attempts to use it in various fields are continuing. In this study, we sought to introduce the value of Google Trends as a research tool, which has emerged along with technological advancements, by investigating whether toothaches really occur more frequently on rainy days. Methods: Keywords were selected as objectively as possible by applying web crawling and text mining techniques, and the keyword "bi", meaning rain in Korean, was added to verify the reliability of Google Trends data. The correlation was statistically analyzed using precipitation and temperature data provided by the Korea Meteorological Agency and daily search volume data provided by Google Trends. Results: The keywords "chi-gwa", "chi-tong", and "chung-chi" were selected, which in Korean mean 'dental clinic', 'toothache', and 'tooth decay' respectively. A significant correlation was found between the amount of precipitation and the search volume of tooth decay. No correlation was found between precipitation and other keywords or other combinations. As expected, a very significant correlation was found among the amount of precipitation, temperature, and the search volume of "bi". Conclusion: Rain seems to actually be a cause of toothache, and provided that keywords are selected objectively, Google Trends is considered to be very useful as a research tool in the future.
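
The correlation step described in this abstract can be sketched as follows. The abstract does not say which correlation statistic was used, so the Pearson coefficient here is an assumption, and the daily precipitation and search-volume values are made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily values, not data from the study.
precipitation_mm = [0.0, 5.0, 10.0, 15.0]
search_volume = [10.0, 20.0, 30.0, 40.0]
r = pearson(precipitation_mm, search_volume)
print(f"r = {r:.3f}")
```

A coefficient alone does not establish significance; a study like this one would also need a significance test on the daily series.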

COVID-19 and Korean Family Life on Social Media: A Topic Model Approach (소셜 빅데이터로 알아본 코로나19와 가족생활: 토픽모델 접근)

  • Park, Sunyoung;Lee, Jaerim
    • The Journal of the Korea Contents Association / v.21 no.3 / pp.282-300 / 2021
  • The purpose of this study was to explore what social media posts tell us about family life during the COVID-19 pandemic by examining the keywords and topics underlying posts on blogs and online forums. Our criteria for web crawling were (a) blog and forum posts on Naver and Daum, the top portal sites in Korea, (b) posts between February 23 and April 19, 2020, the period of the first heightened social distancing orders, and (c) inclusion of "COVID" and "family" or "COVID" and "home." We analyzed 351,734 posts using TF-IDF values and topic modeling based on latent Dirichlet allocation. We identified and named 22 topics including COVID-19 prevention, family infection, family health, dietary life and changes, religious life, stuck at home, postponed school year, family events, travel and vacations, concerns about family and friends, anxiety and stress, disaster and damage, COVID-19 warning text messages, family support policies, Shin-cheon-ji and Daegu. The results show that COVID-19 impacted various domains of family life including health, food, housing, religion, child care, education, rituals, and leisure as well as relationships and emotions.
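
The TF-IDF weighting mentioned in this abstract can be illustrated with a small Python sketch. It assumes one common variant (term frequency as a share of the document's tokens, inverse document frequency as the natural log of N/df); the paper's exact weighting is not stated, and the function name `tf_idf` and the toy posts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized posts, not data from the study.
posts = [
    ["covid", "family", "home"],
    ["covid", "family", "school"],
    ["covid", "travel"],
]
scores = tf_idf(posts)
for post_scores in scores:
    print(post_scores)
```

Under this variant a term that appears in every post, such as the search keyword itself, scores zero, which is why TF-IDF surfaces the words that distinguish posts from one another.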