• Title/Summary/Keyword: 웹크롤링 (web crawling)


A Study on Sentiment Analysis of Media and SNS response to National Policy: focusing on policy of Child allowance, Childbirth grant (국가 정책에 대한 언론과 SNS 반응의 감성 분석 연구 -아동 수당, 출산 장려금 정책을 중심으로-)

  • Yun, Hye Min;Choi, Eun Jung
    • Journal of Digital Convergence / v.17 no.2 / pp.195-200 / 2019
  • As the use of computers and mobile communication devices such as smartphones and tablets expands, data on the Internet is accumulating exponentially. In addition, the growth of SNS lets users communicate freely and share information in many fields, so diverse opinions accumulate in the form of big data. Accordingly, big data analysis techniques are being used to identify differences between the response of the general public and the response of the media. In this paper, we analyzed both the public response on SNS and the media response to the child allowance and childbirth grant policies. To do so, we gathered posts and user comments published on Twitter over a certain period, crawled news articles, and applied sentiment analysis. From these data, we compared the opinion of the public posted on SNS with the response of the media expressed in news articles. As a result, we found that the public and the media respond differently to some national policies.

WCTT: Web Crawling System based on HTML Document Formalization (WCTT: HTML 문서 정형화 기반 웹 크롤링 시스템)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.495-502 / 2022
  • Web crawlers, which are mainly used today to collect text on the web, are difficult to maintain and extend because researchers must analyze the tags and styles of HTML documents and then implement different collection logic for each collection channel. To solve this problem, a web crawler should be able to collect text by formalizing HTML documents into the same structure. In this paper, we designed and implemented WCTT (Web Crawling system based on Tag path and Text appearance frequency), a web crawling system that collects text with a single collection logic by formalizing HTML documents based on tag path and text appearance frequency. Because WCTT collects text with the same logic for all collection channels, it is easy to maintain and to add collection channels. In addition, it provides a preprocessing function that removes stopwords and extracts only nouns, for uses such as keyword network analysis.
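
The idea of grouping text by tag path and counting how often text appears under each path can be illustrated with a small sketch. This is not the WCTT implementation, whose details the abstract does not give; it is a simplified illustration using only the Python standard library, and the names `TagPathCollector` and `most_frequent_path` are hypothetical. It assumes the main content sits under the tag path that holds the most text fragments.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TagPathCollector(HTMLParser):
    """Group every text fragment by the tag path that leads to it."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # tags that are currently open
        self.texts = defaultdict(list)   # tag path -> text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        # Skip whitespace-only fragments and script/style contents.
        if text and not {"script", "style"} & set(self.stack):
            self.texts["/".join(self.stack)].append(text)


def most_frequent_path(html):
    """Return (tag_path, fragments) for the path holding the most text fragments.

    Raises ValueError if the document contains no text at all.
    """
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return max(collector.texts.items(), key=lambda item: len(item[1]))
```

Because the selection depends only on tag paths and counts, the same function can be applied to pages from different sites without site-specific selectors, which is the maintenance benefit the abstract describes. A real system would need a more robust criterion than a simple maximum.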

Determinants of Shortening Job-hunting Period in Platform Labor Market: Analysis by using Web Crawling and Survival Model (플랫폼 노동시장의 구직기간 단축 결정요인: 웹크롤링과 생존모형을 이용한 분석)

  • Lee, Jongho
    • Journal of Digital Convergence / v.19 no.5 / pp.1-13 / 2021
  • The purpose of this research is to analyze how the wage level of new job seekers in the platform labor market affects the time it takes to get a first job. Recently, platforms have attracted attention as one alternative for addressing rising unemployment. To create quality jobs, it is important to build trust between employers and employees on the platform. Previous studies showed that feedback from previous employers is important for solving the information asymmetry problem between the two parties. However, there is no feedback for new job seekers who have not yet gotten a first job. Therefore, we focus on the fact that on the platform wages are proposed by job seekers rather than employers, and we examine whether the low wages of new job seekers shorten the job-hunting period. For this purpose, we use data on 3,704 job seekers on Freelancer.com. Survival analysis shows that low wages for new job seekers have a significant impact on shortening the job-hunting period.

Data value extraction through comparison of online big data analysis results and water supply statistics (온라인 빅 데이터 분석 결과와 상수도 통계 비교를 통한 데이터 가치 추출)

  • Hong, Sungjin;Yoo, Do Guen
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.431-431 / 2021
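  • With the arrival of the Fourth Industrial Revolution, interest in extracting value through data analysis for the planning, operation, and management of infrastructure is very high. In the open government data index, which evaluates data availability, accessibility, and government support, Korea scored 0.93 out of 1 and ranked first among Organisation for Economic Co-operation and Development member countries (as of 2019), well above the average of 0.60. However, access to officially announced and distributed infrastructure-related information, and to information that requires in-depth research and analysis, is still limited. In particular, water supply systems, a representative type of infrastructure, are mostly designated as nationally important facilities, which restricts the acquisition and analysis of various information, and the relevant national statistics, the water supply statistics, do not provide details such as the location and cause of abnormal events like leakage accidents. In this study, we used web crawling and big data analysis techniques to survey all news about water supply leakage accidents in local governments over a certain past period, and compared the resulting number of accidents with the water supply statistics, the nationally certified data. Extracting independent leakage accident articles requires steps such as removing duplicate articles, establishing leakage-related keywords, and removing related articles from outside the water supply field; these steps were implemented in R. In addition, using natural-language-processing-based information extraction on the news articles, we obtained not only the number of leakage accidents but also the accident date, location, cause, extent of damage, and size of the affected pipe, and proposed a way to extract and link more value than the information presented in the water supply statistics. Applying the proposed methodology to metropolitan city A in Korea and comparing the number of leakage accidents, the news article analysis yielded a number of accidents similar in scale to the number of leaks reported in the water supply statistics. Because the proposed methodology can extract additional information, it is expected to be highly useful in the future.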
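
The article-filtering steps named in this abstract (duplicate removal, leakage keywords, exclusion of articles outside the water supply field) could look roughly like the following. The authors implemented their procedure in R and do not publish it in the abstract, so this is a Python sketch under stated assumptions: the keyword lists, the record layout with `title` and `body` keys, and the function names are all hypothetical.

```python
import re

# Hypothetical keyword lists; the abstract does not state the actual terms used.
LEAK_KEYWORDS = ("누수", "파열")          # "leak", "rupture"
WATER_KEYWORDS = ("상수도", "수도관")      # "water supply", "water pipe"


def normalize(title):
    """Drop whitespace and punctuation so near-identical titles compare equal."""
    return re.sub(r"[^\w]+", "", title).lower()


def filter_leak_articles(articles):
    """Keep one copy of each article about a water-supply leakage accident.

    `articles` is an iterable of dicts with "title" and "body" keys
    (an assumed record layout).
    """
    seen = set()
    kept = []
    for article in articles:
        text = article["title"] + " " + article["body"]
        if not any(keyword in text for keyword in LEAK_KEYWORDS):
            continue  # not about a leak
        if not any(keyword in text for keyword in WATER_KEYWORDS):
            continue  # a leak, but outside the water supply field
        key = normalize(article["title"])
        if key in seen:
            continue  # duplicate of an article already kept
        seen.add(key)
        kept.append(article)
    return kept
```

Matching duplicates on a normalized title is a deliberately simple choice. Reprinted articles with edited headlines would need a fuzzier comparison, for example on body text similarity.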


Strategy for Sustainable Growth Through Forming Network in Mobile Service Industry: Focusing on Stock-Swapping M&A Strategy of YelloMobile (모바일 서비스 산업에서의 네트워크 형성을 통한 성장 전략: 옐로모바일의 지분교환방식 인수합병을 중심으로)

  • Lee, Saerom;Jahng, Jungjoo
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.11 no.1 / pp.109-119 / 2016
  • Because technology is relatively easy to transfer between application developers or content providers, the low entry barrier causes fierce competition among venture companies in the mobile service industry. Our study examines sustainable business strategies for venture companies in a highly competitive, technology-intensive industry. In this paper, we use network theory to examine how venture firms created a network and generated synergy effects. The Korean venture firm YelloMobile uses a distinctive merger and acquisition strategy based on swapping equity, thereby establishing a network. We contribute to extending network theory by examining three elements of networks: network structure, network governance mechanisms, and network contents.


Concept Classification System of Jeju Oreum based on Web Search (웹 검색 기반으로 한 제주 오름의 콘셉트 분류 시스템)

  • Ahn, Jinhyun;Byun, So-Young;Woo, Seo-Jung;An, Ye-Ji;Kang, Jungwoon;Kim, Mincheol
    • Journal of Digital Convergence / v.19 no.8 / pp.235-240 / 2021
  • The number of visitors to Oreum is increasing, and tourism trends are changing rapidly. The motivation for visiting Oreum is also shifting from relaxation and pleasure to experiences. In line with this change, people choose an Oreum for purposes such as wedding and family photos, not just exercise. However, it is difficult to search for an Oreum that matches a tourist's motivation. To solve this problem, we propose a system that provides the association between an Oreum and a concept in real time, based on the number of search results from web search engines. Users can select a desired date to check the associations for past or selected periods and concepts. Through this research, visitors to Oreum, Jeju's natural heritage, can contribute to the development of tourism in Jeju. In the future, the system could also provide concepts for visiting beaches or the sea, not just Jeju Oreum. In this work, search results from websites are collected and stored in a database, and the search results for each Oreum and concept are presented on the homepage to classify Oreum trends.

Media exposure analysis of official sponsors and general companies of mega sport event (메가 스포츠이벤트의 공식스폰서와 일반기업의 미디어 노출 분석)

  • Kim, Joo-Hak;Cho, Sun-Mi
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.8 no.4 / pp.171-181 / 2018
  • As the proportion of sports events in the sports industry grows, the official sponsor market for sports events is also growing. However, because official sponsorships are limited and expensive, some companies approach sporting events through ambush marketing. This study analyzes the differences in media exposure between official sponsors and general companies at mega sport events. To this end, we collected and analyzed text articles from the period of the 2016 Rio Olympics, one year before the Olympics, and one year after the Olympics. Web crawling was performed using Python to collect the articles. Morphological and frequency analysis was performed using the KoNLP and TM packages of the statistical program R. In addition, the opinions of a group of related experts were gathered to classify the companies or organizations appearing in the media as Organizing Committees for the Olympic Games (OCOGs), official sponsors, or general companies. As a result of the analysis, mentions related to the OCOGs appeared 5,220 times, mentions related to official sponsors 7,845 times, and mentions related to general companies 7,028 times. There is little difference in the frequency of exposure between official sponsors and general companies, which implies that ambush marketing is recognized as a strategic marketing technique. The International Olympic Committee (IOC) has to recognize these social phenomena and establish reasonable standards for the marketing activities of official sponsors and general companies. This study will serve as a basis for fair sponsorship and marketing activities at sports events.

Examining the Urban Growth Process of the 1st New Town -Focusing on the Keyword Network Analysis of Newspaper Articles using Text Mining- (1기 신도시의 도시 성장 과정 고찰 - 텍스트마이닝을 이용한 신문기사의 키워드 네트워크 분석을 중심으로 -)

  • Jung, Da-Eun;Kim, Chung Ho
    • Journal of the Korean Regional Science Association / v.39 no.4 / pp.91-110 / 2023
  • The purpose of this study is to explore urban issues that have arisen in the urban growth process of the 1st New Town for about 34 years since its construction through newspaper articles. For this purpose, newspaper articles related to the 1st New Town were collected using web crawling, and content analysis was conducted based on text mining. The main findings of the study are as follows. First, in the early stages of the construction of the 1st New Town, issues were diverse in the following six sectors: living service facilities, real estate, transportation, urban development and maintenance, safety, and housing supply, but gradually narrowed down to those of real estate and urban development and maintenance. Second, during the new town construction and urban stabilization stages, the network structure centered on 'Seoul' was maintained, which can be explained by the fact that the 1st New Town was geographically located on the outskirts of Seoul, and many articles compared the issues to Seoul. Third, the issue of urban aging appeared from the 10th year after construction, and the discussion on urban reorganization due to urban aging began in earnest from the 30th year after construction. The significance of the study is that it explored the urban issues that occurred throughout the urban growth process of the 1st New Town, and can be used as a basis for preparing a plan to reorganize the 1st New Town.

A Topic Analysis of Abstracts in Journal of Korean Data Analysis Society (한국자료분석학회지에 대한 토픽분석)

  • Kang, Changwan;Kim, Kyu Kon;Choi, Seungbae
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2907-2915 / 2018
  • The Journal of the Korean Data Analysis Society, founded in 1998, has played the role of a major applied journal. In this study, we examined the objectives of this journal by reviewing its abstracts over 10 years. Abstract data was crawled from the online journal site (kdas.jems.or.kr) and analyzed with a topic model. As a result, we found 18 topics in 2680 abstracts covering varied content, for example nursing, marketing, economics, regression, factor analysis, data mining, and statistical inference. Topic1 (regression) is the most frequent, with 460 documents, which shows the usefulness of regression in the applied sciences. We confirmed 10 significant association rules using Fisher's exact test. To explore topic trends, we also conducted the topic analysis for two periods, 2006-2011 and 2012-2016. We found that control studies became more frequent than survey studies over time, and that regression and factor analysis were frequent regardless of period.

Basic Research on the Possibility of Developing a Landscape Perceptual Response Prediction Model Using Artificial Intelligence - Focusing on Machine Learning Techniques - (인공지능을 활용한 경관 지각반응 예측모델 개발 가능성 기초연구 - 머신러닝 기법을 중심으로 -)

  • Kim, Jin-Pyo;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture / v.51 no.3 / pp.70-82 / 2023
  • The recent surge of IT and data acquisition is shifting the paradigm in all aspects of life, and these advances are also affecting academic fields. Research topics and methods are being improved through academic exchange and connections. In particular, data-based research methods are employed in various academic fields, including landscape architecture, where continued research is needed. Therefore, this study investigates the possibility of developing a landscape preference evaluation and prediction model using machine learning, a branch of artificial intelligence. To this end, machine learning techniques were applied to the landscape field to build a landscape preference evaluation and prediction model and to verify the model's simulation accuracy. Landscape images of wind power facilities, which have recently attracted attention as a renewable energy source, were selected as the research objects. For analysis, images of wind power facility landscapes were collected using web crawling techniques, and an analysis dataset was built. Orange version 3.33, a program from the University of Ljubljana, was used for machine learning analysis to derive a prediction model with excellent performance. A model that integrates the evaluation criteria and a model structure that handles the evaluation criteria separately were each used to generate models with the kNN, SVM, Random Forest, Logistic Regression, and Neural Network algorithms, which are suitable for machine learning classification. The performance of the generated models was evaluated to derive the most suitable prediction model. The prediction model derived in this study separately evaluates three criteria (classification by type of landscape, classification by distance between landscape and target, and classification by preference) and then synthesizes the results into a prediction. As a result, a prediction model was developed with a high accuracy of 0.986 for the landscape type criterion, 0.973 for the distance criterion, and 0.952 for the preference criterion, and verification through evaluation of the prediction results showed that the model exceeds its required performance. As an experimental attempt to investigate the possibility of developing a prediction model using machine learning in landscape-related research, this study confirmed that a high-performance prediction model can be created by building a dataset through the collection and refinement of image data and then using it in landscape-related research. Based on the results, implications, and limitations of this study, we believe it is possible to develop various types of landscape prediction models, including models for wind power facility, natural, and cultural landscapes. Machine learning techniques can become more useful and valuable in landscape architecture by exploring and applying research methods appropriate to the topic, by reducing data classification time through models that classify images by landscape type, and by analyzing the importance of landscape planning factors through machine-learning analysis of landscape prediction factors.