• Title/Summary/Keyword: web crawling


The Role of stock market management and social media - Analyzing the types of individual investor and topic - (주식시장관리제도와 소셜 미디어의 역할 - 개인 투자자 집단 유형과 토픽 분석 -)

  • Kim, Jung-Su;Lee, Suk-Jun
    • Management & Information Systems Review / v.34 no.5 / pp.23-47 / 2015
  • In the Korean stock market, individual investors have perceived stocks as a vehicle for short-term arbitrage rather than as a long-term investment. To reinforce stock market transparency and soundness, it is important to enforce stock market management measures. In particular, stock market events caused by financial policy can give individual investors negative information about stock trading. It is therefore necessary to investigate whether the comprehensive review of listing eligibility effectively influences individual investors' responses and stock behavior. The purpose of this study is to examine the relationship between such stock market management and the changes in individual investors' trading types and responses before and after the event. We used a dataset of users' text messages about 9 firms posted on firm-based social media (i.e., Naver, Daum, Paxnet) over the period 2009 to 2014. We performed text clustering and topic modeling on keywords to classify users into investor and non-investor groups, and two types of investors were categorized according to main-topic transitions across event windows of the comprehensive review of listing eligibility. The results indicated that a variety of stockholders existed in the stock. The ratio of the non-investor group decreased, whereas the proportion of the investor group veered back toward its pre-event pattern after the comprehensive review of listing eligibility. A distinctive feature of our study is that it explains the influence of stock market management on changes in individual investors' responses and categorizes investors over time. Implications and suggestions for future research are also discussed.


Sensitivity Identification Method for New Words of Social Media based on Naive Bayes Classification (나이브 베이즈 기반 소셜 미디어 상의 신조어 감성 판별 기법)

  • Kim, Jeong In;Park, Sang Jin;Kim, Hyoung Ju;Choi, Jun Ho;Kim, Han Il;Kim, Pan Koo
    • Smart Media Journal / v.9 no.1 / pp.51-59 / 2020
  • From the era of PC communication through the development of the internet, new terms have been coined on social media. With the spread of smartphones a social media culture has formed, and newly coined words are becoming part of that culture. With the advent of social networking sites and smartphones serving as a bridge, the amount of data has increased in real time. New words have many advantages, including allowing short sentences that work around the character limits of various messengers and reduce data. However, new words have no dictionary meaning, which limits and degrades algorithms such as those used in data mining. Therefore, in this paper, the opinion of a document is determined by collecting data through web crawling, extracting the new words contained in the text data, and establishing a sentiment classification. The experiment proceeds in three steps. First, new words collected from social media are learned as positive or negative. Next, to derive and verify sentiment values using standard-language documents, TF-IDF is used to score the sentiment of nouns and assign sentiment values to the data. As with the new words, the classified sentiment values are applied to verify that sentiment is classified correctly in standard-language documents. Finally, the newly coined words and the standard sentiment values are combined to perform a comparative analysis of the technique.
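
The first step above (learning new words as positive or negative) can be illustrated with a minimal multinomial Naive Bayes sketch. This is not the authors' implementation: the toy tokens, the labels, the function names `train_naive_bayes` and `classify`, and the use of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Count tokens per label for a multinomial Naive Bayes model."""
    token_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for tokens, label in labeled_docs:
        doc_counts[label] += 1
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return token_counts, doc_counts, vocabulary


def classify(tokens, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    token_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior from the share of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        label_total = sum(token_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # skip tokens never seen in training
            score += math.log(
                (token_counts[label][token] + 1) / (label_total + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus; the tokens stand in for crawled new words.
training_data = [
    (["lit", "great", "movie"], "positive"),
    (["lit", "fun"], "positive"),
    (["cringe", "boring", "movie"], "negative"),
    (["cringe", "bad"], "negative"),
]
model = train_naive_bayes(training_data)
print(classify(["lit", "movie"], model))
```

A real pipeline would tokenize crawled Korean text with a morphological analyzer and would likely weight terms (for example with TF-IDF, as the abstract mentions) before classification; the sketch only shows the classification rule itself.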

Development of a method for urban flooding detection using unstructured data and deep learning (비정형 데이터와 딥러닝을 활용한 내수침수 탐지기술 개발)

  • Lee, Haneul;Kim, Hung Soo;Kim, Soojun;Kim, Donghyun;Kim, Jongsung
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1233-1242 / 2021
  • In this study, a model was developed to determine whether flooding has occurred using image data, a form of unstructured data. The CNN-based VGG16 and VGG19 architectures were used to develop the flood classification model. To develop the model, flooded and non-flooded images were collected by web crawling. Because data collected by web crawling contains noise, data irrelevant to this study was first deleted, and the image size was then changed to 224×224 for model application. In addition, image augmentation was performed by changing the angle of the images to increase image diversity. Finally, training was performed using 2,500 flooded images and 2,500 non-flooded images. In the model evaluation, the average classification performance of the model was 97%. If the model developed in this study is installed in a CCTV control center system in the future, it is expected that the response to flood damage can be made quickly.

Occupational Therapy in Long-Term Care Insurance For the Elderly Using Text Mining (텍스트 마이닝을 활용한 노인장기요양보험에서의 작업치료: 2007-2018년)

  • Cho, Min Seok;Baek, Soon Hyung;Park, Eom-Ji;Park, Soo Hee
    • Journal of Society of Occupational Therapy for the Aged and Dementia / v.12 no.2 / pp.67-74 / 2018
  • Objective: The purpose of this study is to quantitatively analyze the role of occupational therapy in long-term care insurance for the elderly using text mining, one of the big data analysis techniques. Method: For the analysis of newspaper articles, articles matching "Long-Term Care Insurance for the Elderly + Occupational Therapy for the Elderly" were collected for the period from 2007 to 2018. The Naver News database of Naver, which has a high share of the domestic search engine market, was accessed using Textom, a web crawling tool. After collecting the article titles and original text of 510 news items returned by the "long-term care insurance for the elderly + occupational therapy" search, we analyzed article frequency and keywords by year. Result: In terms of the frequency of articles by year, the number of articles published in 2015 and 2017 was the highest, with 70 articles (13.7%), and among the top 10 terms of the keyword analysis, 'dementia' showed the highest frequency (344). In terms of keywords, dementia, treatment, hospital, health, service, rehabilitation, facilities, institution, grade, elderly, professional, salary, industrial complex, and people were related. Conclusion: This study is meaningful in that the text mining technique was used to confirm more objectively the social needs and the role of the occupational therapist in dementia and rehabilitation, as seen in the related keywords, based on 11 years of media reporting trends on long-term care insurance for the elderly. Based on the results of this study, future research should expand the research field and period and supplement the research methodology with various analysis methods by year.

Prototype Design and Development of Online Recruitment System Based on Social Media and Video Interview Analysis (소셜미디어 및 면접 영상 분석 기반 온라인 채용지원시스템 프로토타입 설계 및 구현)

  • Cho, Jinhyung;Kang, Hwansoo;Yoo, Woochang;Park, Kyutae
    • Journal of Digital Convergence / v.19 no.3 / pp.203-209 / 2021
  • In this study, a prototype design model was proposed for developing an online recruitment system that uses multi-dimensional data crawling and social media analysis and validates text information and video interviews in the job application process. The study includes a comparative analysis process using text mining to verify the authenticity of job application paperwork and to hire and allocate workers effectively based on potential job capability. Based on the prototype system, we conducted performance tests and analyzed the results for key performance indicators such as text mining accuracy and the recognition rate of the interview STT (speech-to-text) function. If commercialized based on the design specifications and prototype development results derived from this study, the system may be used as the intelligent online recruitment technology required in public and private recruitment markets in the future.

Determination of Fire Risk Assessment Indicators for Building using Big Data (빅데이터를 활용한 건축물 화재위험도 평가 지표 결정)

  • Joo, Hong-Jun;Choi, Yun-Jeong;Ok, Chi-Yeol;An, Jae-Hong
    • Journal of the Korea Institute of Building Construction / v.22 no.3 / pp.281-291 / 2022
  • This study attempts to use big data to determine the indicators necessary for a fire risk assessment of buildings. Because most of the causes affecting the fire risk of buildings are fixed as indicators that consider only the building itself, assessments have so far been limited and subjective. Therefore, if various internal and external indicators can be considered using big data, effective measures can be taken to reduce the fire risk of buildings. To collect the data necessary to determine indicators, a query language was first selected, and professional literature was collected in the form of unstructured data using a web crawling technique. To collect the words in the literature, pre-processing such as user dictionary registration, removal of duplicate literature, and stopword removal was performed. Then, through a review of previous research, words were classified into four components, and representative keywords related to risk were selected from each component. Risk-related indicators were collected by analyzing words related to the representative keywords. By examining the indicators against their selection criteria, 20 indicators were determined. This research methodology indicates the applicability of big data analysis for establishing measures to reduce fire risk in buildings, and the determined risk indicators can be used as reference materials for assessment.

A Study on the Factors of Well-aging through Big Data Analysis : Focusing on Newspaper Articles (빅데이터 분석을 활용한 웰에이징 요인에 관한 연구 : 신문기사를 중심으로)

  • Lee, Chong Hyung;Kang, Kyung Hee;Kim, Yong Ha;Lim, Hyo Nam;Ku, Jin Hee;Kim, Kwang Hwan
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.5 / pp.354-360 / 2021
  • People hope to live a healthy and happy life, achieving satisfaction by striking a good work-life balance. There is therefore growing interest in well-aging, which means living happily to a healthy old age without worry. This study identified important factors related to well-aging by analyzing news articles published in Korea. Using Python-based web crawling, 1,199 articles published up to November 2020 were collected from the news service of the portal site Daum, and the 374 articles that matched the subject of the study were selected. The frequency analysis results of text mining showed 'elderly', 'health', 'skin', 'well-aging', 'product', 'person', 'aging', 'female', 'domestic', and 'retirement' as important keywords. In addition, a social network analysis of 45 important keywords revealed strong connections in the order of 'skin-wrinkle', 'skin-aging', and 'old-health'. The CONCOR analysis showed that the 45 main keywords formed eight clusters, including 'life and happiness', 'disease and death', 'nutrition and exercise', 'healing', 'health', and 'elderly services'.
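
The frequency analysis and the keyword-pair connections described above can be sketched with the Python standard library. The keyword lists below are hypothetical stand-ins for the crawled articles, and counting each unordered pair once per article is an assumed simplification of the social network analysis, not the authors' procedure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per article, standing in for crawled news text.
articles = [
    ["skin", "wrinkle", "aging"],
    ["skin", "aging", "health"],
    ["elderly", "health", "skin", "wrinkle"],
]

# Frequency analysis: how many times each keyword appears across all articles.
keyword_freq = Counter(word for article in articles for word in article)

# Co-occurrence counts: each unordered keyword pair is counted once per article.
pair_freq = Counter()
for article in articles:
    for pair in combinations(sorted(set(article)), 2):
        pair_freq[pair] += 1

print(keyword_freq.most_common(3))
print(pair_freq.most_common(3))
```

The pair counts can serve as edge weights in a keyword network; clustering methods such as CONCOR would then operate on the resulting co-occurrence matrix.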

Analysis of articles on water quality accidents in the water distribution networks using big data topic modelling and sentiment analysis (빅데이터 토픽모델링과 감성분석을 활용한 물공급과정에서의 수질사고 기사 분석)

  • Hong, Sung-Jin;Yoo, Do-Guen
    • Journal of Korea Water Resources Association / v.55 no.spc1 / pp.1235-1249 / 2022
  • This study applied web crawling to extract big data news on water quality accidents in the water supply system and presented a step-by-step algorithm for obtaining accurate water quality accident news. In a large-scale water quality accident, development patterns such as accident recognition, accident spread, accident response, and accident resolution appear as the accident unfolds. Accordingly, the development of water quality accidents was analyzed in detail through key keywords and sentiment analysis for each stage, based on case studies, and the meanings were derived. The proposed methodology was applied to the larvae accident period in Incheon Metropolitan City in 2020. As a result, it could be confirmed that, in a situation where the disclosure of information that directly affects consumers, such as water quality accidents, is restricted, the tone of news articles and media reports about water quality accidents with long-term damage and the degree of consumer pride clearly change over time. This suggests the need to prepare consumer-centered policies to increase consumer positivity, although rapid restoration of facilities is very important in the development of water quality accidents from the supplier's point of view.

Detecting Weak Signals for Carbon Neutrality Technology using Text Mining of Web News (탄소중립 기술의 미래신호 탐색연구: 국내 뉴스 기사 텍스트데이터를 중심으로)

  • Jisong Jeong;Seungkook Roh
    • Journal of Industrial Convergence / v.21 no.5 / pp.1-13 / 2023
  • Carbon neutrality is the concept of reducing greenhouse gases emitted by human activities and bringing actual emissions to zero by removing the remaining gases. It is also called "Net-Zero" or "carbon zero". Korea has declared a "2050 Carbon Neutrality policy" to cope with the climate change crisis, and various carbon reduction legislative processes are underway. Since carbon neutrality requires changes in industrial technology, it is important to prepare a system for carbon zero. This paper aims to understand the status and trends of global carbon neutrality technology. For this purpose, ROK's web platform "www.naver.com" was selected as the data collection scope, and Korean online articles related to carbon neutrality were collected. Carbon neutrality technology trends were analyzed using the future signal methodology and the Word2Vec algorithm, a neural network deep learning technique. The results showed that technology advancement was required in the steel and petrochemical sectors, which are high-carbon-emission industries. Investment feasibility and technology advancement in the electric vehicle sector were on the rise. The results suggest that government support for carbon neutrality and the creation of a global technology infrastructure are needed. In addition, cultivating human resources is urgent, and the need to prepare support policies for carbon neutrality was confirmed.

Suitable clothing recommendation system by size and skin color (의류 사이즈별 및 피부톤에 기반을 둔 의류 추천 시스템)

  • Park, Chang-Young;Lim, Byeong-Chan;Lee, Won-Joon;Lee, Chang-Su;Kim, Min-Su;Lee, Sang-Yong
    • Journal of Digital Convergence / v.20 no.3 / pp.407-413 / 2022
  • Existing clothing recommendation systems remain at the level of showing appropriate photos when a user selects a preferred type of clothing after entering his or her body size. When a user purchases clothing using such recommendation systems, the clothing often does not fit the user's body size. In this study, to solve these problems, a system was implemented that takes as input not only the user's size but also the user's skin tone and recommends clothing suitable for both. In this system, clothing size information obtained through web crawling was periodically stored in a database for eight men's tops, and all pixels of each clothing image were analyzed to extract color text values. To confirm the performance of the system, a survey was conducted on 100 male college students, and the satisfaction level was 70%. Most of the dissatisfaction stemmed from the limited range of recommended clothing, so the target clothing needs to be expanded in the future.
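
One plausible reading of "analyzing all pixels to extract color text values" is to average the pixel colors and map the result to the nearest named color. The sketch below assumes that reading; the palette, the function names `mean_color` and `nearest_color_name`, and the sample pixels are hypothetical and are not taken from the paper.

```python
# Hypothetical reference palette mapping color names to RGB values.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "navy": (0, 0, 128),
    "beige": (245, 245, 220),
}


def mean_color(pixels):
    """Average the RGB channels over every pixel in the image."""
    count = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / count for channel in range(3))


def nearest_color_name(rgb, palette=PALETTE):
    """Return the palette name with the smallest squared RGB distance."""
    return min(
        palette,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, palette[name])),
    )


# A tiny stand-in "image": a flat list of (R, G, B) pixels.
pixels = [(10, 12, 120), (0, 5, 130), (5, 0, 125), (8, 8, 140)]
print(nearest_color_name(mean_color(pixels)))
```

In practice the pixels would come from an image library, and background pixels would need to be masked out before averaging so that they do not skew the garment color.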