• Title/Summary/Keyword: Crawling

Search Result 369, Processing Time 0.026 seconds

Convergence of Korean Traditional Dance and K-Pop Dance : An Analysis of Comments on 2018 MMA BTS 'IDOL' Videos on YouTube (한국 전통춤과 K-pop 댄스의 융합 : 2018 MMA 방탄소년단 'IDOL' 유튜브 댓글 분석)

  • Yoo, Ji-Young;Kim, Mi-Kyung
    • Journal of Korea Entertainment Industry Association / v.13 no.8 / pp.189-198 / 2019
  • This study aims to interpret the reactions of the Korean people through text mining of comments on YouTube videos of the intro performance at the December 2018 MMA. To this end, comments posted on 15 YouTube videos over the preceding 10 months were collected. A total of 5,135 comments were gathered by crawling with Python and BeautifulSoup, the data were refined over 3 sessions, and a final total of 5,080 comments were used as analysis material. Text mining was used for data analysis, and refinement, analysis, and visualization were carried out with the Textom program. Keyword analysis showed the keywords 'performance', 'Korea', 'video', 'top', 'cool', 'dance', 'idol', 'legend', 'love', and 'gratitude' in that order, and keywords such as 'patriotism' and 'Olympics' also appeared frequently. N-gram analysis showed that comments with contexts such as 'a top performance that will remain a legend among Korean idol performances' and 'an idol performance that displayed the traditional culture of Korea' ranked highly. Based on these keyword analysis results, topic modeling was applied and 5 top keywords were extracted from a total of 5 topics. Analysis of topic contents and distribution showed that the comments on this performance's videos largely consisted of 3 reactions: 'high praise regarding the stage performance', 'affection towards the fusion and artistic sublimation of Korean traditional dance', and 'gratitude towards the uploading of cool dance videos'.
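The keyword and N-gram counting described in this abstract can be sketched as follows. This is a minimal illustration, not the study's pipeline: the study used the Textom program, whereas this sketch uses only the Python standard library, assumes English text split on letters, and uses made-up sample comments and a hypothetical stopword list. Korean comments would need a morphological analyzer instead of the simple tokenizer shown here.

```python
from collections import Counter
import re

def tokenize(comment):
    """Lowercase a comment and split it into word tokens (English-only assumption)."""
    return re.findall(r"[a-z']+", comment.lower())

def keyword_counts(comments, stopwords=frozenset()):
    """Count how often each non-stopword token appears across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(t for t in tokenize(comment) if t not in stopwords)
    return counts

def ngram_counts(comments, n=2):
    """Count n-grams (tuples of n consecutive tokens) within each comment."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

# Hypothetical sample data, not taken from the study.
sample_comments = [
    "Legend performance, top idol dance",
    "A top idol performance from Korea",
    "Cool dance, cool video",
]
STOPWORDS = frozenset({"a", "from"})

keywords = keyword_counts(sample_comments, STOPWORDS)
bigrams = ngram_counts(sample_comments, n=2)
print(keywords.most_common(5))
print(bigrams.most_common(3))
```

Ranking the counts with `most_common` gives the kind of ordered keyword list the abstract reports; topic modeling would be a separate step on top of these token counts.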

An Analysis of the Support Policy for Small Businesses in the Post-Covid-19 Era Using the LDA Topic Model (LDA 토픽 모델을 활용한 포스트 Covid-19 시대의 소상공인 지원정책 분석)

  • Kyung-Do Suh;Jung-il Choi;Pan-Am Choi;Jaerim Jung
    • Journal of Industrial Convergence / v.22 no.6 / pp.51-59 / 2024
  • The purpose of this paper is to suggest government policies that are practically helpful to small business owners in pandemic situations such as COVID-19. To this end, news articles were crawled using the keywords "COVID-19 Support for Small Businesses", "The Impact of Small Businesses by Response System to COVID-19 Infectious Diseases", and "COVID-19 Small Business Economic Policy". Keyword frequency analysis and word cloud analysis were performed on the articles, and major issues were identified through LDA topic modeling. The LDA topic modeling produced the following topic labels: for the support policy for small business owners, government cash and financial support; for the impact of the COVID-19 infectious disease response system on small business owners, a government-led quarantine system and an individual-led quarantine system; and for the COVID-19 economic policy, a policy addressing economic crisis and self-sustainability for small business owners. Based on these topic labels, the paper intends to provide basic data that help small business owners understand damage reduction policies and policies for enhancing market competitiveness in future pandemic situations.

Sentiment Analyses of the Impacts of Online Experience Subjectivity on Customer Satisfaction (감성분석을 이용한 온라인 체험 내 비정형데이터의 주관도가 고객만족에 미치는 영향 분석)

  • Yeeun Seo;Sang-Yong Tom Lee
    • Information Systems Review / v.25 no.1 / pp.233-255 / 2023
  • The development of information technology (IT) has brought so-called "online experiences" to satisfy our daily needs. The market for online experiences grew further during the COVID-19 pandemic. Therefore, this study analyzed how the features of online experience services affect customer satisfaction by crawling structured and unstructured data from the online experience web site newly launched by Airbnb after COVID-19. The analysis found that the structured data generated by service users on a C2C online sharing platform had a positive effect on the satisfaction of other users. In addition, unstructured text data generated by service providers, such as experience introductions and host introductions, turned out to have different subjectivity scores depending on the purpose of the text. It was confirmed that a subjective host introduction and an objective experience introduction affect customer satisfaction positively. The results of this study are expected to provide various implications for stakeholders of online sharing economy platforms and for researchers interested in online experience knowledge management.

A Tracking Method of Same Drug Sales Accounts through Similarity Analysis of Instagram Profiles and Posts

  • Eun-Young Park;Jiyeon Kim;Chang-Hoon Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.109-118 / 2024
  • With the increasing number of social media users worldwide, cases of social media being abused to perpetrate various crimes are increasing. Specifically, drug distribution through social media is emerging as a serious social problem. Using social media channels, the curiosity of teenagers regarding drugs is stimulated through clever marketing. Further, social media easily facilitates drug purchases due to the high accessibility of drug sellers and consumers. Among various social media platforms, we focused on Instagram, which is the most used social media platform among young adults aged 19 to 24 years in South Korea. We collected four types of information on Instagram: profile photos, introductions, posts in the form of images, and posts in the form of text. We then analyzed the similarity within each type of collected information. The profile photos and posts in the form of images were analyzed for similarity based on the SSIM (Structural Similarity Index Measure), while introductions and posts in the form of text were analyzed for similarity using Jaccard and Cosine similarity techniques. Through the similarity analysis, the similarity among various accounts for each collected information type was measured, and accounts with similarity above the significance level were determined to be the same drug sales account. By performing logistic regression analysis on the aforementioned information types, we confirmed that, except for posts in image form, profile photos, introductions, and posts in text form were valid information for tracking the same drug sales account.
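The two text similarity measures named in this abstract, Jaccard and cosine similarity, can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: it tokenizes on whitespace, uses raw term counts for the cosine vectors, and the sample strings are hypothetical. The abstract does not say how the texts were tokenized or weighted, or what threshold was used.

```python
from collections import Counter
import math

def jaccard_similarity(text_a, text_b):
    """Jaccard similarity: shared distinct tokens divided by all distinct tokens."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty texts
    return len(set_a & set_b) / len(set_a | set_b)

def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-count vectors of the two texts."""
    vec_a, vec_b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical profile introductions, not data from the paper.
intro_a = "same day shipping message me here"
intro_b = "same day shipping message me there"

print(jaccard_similarity(intro_a, intro_b))
print(cosine_similarity(intro_a, intro_b))
```

Jaccard looks only at which tokens occur, while cosine also reflects how often they occur, which is one reason to report both for short texts such as profile introductions.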

Analyzing Changes in Consumers' Interest Areas Related to Skin under the Pandemic: Focusing on Structural Topic Modeling (팬데믹에 따른 소비자의 피부 관련 관심 영역 변화 분석: 구조적 토픽모델링을 중심으로)

  • Nakyung Kim;Jiwon Park;HyungBin Moon
    • Knowledge Management Research / v.25 no.1 / pp.173-192 / 2024
  • This study aims to understand the changes in the beauty industry due to the pandemic from the consumer's perspective based on consumers' opinions about their skin online before and after the pandemic. Furthermore, this study tries to derive strategies for companies and governments to support sustainable growth and innovation in the beauty industry. To this end, posts on social media from 2017 to 2022 that contained the keyword 'skin concerns' are collected, and after data preprocessing, 96,908 posts are used for the structural topic model. To examine whether consumers' interest areas related to skin change according to the pandemic situation, the analysis period is divided into 7 periods, and the variables that distinguish each stage are used as meta-variables for the structural topic model. As a result, it is found that consumers' interests can be divided into 22 topics, which can be categorized into four main categories: beauty manufacturing, beauty services, skin concerns, and other. The results of this study are expected to be utilized in construction of product development and marketing strategies of related companies and the establishment of economic support policies by the government in response to changes in demand in the beauty industry due to the pandemic.

Examining the Urban Growth Process of the 1st New Town -Focusing on the Keyword Network Analysis of Newspaper Articles using Text Mining- (1기 신도시의 도시 성장 과정 고찰 - 텍스트마이닝을 이용한 신문기사의 키워드 네트워크 분석을 중심으로 -)

  • Jung, Da-Eun;Kim, Chung Ho
    • Journal of the Korean Regional Science Association / v.39 no.4 / pp.91-110 / 2023
  • The purpose of this study is to explore urban issues that have arisen in the urban growth process of the 1st New Town for about 34 years since its construction through newspaper articles. For this purpose, newspaper articles related to the 1st New Town were collected using web crawling, and content analysis was conducted based on text mining. The main findings of the study are as follows. First, in the early stages of the construction of the 1st New Town, issues were diverse in the following six sectors: living service facilities, real estate, transportation, urban development and maintenance, safety, and housing supply, but gradually narrowed down to those of real estate and urban development and maintenance. Second, during the new town construction and urban stabilization stages, the network structure centered on 'Seoul' was maintained, which can be explained by the fact that the 1st New Town was geographically located on the outskirts of Seoul, and many articles compared the issues to Seoul. Third, the issue of urban aging appeared from the 10th year after construction, and the discussion on urban reorganization due to urban aging began in earnest from the 30th year after construction. The significance of the study is that it explored the urban issues that occurred throughout the urban growth process of the 1st New Town, and can be used as a basis for preparing a plan to reorganize the 1st New Town.

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real-time, despite being very accurate. Recently, reasoning the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-report wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). 
These were considered to adjust the level of a specific SWB. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by the individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentimental words. Due to the small number of words, many sentences are overlooked and ignored when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB. Consistent with the literature, the results showed that gender and age affect the SWB state when individuals are exposed to an identical cue from the online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of the contextual polarity in order to enlarge the vocabulary in a cost-effective manner.

A Study of the Effects and Risks of Baby-walkers on Motor Development in Human Infants (보행기가 유아 운동발달에 주는 영향에 관한 연구)

  • Lee, Ji Young;Min, Sae Ah;Yu, Sun Hee;Jang, Young Taek
    • Clinical and Experimental Pediatrics / v.46 no.2 / pp.122-127 / 2003
  • Purpose : Baby-walkers are used by many parents because of the convenience they provide in keeping children occupied, quiet, and happy, and in stimulating ambulation. However, these devices have more risks than benefits. Therefore, we performed a study to evaluate the effects of baby-walkers on motor development of human infants according to the hours used in a day, total duration (months), and types of injuries associated with the walkers, and to establish effective methods. Methods : 1,045 questionnaires were filled out by parents of babies aged between 8 months and 15 months who visited local pediatric clinics and medical centers in Chonju and Iksan from May 1, 2002 to July 31, 2002. The babies were divided into a control group that didn't use baby-walkers, a low-user group that used baby-walkers less than 2 hours a day, and a high-user group that used them more than 2 hours a day. Results : The mean age of the 1,045 babies whose parents responded to the questionnaire was 12.6 ± 2.4 months. The number of babies who used baby-walkers was 811 (77.6%). Crawling and walking alone were delayed in the high-user group. The parents who knew the side effects of baby-walkers totalled 392 (48.3%). Conclusion : The findings of this study revealed that many parents didn't know the effects of baby-walkers on motor development in their infants or the risks associated with baby-walkers. Therefore, we should educate parents on the risks of baby-walkers and recommend reducing their use.

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with those of network analysis. The experimental dataset was from a web site ranking site and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. 
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
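The network merging phase described in this abstract can be sketched as follows. The abstract does not specify how the two two-mode networks are combined, so this sketch assumes the common choice of multiplying the user-article matrix by the article-issue matrix, which sums visit counts times topic weights over shared articles. The function name, the dictionary layout, and the sample data are all hypothetical, and the authors used NetMiner rather than code like this.

```python
from collections import defaultdict

def merge_two_mode(user_article, article_issue):
    """Merge a user-article and an article-issue network into a user-issue network.

    Assumed rule: weight(user, issue) = sum over articles of
    visits(user, article) * weight(article, issue).
    """
    user_issue = defaultdict(lambda: defaultdict(float))
    for user, articles in user_article.items():
        for article, visits in articles.items():
            # Articles without any extracted issue contribute nothing.
            for issue, weight in article_issue.get(article, {}).items():
                user_issue[user][issue] += visits * weight
    return {user: dict(issues) for user, issues in user_issue.items()}

# Hypothetical data: visit counts per user, topic weights per article.
user_article = {
    "u1": {"a1": 2, "a2": 1},
    "u2": {"a2": 3},
}
article_issue = {
    "a1": {"economy": 1.0},
    "a2": {"economy": 0.5, "sports": 0.5},
}

user_issue = merge_two_mode(user_article, article_issue)
print(user_issue)
```

The resulting user-issue weights are the kind of input on which issues could then be compared and clustered, for example by structural equivalence as the abstract describes.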

Revisiting the cause of unemployment problem in Korea's labor market: The job seeker's interests-based topic analysis (취업준비생 토픽 분석을 통한 취업난 원인의 재탐색)

  • Kim, Jung-Su;Lee, Suk-Jun
    • Management & Information Systems Review / v.35 no.1 / pp.85-116 / 2016
  • The present study aims to explore the causes of employment difficulty on the basis of job applicants' interests from a P-E (person-environment) fit perspective. Our approach relied on a textual analytic method to reveal insights from their situational interests in a job search during changes in the labor market. To investigate the types of major interests and psychological responses, user-generated texts were collected from January 1, 2013 through December 31, 2015 by crawling an online community for job seeking and for sharing information and opinions. The results of topic analysis indicated that users' primary interests were divided into four types: perception of vocation expectation, employment pre-preparation behaviors, perception of the labor market, and job-seeking stress. Specifically, job applicants were mainly concerned with monetary reward and the form of employment rather than with their work values or career exploration, and youth job applicants expressed their psychological responses using contextualized language (e.g., slang, vulgarisms) to project their unstable state under uncertainty in response to environmental changes. Additionally, they perceived restricted preparation activities (e.g., certification, English exams) as determinant factors for success in employment and suffered from job-seeking stress. On the basis of these findings, current unemployment matters are attributed entirely to the absence of pursuing the value of vocation and job in individuals, organizations, and society. Concretely, job seekers are preoccupied with the social aspect of occupational prestige and have undecided vocational values. On the other hand, most companies have no perception of the importance of human resources and have overlooked the need for proper work environment development to stimulate individual motivation. 
This study's attempt to reinterpret the effect of the environment by classifying job applicants' interests with reference to linguistic and psychological theories not only supports a more comprehensive understanding of social matters, but also suggests new directions for future research on job applicants' psychological factors (e.g., attitudes, motivation) using topic analysis.
