• Title/Summary/Keyword: web crawling

Development of Dataset Items for Commercial Space Design Applying AI

  • Jung Hwa SEO;Segeun CHUN;Ki-Pyeong, KIM
    • Korean Journal of Artificial Intelligence / v.11 no.1 / pp.25-29 / 2023
  • This paper aims to create a standard for AI training dataset types for commercial space design. As the space design market continues to grow and time spent indoors has increased since COVID-19, interest in space is expanding throughout society. In addition, more and more consumers are getting used to the digital environment. Therefore, identifying trends and preemptively proposing the atmosphere and specifications that customers require, quickly and easily, can increase customer trust and make sales more effective. As for the dataset type, commercial districts were divided into a total of 8 categories, and images that could be processed were derived by refining 4,009,30MB JPG format images collected through web crawling. Then, by performing bounding and labeling operations, we developed a 'Dataset for AI Training' of 3,356 commercial space image records in CSV format with a size of 2.08MB. Through this study, elements of spatial images such as place type, space classification, and furniture can be extracted and used when developing AI algorithms, and it is expected that images requested by clients can be collected easily and quickly from spatial image input information.

Analysis of Media Trends and Social Perceptions on Nursing Law Legislation (간호법 제정에 대한 언론 동향 및 사회적 인식 분석)

  • Lee, Seung-Hee;Joo, Min-Ho
    • Journal of Korean Academy of Nursing / v.53 no.4 / pp.439-452 / 2023
  • Purpose: This study aimed to derive considerations for the enactment of nursing law by analyzing the trends and social perceptions of nursing law mentioned in major daily newspapers, cafes, and blogs. Methods: Main texts and comments that included nursing law as a keyword were collected from major daily news and online postings from January 2021 to August 2022. The data collected through web crawling were analyzed using a TousFlux program used for big data analysis. Results: During the period of study, the awareness level around nursing law enactment increased. In particular, public concern over nursing law enactment intensified due to the two political parties' policy pledges related to nursing law in January 2022 and the failure to introduce the nursing law to the national assembly judiciary committee in May 2022. Except in December 2021, public perception of nursing law enactment was generally favorable, with public opinion tilting more in favor of than against enactment. Conclusion: Public opinion should be considered when drafting and implementing the nursing law to make it easier for the people to understand what the law constitutes. In addition, it is necessary to pay attention to and continuously promote the relationship between medical care and nursing in the nursing law system of developed nations. Lastly, nursing law enactment can enhance nurses' retention intention and provide a sense of efficacy to medical services.

A Classification Model for Predicting the Injured Body Part in Construction Accidents in Korea

  • Lim, Jiseon;Cho, Sungjin;Kang, Sanghyeok
    • International conference on construction engineering and project management / 2022.06a / pp.230-237 / 2022
  • It is difficult to predict industrial accidents in the construction industry because many accident factors, such as human-related factors and environment-related factors, affect the accidents. Many studies have analyzed the severity of injuries and types of accidents; however, there were few studies on the prediction of injured body parts. This study aims to develop a classification model to predict the part of the injured body based on accident-related factors. Construction accident cases from June 2018 to July 2021 provided by the Korea Construction Safety Management Integrated Information were collected through web crawling and then preprocessed. A naïve Bayes classifier, one of the supervised learning algorithms, was employed to construct a classification model of the injured body part, which has four categories: 1) torso, 2) upper extremity, 3) head, and 4) lower extremity. The predictor variables are accident type, type of work, facility type, injury source, and activity type. As a result, the average accuracy for each injured body part was 50.4%. The accuracy of the upper extremity and lower extremity was relatively higher than the cases of the torso and head. Unlike the other classifications, such as spam mail filtering, a naïve Bayes classifier does not provide a good classification performance in construction accidents. The reasons are discussed in the study. Based on the results of this study, more detailed guidelines for construction safety management can be provided, which help establish safety measures at the construction site.
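
The classification setup described in this abstract (categorical predictors, a small set of body-part classes, a naïve Bayes classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the training rows are made-up placeholders, only two of the five predictors are used, and Laplace smoothing with `alpha=1.0` is an assumption.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing (alpha)."""
    n_features = len(rows[0])
    class_counts = Counter(labels)
    # feature_counts[i][label][value] = how often `value` occurs for `label`
    feature_counts = [defaultdict(Counter) for _ in range(n_features)]
    vocab = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[i][label][value] += 1
            vocab[i].add(value)
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n": len(labels),
    }


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        score = math.log(count / model["n"])  # log prior
        for i, value in enumerate(row):
            seen = model["feature_counts"][i][label][value]
            denom = count + model["alpha"] * len(model["vocab"][i])
            score += math.log((seen + model["alpha"]) / denom)  # log likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (accident type, type of work) -> injured body part.
ROWS = [
    ("fall", "scaffolding"),
    ("fall", "scaffolding"),
    ("struck", "excavation"),
    ("struck", "excavation"),
    ("fall", "roofing"),
    ("cut", "rebar"),
]
LABELS = [
    "lower extremity",
    "lower extremity",
    "head",
    "head",
    "lower extremity",
    "upper extremity",
]

model = train_naive_bayes(ROWS, LABELS)
prediction_fall = predict(model, ("fall", "scaffolding"))
prediction_struck = predict(model, ("struck", "excavation"))
```

Because naïve Bayes treats the predictors as conditionally independent given the class, correlated accident factors can degrade its accuracy; this is a general property of the method, and the paper's own explanation of its results may differ.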

Assessing likelihood of drought impact occurrence in South korea through machine learning (머신러닝 기법을 통한 우리나라 가뭄 영향 발생 가능성 평가)

  • Seo, Jungho;Kim, Yeonjoo
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.77-77 / 2021
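  • Drought is a natural disaster that causes severe social and economic damage, and its onset and affected areas are difficult to predict accurately. Hydrologists have therefore developed various drought indices from the hydrological and meteorological factors that influence drought, and have used them to monitor, predict, and forecast drought. However, drought indices have many limitations in capturing the form that drought actually takes. Recently, researchers in the United States and Europe have compiled the drought impacts arising from drought damage across sectors such as agriculture, the environment, and energy into more systematic and detailed data inventories, and have attempted to predict drought impacts through studies such as correlation and regression analyses against drought indices. Accordingly, this study built a drought impact inventory for South Korea by collecting material such as reports, databases, and news articles obtained through web crawling. In addition, to predict drought impacts by region based on the Standardized Precipitation-Evapotranspiration Index (SPEI), a drought index widely used in hydrology, various machine learning methods were applied, including logistic regression, random forest, support vector machine, and XGBoost. The performance of each model was evaluated using the Receiver Operating Characteristic (ROC) curve to identify machine learning methods suited to drought impact prediction. The results suggest that a drought impact prediction methodology combining text-based drought impact data with machine learning can provide useful information for drought disaster management.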
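
The ROC-based model comparison mentioned in this abstract can be sketched as follows. This is an illustrative standard-library version, not the authors' code: the labels and scores are made-up placeholders, where 1 would stand for "drought impact reported" and the score for a model's predicted probability.

```python
def roc_curve(y_true, scores):
    """Return (fpr, tpr) lists with one point per distinct score threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative labels")
    # Sweep the threshold from the highest score to the lowest.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item sharing this score (handles ties).
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Hypothetical labels and model scores.
Y_TRUE = [1, 1, 0, 1, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

fpr, tpr = roc_curve(Y_TRUE, SCORES)
area = auc(fpr, tpr)
```

An area of 1.0 means every positive case is scored above every negative case, and 0.5 corresponds to random ranking, so computing the same quantity for each candidate model gives a threshold-independent basis for comparison.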

Analysis of Shipping and Logistics News Articles using Topic Modeling (토픽모델링을 활용한 해운물류 뉴스 분석)

  • Hee-Young Yoon;Il-Youp Kwak
    • Korea Trade Review / v.46 no.4 / pp.61-76 / 2021
  • This study focuses on three logistics-related news outlets (Logistics Newspaper, Korea Shipping Gadget, and Korea Shipping Newspaper) in order to present changes in logistics issues, centering on COVID-19, which has recently had the greatest impact in the world. For data collection, two years of news articles from 2019 and 2020 (title, article, content, date, article classification, article URL) were collected through web crawling (using Python's BeautifulSoup and requests modules) on the homepages of three representative logistics-related media companies. The data analysis methods were fundamental statistical analysis, Latent Dirichlet Allocation (LDA) for topic modeling, and Scattertext. The analysis results were as follows. First, among the three logistics-related news outlets, the Korea Shipping Newspaper was the most active. Second, through topic modeling with LDA, eight logistics-related topics were identified, and keywords and significant issues of each topic were presented. Third, the keywords were visually expressed through Scattertext. This is the first study to present changes in the logistics field, focusing on articles from representative logistics-related media in 2019 and 2020. In particular, 2019 and 2020 can be divided into before and after the outbreak of COVID-19, which has had a great impact not only on the logistics field but also on our lives as a whole. For future work, a multi-faceted approach is required, such as comparative studies of logistics issues between countries or presenting implications based on long-term time-series articles.
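
The collection step described in this abstract (extracting article titles and URLs from a news listing page) might look like the sketch below. Several things here are assumptions: the abstract names BeautifulSoup and requests, but this sketch swaps in the standard-library `html.parser` so that it runs without third-party packages; it parses an inline HTML string instead of fetching a real site; and the `article-link` class name and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class ArticleListParser(HTMLParser):
    """Collect (title, url) pairs from <a class="article-link"> elements."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._in_link = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "a" and "article-link" in classes:
            self._in_link = True
            self._href = attr.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Collapse runs of whitespace inside the link text.
            title = " ".join("".join(self._text).split())
            self.articles.append((title, self._href))
            self._in_link = False


def extract_articles(html):
    """Parse one listing page and return its (title, url) pairs."""
    parser = ArticleListParser()
    parser.feed(html)
    parser.close()
    return parser.articles


# Hypothetical listing-page markup standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a class="article-link" href="/news/1">Port congestion eases</a></li>
  <li><a class="article-link" href="/news/2">Freight rates
      climb</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""

articles = extract_articles(SAMPLE_HTML)
```

In a real crawl the HTML would come from an HTTP request to each listing page, and the site's robots.txt and terms of use should be checked first.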

A Study on the Low(No)-Code Platform Based on Web Crawling and NLP for Providing Framework-Specific Code (프레임워크 맞춤형 코드 제공을 위한 웹 크롤링과 NLP 기반 노코드 플랫폼 연구)

  • Chae-Rim Yoon;Song-Ie Kim;In-Bin Baik;Jin-Hwan Woo;Jae-Hyeong Song;Gi-Young Beak
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.945-946 / 2023
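  • Demand for developers has surged under the influence of the Fourth Industrial Revolution and COVID-19, drawing attention to no-code and low-code platforms and to artificial intelligence based on natural language processing. This study explores a no-code platform for improving the accessibility of programming and presents a design in which users can build projects intuitively through a UI. It proposes an architecture and direction based on web crawling and the training of a natural language processing model. Users compose screens, select a framework, and can then build a project with little effort. The study presents a methodology that makes software development easily accessible without specialist knowledge and contributes to greater accessibility and inclusiveness.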

Gift-giving Behaviors via SNS Mobile App: An Exploratory Study of Fashion Products

  • Ji Yoon Kim;Jiyeon Lee;Kyu-Hye Lee
    • Journal of Fashion Business / v.27 no.6 / pp.110-123 / 2023
  • As social distancing strengthened after the COVID-19 outbreak, people looked for things they could do alone. Additionally, as people have more financial resources, they purchase products they had previously only considered purchasing, and the phenomenon of giving gifts to oneself has also appeared. Accordingly, this study analyzed fashion product reviews on KakaoTalk Gift, a service for exchanging gifts via an SNS mobile app, to examine the phenomenon of self-gifting and its differences from interpersonal gifting. Using a Python-based web crawling technique, 18,354 reviews were collected after unnecessary data were excluded. The self-gifting behavior observed on KakaoTalk Gift differed from that reported in previous self-gift studies. Regardless of the gift-giving context, most self-gift products were material items. Product types and price levels differed between gifts chosen for others and gifts chosen for oneself. As self-gifts, people typically buy luxury jewelry and branded bags or wallets to wear and show off. As interpersonal gifts, among fashion products, people usually buy beauty products, which reflect personal taste less. When giving gifts to others, people buy products at appropriate prices to reduce the burden on both parties. When giving gifts to themselves, people buy the products they want regardless of price. This study is significant because it suggests a new direction in self-gift research by limiting the setting to an online gift-giving platform.

A Study of Ginseng Culture within 'Joseonwangjosilok' through Textual Frequency Analysis

  • Mi-Hye Kim
    • CELLMED / v.14 no.2 / pp.2.1-2.10 / 2024
  • Through big data analysis of the 'Joseonwangjosilok', this study examines the perception of ginseng among the ruling class and its utilization during the Joseon era. It aims to provide foundational data for the development of ginseng into a high-value cultural commodity. The focus of this research, the Joseonwangjosilok, comprises 1,968 volumes in 948 books, spanning a record of 518 years. Data was collected through web crawling on the website of the National Institute of Korean History, followed by frequency analysis of significant words. To assess the interest in ginseng across the reigns of 27 kings during the Joseon era, ginseng frequency records were adjusted based on years in power and the number of articles, creating an interest index for comparative rankings across reigns. Analysis revealed higher interest in ginseng during the reigns of King Jeongjo and King Yeongjo in the 18th century, King Sunjo in the 19th century, King Sejong in the 15th century, King Sukjong in the 17th century, and King Gojong in the 19th century. Examining the temporal emergence and changes in ginseng during the Joseon era, general ginseng types like insam and sansam had the highest frequency in the 15th century. It appears that Korea adeptly utilized ceremonial goods in diplomatic relations with China and Japan, meeting the demand for ginseng from their royal and aristocratic societies. Processed ginseng varieties such as hongsam and posam, along with traded and taxed ginseng, showed peak frequency in the 18th century. This coincided with increased cultivation, allowing a higher supply and fostering the development of ginseng processing technologies like hongsam.
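
This abstract describes adjusting keyword frequencies by years in power and by the number of articles before ranking reigns, but it does not give the formula. The sketch below shows one plausible normalization under stated assumptions: the two rates (mentions per year and mentions per 1,000 articles) are illustrative choices, and the reign names and counts are made-up placeholders rather than figures from the study.

```python
from dataclasses import dataclass


@dataclass
class Reign:
    name: str
    years_in_power: int
    total_articles: int
    keyword_mentions: int


def mentions_per_year(reign):
    """Keyword mentions normalized by the length of the reign."""
    return reign.keyword_mentions / reign.years_in_power


def mentions_per_1000_articles(reign):
    """Keyword mentions normalized by the volume of records."""
    return 1000 * reign.keyword_mentions / reign.total_articles


def rank_by(reigns, rate):
    """Return reign names ordered from highest to lowest rate."""
    return [r.name for r in sorted(reigns, key=rate, reverse=True)]


# Hypothetical placeholder values, not data from the annals.
REIGNS = [
    Reign("Reign A", years_in_power=10, total_articles=2000, keyword_mentions=40),
    Reign("Reign B", years_in_power=40, total_articles=10000, keyword_mentions=100),
    Reign("Reign C", years_in_power=5, total_articles=500, keyword_mentions=4),
]

raw_ranking = rank_by(REIGNS, lambda r: r.keyword_mentions)
per_year_ranking = rank_by(REIGNS, mentions_per_year)
per_article_ranking = rank_by(REIGNS, mentions_per_1000_articles)
```

With these placeholder numbers the raw count puts the longest reign first, while both normalized rates put the shorter "Reign A" first, which is the kind of distortion such an adjustment is meant to remove.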

Keywords Analysis of Clothing Materials in Consumer Reviews Using Big Data Text Mining (빅데이터 텍스트 마이닝을 활용한 소비자 리뷰에서의 의류 소재 키워드 분석)

  • Gaeun Kang;Jiwon Park;Shinjung Yoo
    • Journal of the Korean Society of Clothing and Textiles / v.48 no.4 / pp.729-743 / 2024
  • This research explores consumer preferences for materials in different clothing product categories, using web-crawling and text mining techniques. Specifically, the study focuses on the material-related terms found in consumer reviews across three distinct product categories: functional clothing, formal shirts, and knit sweaters. Top-selling products within each category were identified on the Naver Shopping website based on the volume of reviews, and the four most-reviewed products were selected. Six hundred reviews per product were analyzed using the Textom big-data analysis software to determine the frequency of material-related mentions and word associations. The analysis utilized two comparative metrics: product category and usage duration. Our findings reveal notable variations in the material preferences mentioned by consumers across different product categories. The study suggests a need to re-evaluate existing standardized review criteria to better reflect consumer interests specific to each product category. Additionally, an increase in material-related terms in reviews over one month indicates the potential importance of extending the duration of product reviews to enhance the accuracy of information that reflects longer-term consumer experiences with material quality.

Korea National College of Agriculture and Fisheries in Naver News by Web Crolling : Based on Keyword Analysis and Semantic Network Analysis (웹 크롤링에 의한 네이버 뉴스에서의 한국농수산대학 - 키워드 분석과 의미연결망분석 -)

  • Joo, J.S.;Lee, S.Y.;Kim, S.H.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research / v.23 no.2 / pp.71-86 / 2021
  • This study was conducted to find information on the university's image from words related to 'Korea National College of Agriculture and Fisheries (KNCAF)' in Naver News. For this purpose, word frequency analysis, TF-IDF evaluation, and semantic network analysis were performed using web crawling technology. In the word frequency analysis, 'agriculture', 'education', 'support', 'farmer', 'youth', 'university', 'business', 'rural', and 'CEO' were important words. In the TF-IDF evaluation, the key words were 'farmer', 'drone', 'agricultural and livestock food department', 'Jeonbuk', 'young farmer', 'agriculture', 'Chonju', 'university', 'device', and 'spreading'. In the semantic network analysis, the bigrams showed high correlations in the order of 'youth' - 'farmer', 'digital' - 'agriculture', 'farming' - 'settlement', 'agriculture' - 'rural', and 'digital' - 'turnover'. When keyword importance was evaluated with five centrality indices, 'agriculture' ranked first. The keywords ranked second by centrality index were 'farmers' (Cc, Cb), 'education' (Cd, Cp), and 'future' (Ce). Spearman's rank correlation coefficient between centrality indices showed that degree centrality and PageRank centrality produced the most similar rankings. In terms of word frequency, words such as 'agriculture', 'education', 'support', 'farmer', and 'youth' were important in the KNCAF articles on Naver News. However, in the evaluation that takes document frequency into account, words such as 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', and 'young farmers' were found to be key words. For the centrality analysis, which considers the network connectivity between words, evaluation by Cd and Cp was suitable. The words with strong centrality were 'agriculture', 'education', 'future', 'farmer', 'digital', 'support', and 'utilization'.
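
The difference this abstract reports between the word-frequency ranking and the TF-IDF ranking can be illustrated with a small sketch. It is not the authors' pipeline: the tokenized documents are made-up placeholders, and the TF-IDF variant used (relative term frequency times the natural log of N over document frequency) is one common definition chosen here as an assumption.

```python
import math
from collections import Counter


def word_frequencies(docs):
    """Total count of each term across all documents."""
    return Counter(term for doc in docs for term in doc)


def tf_idf(docs):
    """Return one {term: weight} dict per document (docs are token lists)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def bigram_counts(docs):
    """Count adjacent word pairs, the raw input for a word co-occurrence network."""
    return Counter(pair for doc in docs for pair in zip(doc, doc[1:]))


# Hypothetical tokenized news articles.
DOCS = [
    ["youth", "farmer", "agriculture", "education"],
    ["digital", "agriculture", "drone"],
    ["youth", "farmer", "agriculture", "support"],
]

frequencies = word_frequencies(DOCS)
weights = tf_idf(DOCS)
bigrams = bigram_counts(DOCS)
```

In this toy corpus 'agriculture' is the most frequent word but receives a TF-IDF weight of zero because it appears in every document, whereas 'drone' is rare overall yet scores highest within its own document.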