• Title/Abstract/Keywords: Web-crawling

Search results: 177 items (processing time: 0.029 s)

Development of Dataset Items for Commercial Space Design Applying AI

  • Jung Hwa SEO; Segeun CHUN; Ki-Pyeong KIM
    • Korean Journal of Artificial Intelligence / Vol. 11, No. 1 / pp. 25-29 / 2023
  • This paper aims to establish a standard set of AI training dataset types for commercial space design. As the space design market continues to grow and people spend more time indoors after COVID-19, interest in space is expanding throughout society. In addition, more and more consumers are getting used to the digital environment. Identifying trends and preemptively proposing the atmosphere and specifications that customers require, quickly and easily, can therefore increase customer trust and make sales more effective. For the dataset types, commercial districts were divided into a total of 8 categories, and images that could be processed were derived by refining 4,009,30MB JPG format images collected through web crawling. Then, by performing bounding and labeling operations, we developed a 'Dataset for AI Training' of 3,356 commercial space images in CSV format with a size of 2.08MB. Through this study, elements of spatial images such as place type, space classification, and furniture can be extracted and used when developing AI algorithms, and images requested by clients are expected to be collected easily and quickly from spatial image input information.

Analysis of Media Trends and Social Perceptions on Nursing Law Legislation

  • 이승희;주민호
    • Journal of Korean Academy of Nursing / Vol. 53, No. 4 / pp. 439-452 / 2023
  • Purpose: This study aimed to derive considerations for the enactment of nursing law by analyzing the trends and social perceptions of nursing law mentioned in major daily newspapers, cafes, and blogs. Methods: Main texts and comments that included nursing law as a keyword were collected from major daily news and online postings from January 2021 to August 2022. The data collected through web crawling were analyzed using a TousFlux program used for big data analysis. Results: During the period of study, the awareness level around nursing law enactment increased. In particular, public concern over nursing law enactment intensified due to the two political parties' policy pledges related to nursing law in January 2022 and the failure to introduce the nursing law to the national assembly judiciary committee in May 2022. Except in December 2021, public perception of nursing law enactment was generally favorable, with public opinion tilting more in favor of than against enactment. Conclusion: Public opinion should be considered when drafting and implementing the nursing law to make it easier for the people to understand what the law constitutes. In addition, it is necessary to pay attention to and continuously promote the relationship between medical care and nursing in the nursing law system of developed nations. Lastly, nursing law enactment can enhance nurses' retention intention and provide a sense of efficacy to medical services.

A Classification Model for Predicting the Injured Body Part in Construction Accidents in Korea

  • Lim, Jiseon;Cho, Sungjin;Kang, Sanghyeok
    • International Conference Proceedings / The 9th International Conference on Construction Engineering and Project Management / pp. 230-237 / 2022
  • It is difficult to predict industrial accidents in the construction industry because many accident factors, such as human-related factors and environment-related factors, affect the accidents. Many studies have analyzed the severity of injuries and types of accidents; however, there were few studies on the prediction of injured body parts. This study aims to develop a classification model to predict the part of the injured body based on accident-related factors. Construction accident cases from June 2018 to July 2021 provided by the Korea Construction Safety Management Integrated Information were collected through web crawling and then preprocessed. A naïve Bayes classifier, one of the supervised learning algorithms, was employed to construct a classification model of the injured body part, which has four categories: 1) torso, 2) upper extremity, 3) head, and 4) lower extremity. The predictor variables are accident type, type of work, facility type, injury source, and activity type. As a result, the average accuracy for each injured body part was 50.4%. The accuracy of the upper extremity and lower extremity was relatively higher than the cases of the torso and head. Unlike the other classifications, such as spam mail filtering, a naïve Bayes classifier does not provide a good classification performance in construction accidents. The reasons are discussed in the study. Based on the results of this study, more detailed guidelines for construction safety management can be provided, which help establish safety measures at the construction site.
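
The classification step described above can be sketched as a categorical naive Bayes model with Laplace smoothing. This is a minimal illustration in plain Python, not the authors' implementation: the six training rows, the two predictors used (accident type and type of work), and the function names `train_naive_bayes` and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies for categorical data."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    # Distinct values seen per feature, needed for the smoothing denominator
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values, alpha


def predict(model, row):
    """Return the class with the highest log posterior for one case."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            count = feature_counts[label][i][value]
            # Laplace-smoothed conditional probability P(value | label)
            score += math.log((count + alpha) / (n + alpha * len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy cases: (accident type, type of work) -> injured body part
rows = [
    ("fall", "scaffolding"), ("fall", "scaffolding"),
    ("struck", "excavation"), ("struck", "excavation"),
    ("caught", "rebar"), ("caught", "rebar"),
]
labels = [
    "lower extremity", "lower extremity",
    "head", "head",
    "upper extremity", "upper extremity",
]

model = train_naive_bayes(rows, labels)
print(predict(model, ("fall", "scaffolding")))
```

The model multiplies per-feature probabilities as if the predictors were independent given the class. That assumption is one plausible reason for weak performance when accident factors are strongly correlated, although the abstract does not state the reasons the study discusses.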

Assessing Likelihood of Drought Impact Occurrence in South Korea through Machine Learning

  • 서정호;김연주
    • Proceedings of the Korea Water Resources Association Conference / Korea Water Resources Association 2021 Conference / p. 77 / 2021
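  • Drought is a natural disaster that causes severe social and economic damage, and its onset and affected area are difficult to predict accurately. Hydrologists have therefore developed a variety of drought indices from the hydrological and meteorological factors that influence drought, and use them to monitor, predict, and project drought. However, drought indices have many limitations in capturing the form in which drought actually occurs. Recently, researchers in the United States and Europe have been compiling the impacts of drought damage across sectors such as agriculture, the environment, and energy into more systematic and detailed data inventories, and attempting to predict drought impacts through studies such as correlation and regression analysis against drought indices. Accordingly, this study built a drought impact inventory for South Korea by collecting material such as reports, databases, and news articles gathered through web crawling. In addition, to predict drought impacts by region based on the Standardized Precipitation-Evapotranspiration Index (SPEI), a drought index widely used in hydrology, various machine learning techniques were applied, including logistic regression, random forest, support vector machine, and XGBoost. The performance of each model was evaluated using the Receiver Operating Characteristic (ROC) curve to identify machine learning techniques suitable for drought impact prediction. The results indicate that a drought impact prediction methodology combining text-based drought impact data with machine learning can provide useful information for drought disaster management.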
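
The ROC-based model evaluation mentioned in this abstract can be sketched as follows. This is a minimal, self-contained illustration rather than the study's code: the labels (1 = a drought impact was reported) and the predicted scores are made-up values, and `roc_curve_points` and `auc` are hypothetical helper names.

```python
def roc_curve_points(y_true, scores):
    """Return (false positive rate, true positive rate) points obtained by
    lowering the decision threshold from the highest score to the lowest."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Process tied scores together so they yield a single ROC point
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical example: 1 = drought impact reported, 0 = no impact reported
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # model-predicted probabilities

points = roc_curve_points(y_true, scores)
print(round(auc(points), 4))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, so comparing this value across models is one common way to choose among them.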

Analysis of Shipping and Logistics News Articles using Topic Modeling

  • 윤희영;곽일엽
    • Korea Trade Review / Vol. 46, No. 4 / pp. 61-76 / 2021
  • This study examines three logistics news outlets (Logistics Newspaper, Korea Shipping Gazette, and Korea Shipping Newspaper) to trace changes in logistics issues around COVID-19, which has recently had the greatest impact worldwide. For data collection, two years of news articles from 2019 and 2020 (title, article, content, date, article classification, article URL) were collected through web crawling (using Python's BeautifulSoup and requests modules) from the homepages of three representative logistics media companies. The data were analyzed with basic statistical analysis, Latent Dirichlet Allocation (LDA) for topic modeling, and Scattertext. The results were as follows. First, among the three logistics news outlets, the Korea Shipping Newspaper was the most active. Second, topic modeling with LDA identified eight logistics-related topics, and the keywords and significant issues of each topic were presented. Third, the keywords were visualized with Scattertext. This is the first study to present changes in the logistics field based on articles from representative logistics media in 2019 and 2020. In particular, 2019 and 2020 can be divided into the periods before and after the outbreak of COVID-19, which has had a great impact not only on logistics but also on daily life as a whole. Future work requires a multi-faceted approach, such as comparative studies of logistics issues between countries or implications drawn from long-term time-series articles.
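
The article-collection step can be sketched as below. The abstract names `requests` and BeautifulSoup; to keep the example self-contained and runnable offline, this sketch swaps in the standard-library `html.parser` and parses an inline HTML string instead of fetching a live page. The markup, the `title` and `date` class names, and the `ArticleListParser` class are hypothetical and do not reflect the actual sites.

```python
from html.parser import HTMLParser

# Hypothetical article-list markup standing in for a downloaded page
SAMPLE_HTML = """
<ul class="news-list">
  <li><a class="title" href="/news/1">Port congestion eases</a><span class="date">2020-03-02</span></li>
  <li><a class="title" href="/news/2">Freight rates climb</a><span class="date">2020-03-05</span></li>
</ul>
"""


class ArticleListParser(HTMLParser):
    """Collect the URL, title, and date of each article in a listing page."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._field = None  # which field the current text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            # A title link starts a new article record
            self.articles.append({"url": attrs.get("href"), "title": "", "date": ""})
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "date" and self.articles:
            self._field = "date"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        # Text outside a title link or date span is ignored
        if self._field and self.articles:
            self.articles[-1][self._field] += data.strip()


parser = ArticleListParser()
parser.feed(SAMPLE_HTML)
articles = parser.articles
for article in articles:
    print(article["date"], article["title"], article["url"])
```

In a real crawl the HTML string would come from an HTTP response, and the selectors would need to match each site's actual structure.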

A Study on the Low(No)-Code Platform Based on Web Crawling and NLP for Providing Framework-Specific Code

  • 윤채림;김송이;백인빈;우진환;송재형;백기영
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2023 Fall Conference / pp. 945-946 / 2023
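  • Demand for developers has surged under the influence of the Fourth Industrial Revolution and COVID-19, drawing attention to no-code and low-code platforms and to AI based on natural language processing. This study explores a no-code platform for improving access to programming and proposes a design in which users can build projects intuitively through a UI. It presents an architecture and direction based on web crawling and the training of a natural language processing model. Users compose screens, select a framework, and then build the project with little effort. The study presents a methodology that makes software development accessible without specialist knowledge and contributes to greater accessibility and inclusiveness.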

Gift-giving Behaviors via SNS Mobile App: An Exploratory Study of Fashion Products

  • Ji Yoon Kim;Jiyeon Lee;Kyu-Hye Lee
    • Journal of Fashion Business / Vol. 27, No. 6 / pp. 110-123 / 2023
  • As social distancing tightened after the outbreak of COVID-19, people looked for things they could do alone. In addition, as people gained more financial resources, they bought products they had previously only considered purchasing, and the phenomenon of giving gifts to oneself emerged. Accordingly, this study analyzed fashion product reviews on KakaoTalk Gift, a service for exchanging gifts via an SNS mobile app, to examine self-gifting and how it differs from interpersonal gifting. Using a Python-based web crawling technique, 18,354 reviews were collected after excluding unnecessary data. Self-gifting behavior on KakaoTalk Gift differed from that reported in previous studies of self-gifts. Regardless of the gift-giving context, most self-gift products were material items. Product types and price levels differed between gifts chosen for others and gifts chosen for oneself. As self-gifts, people typically bought luxury jewelry and branded bags and wallets to wear and show off. For interpersonal gifts, people usually bought beauty products, which reflect personal taste less than other fashion products. When giving to others, people bought appropriately priced products to reduce the burden on both parties. When giving to themselves, people bought the products they wanted regardless of price. This study is significant because it suggests a new direction for self-gift research by limiting its scope to an online gift-giving venue.

A Study of Ginseng Culture within 'Joseonwangjosilok' through Textual Frequency Analysis

  • Mi-Hye Kim
    • CELLMED / Vol. 14, No. 2 / pp. 2.1-2.10 / 2024
  • Through big data analysis of the 'Joseonwangjosilok', this study examines the perception of ginseng among the ruling class and its utilization during the Joseon era. It aims to provide foundational data for the development of ginseng into a high-value cultural commodity. The focus of this research, the Joseonwangjosilok, comprises 1,968 volumes in 948 books, spanning a record of 518 years. Data was collected through web crawling on the website of the National Institute of Korean History, followed by frequency analysis of significant words. To assess the interest in ginseng across the reigns of 27 kings during the Joseon era, ginseng frequency records were adjusted based on years in power and the number of articles, creating an interest index for comparative rankings across reigns. Analysis revealed higher interest in ginseng during the reigns of King Jeongjo and King Yeongjo in the 18th century, King Sunjo in the 19th century, King Sejong in the 15th century, King Sukjong in the 17th century, and King Gojong in the 19th century. Examining the temporal emergence and changes in ginseng during the Joseon era, general ginseng types like insam and sansam had the highest frequency in the 15th century. It appears that Korea adeptly utilized ceremonial goods in diplomatic relations with China and Japan, meeting the demand for ginseng from their royal and aristocratic societies. Processed ginseng varieties such as hongsam and posam, along with traded and taxed ginseng, showed peak frequency in the 18th century. This coincided with increased cultivation, allowing a higher supply and fostering the development of ginseng processing technologies like hongsam.

Keywords Analysis of Clothing Materials in Consumer Reviews Using Big Data Text Mining

  • 강가은;박지원;유신정
    • Journal of the Korean Society of Clothing and Textiles / Vol. 48, No. 4 / pp. 729-743 / 2024
  • This research explores consumer preferences for materials in different clothing product categories, using web-crawling and text mining techniques. Specifically, the study focuses on the material-related terms found in consumer reviews across three distinct product categories: functional clothing, formal shirts, and knit sweaters. Top-selling products within each category were identified on the Naver Shopping website based on the volume of reviews, and the four most-reviewed products were selected. Six hundred reviews per product were analyzed using the Textom big-data analysis software to determine the frequency of material-related mentions and word associations. The analysis utilized two comparative metrics: product category and usage duration. Our findings reveal notable variations in the material preferences mentioned by consumers across different product categories. The study suggests a need to re-evaluate existing standardized review criteria to better reflect consumer interests specific to each product category. Additionally, an increase in material-related terms in reviews over one month indicates the potential importance of extending the duration of product reviews to enhance the accuracy of information that reflects longer-term consumer experiences with material quality.

Korea National College of Agriculture and Fisheries in Naver News by Web Crawling: Based on Keyword Analysis and Semantic Network Analysis

  • 주진수;이소영;김승희;박노복
    • Journal of Practical Agriculture & Fisheries Research / Vol. 23, No. 2 / pp. 71-86 / 2021
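  • Using web crawling, a big data analysis technique, we extracted image words associated with 'Hannongdae' (the abbreviated Korean name of the Korea National College of Agriculture and Fisheries) from Naver News data. In the word frequency analysis, which rates words as important according to how often they are mentioned in news articles, words that describe the college's role in fostering young farmers appeared frequently, such as 'agriculture', 'education', 'support', 'farmer', 'youth', 'university', 'project', 'rural area', and 'representative'. Words related to the school's education, support, and vision for fostering digital agriculture specialists were also extracted, such as 'digital', 'smart', 'drone', 'graduate', 'start-up', 'Saemangeum', and 'curriculum'. In the overall ranking of TF-IDF weights, computed from term frequency (TF) and inverse document frequency (IDF) over all article data, words such as 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', 'young farmer', 'agriculture', 'Jeonju', 'university', 'device', and 'sowing' served as important key terms in news articles about the college. 'Drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', 'young farmer', 'Jeonju', 'device', and 'sowing' ranked very low by word frequency but emerged as key terms representing the college in the TF-IDF ranking. In the TF-IDF evaluation, keywords such as 'education', 'support', 'youth', 'project', and 'rural area' had high word frequency but also appeared in many documents, so their role as key terms was small. In the semantic network analysis, used to identify links between words, the most frequent bigrams were, in order, 'youth'-'farmer', 'digital'-'agriculture', 'farming'-'settlement', 'agriculture'-'rural area', and 'digital'-'transformation'. When keyword influence was evaluated using centrality measures, 'agriculture' ranked first on every measure; second place went to 'farmer' (closeness centrality, betweenness centrality), 'education' (degree centrality, PageRank centrality), and 'future' (eigenvector centrality). In terms of Spearman's rank correlation between the keyword rankings under each centrality measure, degree centrality and PageRank centrality showed the highest correlation, at around 0.89. In summary, judged by word frequency in Naver News articles about the college, 'agriculture', 'education', 'support', 'farmer', 'youth', 'university', 'project', 'rural area', and 'representative' were rated as important words, but when document frequency was also considered, 'farmer', 'drone', 'Ministry of Agriculture, Food and Rural Affairs', 'Jeonbuk', 'young farmer', 'agriculture', 'Jeonju', 'university', 'device', and 'sowing' served as key terms. In the centrality analysis, which considers network links between words rather than word or document frequency, evaluation by degree centrality and PageRank centrality was found to be appropriate, and 'agriculture', 'education', 'future', 'farmer', 'digital', 'support', and 'utilization' emerged as words with strong centrality.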
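
The contrast between raw word frequency and TF-IDF weighting reported in this abstract can be illustrated with a short sketch. The abstract does not give the exact weighting formula, so this example assumes corpus-level term frequency multiplied by `log(N / df)`. The three tokenized documents are hypothetical English stand-ins, and `tf_idf` is an illustrative helper name.

```python
import math
from collections import Counter

# Hypothetical tokenized articles (English stand-ins for extracted nouns)
docs = [
    ["agriculture", "education", "drone"],
    ["agriculture", "education", "support"],
    ["agriculture", "education", "agriculture"],
]


def tf_idf(docs):
    """Return (weights, tf), where weights[w] = tf[w] * log(N / df[w])."""
    n_docs = len(docs)
    tf = Counter()  # total occurrences of each word across all documents
    df = Counter()  # number of documents containing each word
    for doc in docs:
        tf.update(doc)
        df.update(set(doc))
    weights = {word: tf[word] * math.log(n_docs / df[word]) for word in tf}
    return weights, tf


weights, tf = tf_idf(docs)

print("By raw frequency:", tf.most_common())
print("By TF-IDF:", sorted(weights.items(), key=lambda item: -item[1]))
```

Under this assumed formula, a word that occurs in every document receives a weight of zero however often it appears, while a rarer word such as 'drone' moves up the ranking. This mirrors the pattern the abstract describes, in which frequent but widely shared words play only a small role as key terms.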