• Title/Summary/Keyword: Web Crawling (Scraping)

Search results: 4

An Implementation and Performance Evaluation of Fast Web Crawler with Python

  • Kim, Cheong Ghil
    • Journal of the Semiconductor & Display Technology / v.18 no.3 / pp.140-143 / 2019
  • The Internet has expanded constantly and enormously, leaving us with a vast number of web pages that change dynamically. In particular, the rapid development of wireless communication technology and the wide spread of smart devices allow information to be created and changed quickly, anywhere and at any time. In this situation, web crawling, also known as web scraping, which is an organized, automated computer process for systematically navigating pages residing on the web and for automatically searching and indexing their information, is inevitably used broadly in many fields today. This paper implements a prototype web crawler in Python and improves its execution speed using threads on a multicore CPU. The implementation was verified by crawling reference web sites, and the performance improvement was confirmed by measuring execution speed under different thread configurations on a multicore CPU.
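The thread-based speed-up this abstract describes can be sketched with Python's standard library. The paper's actual crawler is not shown, so the snippet below is a minimal illustration of the pattern: the URLs are placeholders and `fetch` only simulates the network I/O wait with `time.sleep`, which, like real socket I/O, releases the GIL and lets threads overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder URLs; the paper's reference sites are not listed in the abstract.
URLS = [f"https://example.com/page{i}" for i in range(8)]

def fetch(url):
    time.sleep(0.05)  # stand-in for the network latency of one page request
    return f"<html>{url}</html>"

def crawl_serial(urls):
    # Baseline: one request at a time.
    return [fetch(u) for u in urls]

def crawl_threaded(urls, workers=4):
    # Threads overlap the I/O waits, so wall time drops roughly by a factor
    # of `workers` for I/O-bound crawling, even under the GIL.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))

t0 = time.perf_counter(); crawl_serial(URLS); t_serial = time.perf_counter() - t0
t0 = time.perf_counter(); pages = crawl_threaded(URLS); t_threaded = time.perf_counter() - t0
print(f"serial {t_serial:.2f}s vs 4 threads {t_threaded:.2f}s")
```

Varying `workers` reproduces, in miniature, the paper's comparison of different thread configurations.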

Smart Synthetic Path Search System for Prevention of Hazardous Chemical Accidents and Analysis of Reaction Risk (반응 위험성분석 및 사고방지를 위한 스마트 합성경로 탐색시스템)

  • Jeong, Joonsoo;Kim, Chang Won;Kwak, Dongho;Shin, Dongil
    • Korean Chemical Engineering Research / v.57 no.6 / pp.781-789 / 2019
  • Accidents involving chemicals occur frequently during laboratory experiments and pilot-plant and reactor operations. It is necessary to find and understand the relevant information to prevent accidents before starting synthesis experiments. In the process design stage, reaction information is also necessary to prevent runaway reactions. Although various sources of synthesis information are available, including the Internet, searching takes a long time and choosing the right path is difficult because the substances used in each synthesis method differ. To solve these problems, we propose an intelligent synthetic path search system that helps researchers shorten the search time for synthetic paths and identify hazardous intermediates that may exist on those paths. The proposed system automatically updates its database by collecting information from the Internet through web scraping and crawling with Selenium, a Python package. Based on depth-first search, the path search starts from the target substance, distinguishes hazardous-chemical grades, yields, etc., and suggests all synthetic paths within a defined limit of path steps. For the benefit of each research institution, researchers can register their private data and expand the database according to the format type. The system is being released as open source for free use. It is expected to find safer routes and help prevent accidents by supporting researchers who consult the suggested paths.
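The depth-limited, depth-first path search described here can be sketched in plain Python. The reaction network, substance names, and hazard flags below are all hypothetical placeholders, not the paper's chemical database; the sketch only shows the search pattern: recurse from the target toward raw materials, stop at the step limit, and flag hazardous intermediates on each route.

```python
# Hypothetical reaction network: product -> list of alternative precursors.
REACTIONS = {
    "target": ["intermediate_1", "intermediate_2"],
    "intermediate_1": ["raw_A"],
    "intermediate_2": ["raw_B"],
}
# Substances flagged as hazardous in the (hypothetical) chemical database.
HAZARDOUS = {"intermediate_1"}

def dfs_paths(substance, max_steps=3):
    """Depth-first enumeration of synthetic routes back to raw materials,
    within a defined limit of path steps."""
    # A raw material (no known reactions) or the step limit ends a route.
    if substance not in REACTIONS or max_steps == 0:
        yield [substance]
        return
    for precursor in REACTIONS[substance]:
        for tail in dfs_paths(precursor, max_steps - 1):
            yield [substance] + tail

def hazards_on(path):
    """Hazardous intermediates appearing on one suggested route."""
    return [s for s in path if s in HAZARDOUS]

for route in dfs_paths("target"):
    print(" -> ".join(route), "| hazards:", hazards_on(route))
```

Real multi-reagent reactions would need precursor *sets* per step rather than single precursors; the single-precursor form is kept here only to keep the search structure visible.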

A Study on Big Data Processing Technology Based on Open Source for Expansion of LIMS (실험실정보관리시스템의 확장을 위한 오픈 소스 기반의 빅데이터 처리 기술에 관한 연구)

  • Kim, Soon-Gohn
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.2 / pp.161-167 / 2021
  • A Laboratory Information Management System (LIMS) is a centralized database for storing, processing, retrieving, and analyzing laboratory data; it refers to a computer system specially designed for laboratories performing inspection, analysis, and testing tasks. In particular, a LIMS provides functions to support laboratory operation, requiring workflow management and data-tracking support. In this paper, we collect data from websites and various channels using crawling, one of the automated big-data collection technologies, to support laboratory operation. Among the collected test methods and contents, those the tester can usefully apply are recommended. In addition, we implement a complementary LIMS platform capable of verifying each collection channel by managing feedback.
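The collection step this abstract relies on, extracting test-method entries from crawled pages, can be sketched with the standard-library `html.parser`. The page fragment and its markup below are hypothetical; the real LIMS channels and page structure are not described in the abstract.

```python
from html.parser import HTMLParser

# Hypothetical page fragment as a crawler might collect it; real channels
# would each need their own selectors.
PAGE = """
<ul>
  <li class="method">Heavy metal analysis (ICP-MS)</li>
  <li class="method">Microbial limit test</li>
  <li class="notice">Site maintenance on Friday</li>
</ul>
"""

class MethodParser(HTMLParser):
    """Collect the text of <li class="method"> items from a crawled page."""
    def __init__(self):
        super().__init__()
        self.in_method = False
        self.methods = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "method") in attrs:
            self.in_method = True

    def handle_data(self, data):
        if self.in_method and data.strip():
            self.methods.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_method = False

parser = MethodParser()
parser.feed(PAGE)
print(parser.methods)  # candidate test methods for the recommendation step
```

Only the `class="method"` items survive, filtering out unrelated page content before any recommendation or feedback handling.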

A Study on Artificial Intelligence Education Design for Business Major Students

  • PARK, So-Hyun;SUH, Eung-Kyo
    • The Journal of Industrial Distribution & Business / v.12 no.8 / pp.21-32 / 2021
  • Purpose: With the advent of the Fourth Industrial Revolution, called a new technological revolution, the need to foster future talent equipped with AI utilization capabilities is emerging. However, there is little research on AI education design and competency-based curricula for business majors. The purpose of this study is to design AI education that cultivates competency-oriented AI literacy for business majors in universities. Research design, data and methodology: For the design of basic AI education for business majors, three expert Delphi surveys were conducted, a demand analysis and specialization strategy were established, and the reliability of the derived design was verified by reflecting the results. Results: The main competencies for cultivating AI literacy were data literacy and AI understanding and utilization; the main detailed areas derived from these were understanding and processing data structures, visualization, web scraping, web crawling, use of public data, and the concept and application of machine learning. Conclusions: The educational design derived through this study is expected to help establish the direction of competency-centered AI education and to increase the necessity and value of AI education when applied in the students' major fields.