• Title/Summary/Keyword: Crawling

The impact of inter-host links in crawling important pages early

  • Alam, Hijbul;Ha, Jong-Woo;Sim, Kyu-Sun;Lee, Sang-Keun
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2010.06c
    • /
    • pp.118-121
    • /
    • 2010
  • The dynamic nature and exponential growth of the World Wide Web make crawling important pages early a continuing challenge. State-of-the-art crawl scheduling algorithms require long running times to prioritize web pages during crawling. In this research, we propose crawl scheduling algorithms that are not only fast but also download important pages early. The algorithms assign high importance to pages that have good linkage, such as inlinks from different domains or hosts. The proposed algorithms were evaluated on publicly available large datasets. The experimental results show that propagating more importance along inter-host links makes crawl scheduling more effective than current state-of-the-art crawl scheduling algorithms.
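
The idea of giving extra importance to inter-host links can be illustrated with a small priority-queue scheduler. This is only a sketch under stated assumptions, not the authors' algorithm: the weights `INTER_HOST_WEIGHT` and `INTRA_HOST_WEIGHT`, the function names, and the in-memory `outlinks` graph (standing in for real page downloads) are all hypothetical.

```python
import heapq
from urllib.parse import urlsplit

# Assumed weights: a link arriving from a different host counts more.
INTER_HOST_WEIGHT = 3.0
INTRA_HOST_WEIGHT = 1.0


def host(url):
    """Return the lower-cased host part of a URL."""
    return urlsplit(url).netloc.lower()


def crawl_order(seeds, outlinks):
    """Return the order in which URLs would be downloaded.

    `outlinks` maps a URL to the URLs it links to (a toy link graph).
    Each discovered link adds weight to its target's score, and the
    frontier always yields the highest-scoring URL not yet downloaded.
    """
    score = {url: 0.0 for url in seeds}
    heap = [(0.0, i, url) for i, url in enumerate(seeds)]
    heapq.heapify(heap)
    tie = len(seeds)  # insertion counter used to break score ties
    done, order = set(), []
    while heap:
        neg_score, _, url = heapq.heappop(heap)
        if url in done or -neg_score != score[url]:
            continue  # stale heap entry: the score changed after the push
        done.add(url)
        order.append(url)
        for target in outlinks.get(url, []):
            if target in done:
                continue
            crosses_hosts = host(target) != host(url)
            weight = INTER_HOST_WEIGHT if crosses_hosts else INTRA_HOST_WEIGHT
            score[target] = score.get(target, 0.0) + weight
            heapq.heappush(heap, (-score[target], tie, target))
            tie += 1
    return order
```

In this sketch a page reached through a link from another host is scheduled ahead of a page reached only through same-host links, which is the qualitative behaviour the abstract describes.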

Image Classification Model using web crawling and transfer learning (웹 크롤링과 전이학습을 활용한 이미지 분류 모델)

  • Lee, JuHyeok;Kim, Mi Hui
    • Journal of IKEEE
    • /
    • v.26 no.4
    • /
    • pp.639-646
    • /
    • 2022
  • In this paper, to address the need for large datasets, we collect images through web crawling and build datasets for image classification models through data preprocessing. We also propose a lightweight model that incorporates transfer learning and can automatically classify images by adding category values, as well as an image classification model that reduces training time and achieves high accuracy.

Design of a Web-based Barter System using Data Crawling (Crawling을 이용한 웹기반의 물물교환 시스템설계)

  • Yoo, Hongseok;Kim, Ji-Won;Hwang, Jong-Wook;Park, Tae-Won;Lee, Jun-Hee
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.527-528
    • /
    • 2021
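  • In this paper, we propose a web-based barter system that offers convenience to users and addresses the shortcomings of existing barter systems. Most people sell second-hand or unneeded items in order to dispose of things they no longer need and to buy things they do need. From these users' perspective, obtaining a needed item takes a long time, and unneeded items are sometimes thrown away, which leads to waste and overconsumption. To solve these problems, we propose a barter system that lets users exchange unneeded items with people who need them, reducing unnecessary consumption and making it easy for users to find and exchange the products they need.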

A Design and Implementation of Disaster Text Crawling and Visualization Application (재난 문자 크롤링 및 시각화 애플리케이션 설계 및 구현)

  • Lee, Won Joo;Park, Bong Kyun;Park, Mun Kyu
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.01a
    • /
    • pp.89-90
    • /
    • 2021
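  • In this paper, we design and implement a disaster text message crawling and data visualization application based on Python and the Selenium library. The application crawls disaster text message data from the web and visualizes it according to frequency. Using the application, we access the National Disaster and Safety Portal (국민재난안전포털), crawl disaster text message data, and use a Word Cloud to visualize the frequency of disaster messages by region. By making regional disaster message frequency visible at a glance, the application makes it easy to convey a region's disaster information to people who rarely check disaster messages.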
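
The frequency step described above can be sketched as follows. This is a minimal illustration, assuming the messages have already been crawled into a list of strings; the function name `region_frequencies`, the sample messages, and the region names are hypothetical and do not come from the paper.

```python
from collections import Counter


def region_frequencies(messages, regions):
    """Count how many messages mention each region name."""
    counts = Counter()
    for text in messages:
        for region in regions:
            if region in text:
                counts[region] += 1
    return counts


# Hypothetical sample data standing in for crawled disaster messages.
sample_messages = [
    "[Seoul] Heavy rain warning issued",
    "[Busan] Typhoon advisory in effect",
    "[Seoul] Heat wave advisory issued",
]
frequencies = region_frequencies(sample_messages, ["Seoul", "Busan", "Daegu"])
```

A region-to-count mapping of this shape is the kind of input a word-cloud renderer can size words by, so regions with more messages appear larger.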

Designing and implementing web crawling-based SNS web site (웹 크롤링 기반 SNS웹사이트 설계 및 구현)

  • Yoon, Kyung Seob;Kim, Yeon Hong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2018.01a
    • /
    • pp.21-24
    • /
    • 2018
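  • Existing Facebook pages receive so many user-submitted posts that users have difficulty finding the posts they want. To address this, this paper designs and implements a website that crawls the content of various Facebook pages, stores it in a database server so that users can search for the Facebook page content they want, and serves the crawled content to users.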

Clustering Analysis of Films on Box Office Performance : Based on Web Crawling (영화 흥행과 관련된 영화별 특성에 대한 군집분석 : 웹 크롤링 활용)

  • Lee, Jai-Ill;Chun, Young-Ho;Ha, Chunghun
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.39 no.3
    • /
    • pp.90-99
    • /
    • 2016
  • Forecasting box office performance after a film's release is very important for increasing profitability by reducing production and marketing costs. Analysis of psychological factors such as word-of-mouth and expert assessment is essential, but it is hard to perform because the data are difficult to collect. Information technologies such as web crawling and text mining can help to overcome this difficulty. Effective text mining requires categorization of objects. From this perspective, the objective of this study is to provide a framework for classifying films according to their characteristics. Data including psychological factors are collected from Web sites using web crawling. A clustering analysis is conducted to classify films, and a series of one-way ANOVA analyses is conducted to statistically verify the differences in characteristics among groups. The result of the cluster analysis based on reviews and revenues shows that the films can be categorized into four distinct groups and that the differences in characteristics are statistically significant. The first group has high box office sales and a higher number of clicks on reviews than the other groups. The second group is similar to the first, but its reviews are longer and its box office sales are poor. The third group's audiences prefer documentaries and animations, and its numbers of comments and interests are significantly lower than those of the other groups. The last group prefers the crime, thriller, and suspense genres. Correspondence analysis is also conducted to match the groups with intrinsic characteristics of films such as genre, movie rating, and nation.

Refresh Cycle Optimization for Web Crawlers (웹크롤러의 수집주기 최적화)

  • Cho, Wan-Sup;Lee, Jeong-Eun;Choi, Chi-Hwan
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.6
    • /
    • pp.30-39
    • /
    • 2013
  • A web crawler should maintain fresh data with minimum server overhead for the large amounts of data on web sites. Server overhead increases rapidly as the amount of data explodes in the big data era. The amount of web information is growing rapidly with advanced wireless networks and the emergence of diverse smart devices. Furthermore, information is continuously being produced and updated anywhere and anytime by means of easy-to-use web platforms and smart devices. How frequently updated web data should be refreshed during data collection and integration is therefore becoming a pressing issue. In this paper, we propose dynamic web-data crawling methods, which include sensitive checking of web site changes and dynamic retrieval of web pages from target web sites based on historical update patterns. Furthermore, we implemented a Java-based web crawling application and compared the efficiency of conventional static approaches with that of our dynamic one. Our experimental results showed a 46.2% reduction in overhead, with fresher data, compared to the static crawling methods.
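
One simple way to picture a refresh cycle that follows a page's update history is sketched below. It is an illustration under stated assumptions rather than the paper's method (the paper's implementation was Java-based and is not described in detail here): change detection uses a content hash, the interval is halved after an observed change and doubled otherwise, and the function names and the bounds `min_hours` and `max_hours` are hypothetical.

```python
import hashlib


def fingerprint(body):
    """Hash the page body so two fetches can be compared cheaply."""
    return hashlib.sha256(body).hexdigest()


def next_interval(current_hours, changed, min_hours=1.0, max_hours=168.0):
    """Halve the revisit interval after a change, double it otherwise.

    The result is clamped to [min_hours, max_hours] (assumed bounds).
    """
    factor = 0.5 if changed else 2.0
    return max(min_hours, min(max_hours, current_hours * factor))


def schedule(snapshots, start_hours=24.0):
    """Return the revisit interval chosen after each fetched snapshot."""
    interval = start_hours
    last = None
    intervals = []
    for body in snapshots:
        current = fingerprint(body)
        if last is not None:
            interval = next_interval(interval, current != last)
        last = current
        intervals.append(interval)
    return intervals
```

For example, `schedule([b"v1", b"v1", b"v2", b"v2"])` lengthens the interval while the page is unchanged and shortens it once a change is seen, so frequently updated pages are revisited more often than static ones.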

Learning Effects of Flipped Learning based on Learning Analytics in SW Coding Education (SW 코딩교육에서의 학습분석기반 플립러닝의 학습효과)

  • Pi, Su-Young
    • Journal of Digital Convergence
    • /
    • v.18 no.11
    • /
    • pp.19-29
    • /
    • 2020
  • This study examines the effectiveness of flipped learning teaching methods that use learning analytics to enable effective programming learning for non-major students. After a flipped learning programming class model was designed by applying the ADDIE model, learning-related data from the lecture support system operated by the school were collected and processed by crawling. Because the crawled data are presented through a dashboard that the instructor can understand easily, the instructor can design classes more efficiently and provide individually tailored learning based on them. An analysis of the learning-related data collected over a one-semester class found that department, academic year, attendance, assignment submission, and preliminary/review attendance affected academic achievement. In a survey, students responded that the instructor's individualized feedback based on learning analytics was very helpful for self-directed learning. The approach is expected to give instructors a foundation for enhancing their teaching activities. In the future, the contents of social network services related to learners' learning will be processed by crawling to analyze learners' learning situations.

Effects of Mechanically Different Environments on the Crawling Waveform of Caenorhabditis Elegans (기계적으로 다른 환경에서 예쁜 꼬마선충의 기는 파형 변화)

  • Kim, Dae-Yeon;Byeon, Soo-Yung;Kim, Se-Ho;Shin, Jennifer Hyun-Jong
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.36 no.2
    • /
    • pp.125-130
    • /
    • 2012
  • The nematode Caenorhabditis elegans is a widely used model organism in biological research. Thanks to well-established knowledge about its neural connectivity, a wide range of studies have attempted to uncover the relationship between behaviors and the responsible neurons. In our research, the adaptive behavior of C. elegans in solid environments with different surface rigidities is investigated, where the worm adapts to different mechanical stiffnesses by modulating its crawling waveform. The amplitude and wavelength of the crawling waveform decrease as the environment becomes more rigid. Interestingly, the mechanosensation-defective mutant responds differently to surface rigidity than the wild-type worm does. To explain the adaptation process in mechanically different environments, we suggest a plausible neural circuit model.

Comparison of Muscle Activities Serratus Anterior and Upper Trapezius Muscle During Scapular Protraction in Quadruped Position at Legs Difference (네발기기 자세에서 어깨뼈 내밈 운동시 다리들기에 따른 앞톱니근과 위등세모근의 근활성도 비교)

  • Kim, Hee-gon;Hwang, Byeong-jun;Kim, Jong-woo
    • The Journal of Korean Academy of Orthopedic Manual Physical Therapy
    • /
    • v.25 no.1
    • /
    • pp.29-36
    • /
    • 2019
  • Background: This study investigated the effect of leg lifting on the serratus anterior and upper trapezius muscles when a subject with a winged scapula performs a scapular protraction exercise in a four-leg crawling (quadruped) posture. Method: Twenty normal adults and 20 subjects with a winged scapula participated in the experiment. Surface EMG recordings were collected from the serratus anterior and upper trapezius muscles during scapular protraction exercises. Scapular winging was measured with an electronic digital caliper as the distance by which the scapula lifted toward the back. Both groups performed scapular protraction in the four-leg crawling posture under three conditions: both legs supported, dominant leg lifted, and non-dominant leg lifted. The statistical significance level $\alpha$ was .05, and the Bonferroni correction was used to examine differences between groups in the analysis of variance (significance level $\alpha = .017$). Results: There was a significant difference in serratus anterior and upper trapezius muscle activity according to leg lifting during the push-up plus exercise in the four-leg crawling posture, but there was no significant difference in muscle activity between the serratus anterior and upper trapezius muscles, and no significant difference according to the presence or absence of scapular winging. Conclusion: For shoulder stability on the ipsilateral side involving the serratus anterior muscle, the leg-lifting posture is effective in four-leg crawling; when a subject with a winged scapula chooses an exercise, lifting the ipsilateral leg while performing scapular protraction may have a positive effect on scapular dysfunction.