Multi-threaded Web Crawling Design using Queues (큐를 이용한 다중스레드 방식의 웹 크롤링 설계)

  • Kim, Hyo-Jong;Lee, Jun-Yun;Shin, Seung-Soo
    • Journal of Convergence for Information Technology
    • v.7 no.2
    • pp.43-51
    • 2017
  • Background/Objectives: This study proposes, designs, and implements a multi-threaded web crawler using queues. It aims to address the time delay of single-process crawling, the cost increase of parallel processing, and the waste of manpower, by using multiple bots connected over a wide area network. Methods/Statistical analysis: The study designs and analyzes applications that run on independent systems, based on a multi-threaded system configuration using queues. Findings: We propose a multi-threaded web crawler design using queues. The throughput of web documents can be analyzed per client and per thread according to the formula, and the optimal number of clients can be determined by checking the efficiency of each thread. The proposed system is based on distributed processing: clients in independent environments collect web documents quickly and reliably using queues and threads. Application/Improvements: Rather than a web crawler designed for one particular site, there is a need for a system that applies queues and multiple threads to a general-purpose web crawler so that it navigates and collects from various web sites quickly and efficiently.
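The queue-and-threads arrangement described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory `FAKE_WEB` link graph, the `fetch_links` helper, and the `crawl` function are hypothetical names introduced here, and page fetching is simulated so that the example runs without network access.

```python
import queue
import threading

# Hypothetical in-memory "web": URL -> outgoing links.
# This stands in for real HTTP fetching and link extraction (an assumption).
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return FAKE_WEB.get(url, [])


def crawl(seed, num_threads=4):
    """Crawl from `seed` with worker threads sharing one queue of URLs."""
    frontier = queue.Queue()      # thread-safe queue of URLs still to visit
    seen = {seed}                 # URLs already queued, guarded by seen_lock
    seen_lock = threading.Lock()
    frontier.put(seed)

    def worker():
        while True:
            url = frontier.get()
            if url is None:       # sentinel: no more work for this thread
                frontier.task_done()
                break
            try:
                for link in fetch_links(url):
                    with seen_lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    # Enqueue before task_done() so join() cannot return early.
                    frontier.put(link)
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    frontier.join()               # blocks until every queued URL is processed
    for _ in threads:
        frontier.put(None)        # one sentinel per worker
    for t in threads:
        t.join()
    return seen
```

Calling `crawl("a")` visits every page reachable from `"a"` in the toy graph. A real crawler would replace `fetch_links` with an HTTP client and add politeness delays and error handling, none of which is shown here.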

Ergonomic information retrieval through internet and its applications (인터넷을 이용한 인간공학정보의 검색 및 응용)

  • 이남식
    • Proceedings of the ESK Conference
    • 1995.10a
    • pp.185-191
    • 1995
  • This paper reviews how to access ergonomic information through the Internet, the worldwide computer network. With the recent growth of the hypertext-based World-Wide Web (WWW), it has become much easier to access the Internet and to retrieve information efficiently. To support searching for ergonomic information, the paper also reviews well-known Web search engines such as Lycos and WebCrawler, and meta-indices such as YAHOO. Useful ergonomics/human factors Web sites such as ErgoWeb are also summarised.

A Design of the XML-based Trans-Gate System for Mobile Device (XML 기반의 이동단말기를 위한 Trans-Gate System 설계)

  • 남궁명희;양혁;황재각;임영환
    • Proceedings of the Korean Information Science Society Conference
    • 2003.04c
    • pp.591-593
    • 2003
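  • Mobile services are currently developing along with mobile devices. However, because each mobile device has a different platform, delivering wired-Internet content as a mobile service requires conversion using XML, a markup language technology; the system that performs this conversion is called the Trans-Gate System. This paper designs the Trans-Gate System, which converts wired-Internet content to suit mobile device platforms (WML, HDML, m-HTML). The system is divided into two modules, X-Crawler and Call Manager, and converts multimedia content on the existing wired Internet to suit the user's device. Its advantage is therefore that content does not have to be created separately just for mobile services.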

Basic properties survey report on the rock classification (암반 등급분류를 위한 기초 물성조사 보고서)

  • Huh, Ginn
    • Journal of the Korean Professional Engineers Association
    • v.24 no.3
    • pp.43-50
    • 1991
  • For ground foundation works at building sites, rock classification tests following the International Society for Rock Mechanics can be carried out as follows. 1. In-take test: compressive strength and point load tests. 2. In-situ test: Schmidt hammer test. Burden tests were carried out, and finally a convenient correlation table between strength and Schmidt hammer (S.H.) test results was prepared for site engineers. This project is part of a continuing series of works on burden tests, covering drills from the jackleg drill (ø 36 mm) to the crawler drill (ø 75 mm).

Robot Technologies in Response to Accidents in Nuclear Power Plants

  • Kim, Seungho;Jung, Kyung-Min;Kim, Chang-Hoi;Seo, Yong-Chil
    • Proceedings of the Institute of Control, Robotics and Systems Conference
    • 2002.10a
    • pp.43.6-43
    • 2002
  • KAEROT/m1, with an omni-directional planetary wheel mechanism for narrow corridors; KAEROT/m2, which can pass over ditches using four specially designed wheels of a re-configurable crawler; a stereo imaging system with a master manipulator that enhances tele-presence; and a small hybrid dosimeter that detects radiation dose and dose rate simultaneously.

Careful Blasting to Reduce the Level of Ground Vibration in Open Excavation (노천 굴착에서 발파 진동의 크기를 감소시키기 위한 정밀발파)

  • Huh, Ginn
    • Geotechnical Engineering
    • v.6 no.3
    • pp.5-12
    • 1990
  • In this paper, ground vibration and other properties were measured to determine empirical equations based on careful test blasting with a crawler drill (diameter 70-75 mm). Empirical equations for ground vibration were obtained, where V is the peak particle velocity in cm/sec, D is the distance in m, and W is the maximum charge weight per delay in kg.

Refresh Cycle Optimization for Web Crawlers (웹크롤러의 수집주기 최적화)

  • Cho, Wan-Sup;Lee, Jeong-Eun;Choi, Chi-Hwan
    • The Journal of the Korea Contents Association
    • v.13 no.6
    • pp.30-39
    • 2013
  • A web crawler should keep its data fresh with minimum server overhead, even for web sites holding large amounts of data. Server overhead increases rapidly as data volumes explode in the big data era. The amount of web information is growing rapidly with advanced wireless networks and the emergence of diverse smart devices. Furthermore, information is continuously produced and updated anywhere and anytime by means of easy-to-use web platforms and smart devices. How frequently updated web data should be refreshed during data collection and integration has therefore become a pressing issue. In this paper, we propose dynamic web-data crawling methods, which include sensitive checking of web site changes and dynamic retrieval of web pages from target web sites based on historical update patterns. We also implemented a Java-based web crawling application and compared the efficiency of conventional static approaches with that of our dynamic one. Our experimental results showed a 46.2% overhead benefit, with fresher data, compared to the static crawling methods.
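One simple way to picture a refresh cycle that adapts to a page's update history is sketched below. This is an illustrative assumption, not the method evaluated in the paper: the halve-on-change, double-on-no-change rule, the interval bounds, and the names `fingerprint` and `next_interval` are all hypothetical.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Hash page content so a change can be detected without storing the page."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def next_interval(current_s: float, changed: bool,
                  min_s: float = 60.0, max_s: float = 86400.0) -> float:
    """Return the next revisit interval in seconds.

    Assumed rule (not from the paper): revisit sooner after a change,
    later after no change, clamped to [min_s, max_s].
    """
    factor = 0.5 if changed else 2.0
    return max(min_s, min(max_s, current_s * factor))


def update_schedule(old_fp: str, new_content: str, current_s: float):
    """Compare fingerprints and return (new_fingerprint, next_interval_s)."""
    new_fp = fingerprint(new_content)
    return new_fp, next_interval(current_s, changed=(new_fp != old_fp))
```

A static crawler would revisit every page at a fixed interval. Under the rule assumed here, rarely updated pages drift toward the maximum interval, which is where a reduction in server overhead would come from.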

Abusive Detection Using Bidirectional Long Short-Term Memory Networks (양방향 장단기 메모리 신경망을 이용한 욕설 검출)

  • Na, In-Seop;Lee, Sin-Woo;Lee, Jae-Hak;Koh, Jin-Gwang
    • The Journal of Bigdata
    • v.4 no.2
    • pp.35-45
    • 2019
  • Recently, the damage and social cost caused by malicious comments have been increasing, including news of celebrities committing suicide under the effects of malicious comments. Damage from malicious comments, including abusive language and slang, is increasing and spreading in various types and forms throughout society. In this paper, we propose a technique for detecting abusive language using a bidirectional long short-term memory (LSTM) neural network model. We collected comments from the web with a web crawler and applied stopword processing to unused tokens such as English letters and special characters. For the stopword-processed comments, a bidirectional LSTM neural network model, which considers the words before and after each word in a sentence, was used to detect abusive language. To use the bidirectional LSTM neural network, the comments were subjected to morphological analysis and vectorization, and each word was labeled for abusive language. Experimental results showed a performance of 88.79% on a total of 9,288 comments screened and collected.

Development of Real-Time Thickness Measuring System for Insulated Pipeline Using Gamma-ray (감마선을 이용한 단열배관의 실시간 두께측정시스템 개발)

  • Jang, Ji-Hoon;Kim, Byung-Joo;Kim, Gi-Dong;Cho, Kyung-Shik
    • Journal of the Korean Society for Nondestructive Testing
    • v.22 no.5
    • pp.500-507
    • 2002
  • In this study, an on-line, real-time radiometric system was developed that uses a 64-channel linear array of solid-state detectors to measure the wall thickness of insulated piping systems. The system uses Ir-192 as the gamma-ray source, and each detector is composed of a BGO scintillator and a photodiode. The Ir-192 gamma-ray source and the linear detector array are mounted on a computer-controlled robotic crawler. The source is located on one side of the piping component and the detector array on the other side. The individual detectors of the array measure the intensity of the gamma rays after they pass through the walls and the insulation of the piping component under measurement. The output of the detector array is amplified and transmitted to the computer by cable. The system collects and analyses the data from the detector array in real time as the crawler travels over the piping system. The maximum measuring speed is 120 cm of pipe per minute at a scanning interval of 1 mm.