• Title/Summary/Keyword: Nutch

Search for a user-centered system design and implementation (사용자 중심 검색 시스템 설계 및 구현)

  • Kim, A-Yong;Park, Man-Seub;Kim, Jong-Moon;Jeong, Dae-Jin;Jung, Hoe-kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.619-621 / 2014
  • With advances in information technology, interest in the latest IT issues has grown. Yet many web users struggle to find the information they need in the large volume of search data they must sift through. In this paper, we propose a user-centered search system. The system is designed and implemented using the Lucene-based Apache projects Nutch and Solr together with Hadoop's MapReduce and HDFS. It lets web search users collect and index the data they want according to their intent, and the resulting index can then be used for search.

Implement on Search Machine using Open Source Framework (오픈 소스 프레임워크를 활용한 검색엔진 구현)

  • Song, Hyun-Ok;Kim, A-Yong;Jung, Hoe-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.3 / pp.552-557 / 2015
  • With the development of IT and the increased use of smart devices, a large amount of data is being produced and consumed on the Internet. The importance of information retrieval technology is growing as a result, but the technology is difficult to approach because it requires a lot of background knowledge. With the emergence of Lucene, however, a search engine can be implemented even without deep background knowledge of search technology. This paper proposes implementing a search engine using a Lucene-based framework. The proposed framework uses Hadoop, Nutch, Solr, and Zookeeper to support distributed processing, distributed storage, and high availability in a server environment.

Comparative Analysis of Web Archiving Tools (웹아카이빙 도구 비교분석 연구)

  • Kim, Heejung
    • Proceedings of the Korean Society for Information Management Conference / 2011.08a / pp.95-98 / 2011
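  • Techniques and strategies for the long-term preservation of digital resources have been developed amid sustained interest. In particular, as dependence on web resources grows, web archiving is becoming more important. This study examines the web archiving tools that correspond to each of the four stages of the web archiving chain proposed by the IIPC, along with their characteristics. Nine web archiving tools are covered: Heritrix, DeepArc, Web Curator Tool, NetarchiveSuite, BnFArcTools, Wayback, NutchWAX, WERA, and Xinq.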

Design of Search System Based on Lucene for Minimum Price Products (루씬 기반의 최저가 상품 검색 시스템 설계)

  • Kim, A-Yong;Jeong, Dae-Jin;Gye, Min-Suk;Kim, Chang-Su;Jung, Hoe-kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.603-605 / 2014
  • With the spread of the Internet and the increased use of smart devices, the consumer market has shifted from physical stores to online shopping. This shift has changed users' consumption patterns and consumer culture. Open markets offer a wide variety of events, lowest-price policies, and safe transactions to attract consumers and to expand their distribution channels through the web and mobile. This paper designs a search system that collects and analyzes information on products sold in open markets and provides users with minimum-price product information.

Design and Implementation of a Search Engine based on Apache Spark (아파치 스파크 기반 검색엔진의 설계 및 구현)

  • Park, Ki-Sung;Choi, Jae-Hyun;Kim, Jong-Bae;Park, Jae-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.1 / pp.17-28 / 2017
  • Research on data has recently been active as the value of data has grown. Web crawlers, programs that collect data, have drawn attention because they can be applied in many fields. A web crawler can be defined as a tool that analyzes web pages and collects URLs by traversing web servers in an automated manner. For big data processing, distributed web crawlers based on Hadoop MapReduce are widely used, but they are difficult to use and have performance constraints. Apache Spark, an in-memory computing platform, is an alternative to MapReduce. A search engine, one of the main applications of a web crawler, displays the information gathered by the crawler that matches a keyword search. If search engines adopted a Spark-based web crawler instead of a traditional MapReduce-based one, data collection would be faster.