• Title/Summary/Keyword: Web Search Engines

The Effectiveness of the Invisible Web Search Tools (Invisible Web 탐색도구의 성능 비교 및 분석)

  • Ro, Jung-Soon
    • Journal of the Korean Society for information Management / v.21 no.3 / pp.203-225 / 2004
  • This study investigates the characteristics of the Invisible Web and of the search services designed as gateways to it, and evaluates how well those services search the Invisible Web. Four services were selected and tested with 11 queries: Google as a portal, ProFusion and Search.com as Invisible Web meta search engines, and IncyWincy as an Invisible Web search engine. Google's Invisible Web searching was more effective than that of the three Invisible Web search tools, but the difference among the four systems was not significant ($\alpha$ = .055). In the three search tools, Invisible Web meta searching was better than Web meta searching at a statistically significant level. An effectiveness measure based on the ranks and relevance degree (quality) of the relevant documents retrieved seemed appropriate for ranked search results.

Research on User's Query Processing in Search Engine for Ocean using the Association Rules (연관 규칙 탐사 기법을 이용한 해양 전문 검색 엔진에서의 질의어 처리에 관한 연구)

  • 하창승;윤병수;류길수
    • Journal of the Korea Society of Computer and Information / v.8 no.2 / pp.8-15 / 2003
  • A wide variety of information suppliers now provide information via the WWW, so the need for search engines is growing. However, the efficiency of most search engines is comparatively low because they rely on simple pattern matching between the user's query and web documents. A specialized search engine returns specialized information according to each user's search goal. Developing specialized search engines is a trend in many countries, but most such engines do not satisfy users' needs. This paper proposes a specialized search engine for ocean information that uses users' ocean-related queries together with association rules from web data mining, which can establish relations between web documents. The proposed search engine improves recall and precision over the existing search method.
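
The abstract does not spell out how the association rules are mined, so the following is only a minimal sketch of the general idea: count how often pairs of terms co-occur, keep pairs whose support and confidence exceed a threshold, and use the resulting rules to expand a query. The function names, the thresholds, and the toy ocean-related sessions are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Mine one-to-one association rules (lhs -> rhs) from lists of terms."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for terms in transactions:
        unique = sorted(set(terms))
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))
    rules = []
    for (x, y), count in pair_count.items():
        support = count / n                      # share of transactions with both terms
        if support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

def expand_query(query_terms, rules):
    """Add the right-hand side of every rule whose left-hand side is in the query."""
    extra = {rhs for lhs, rhs, _, _ in rules if lhs in query_terms}
    return sorted(set(query_terms) | extra)

# Hypothetical query sessions (assumed data, for illustration only).
sessions = [
    ["tide", "harbor"],
    ["tide", "harbor", "current"],
    ["tide", "current"],
    ["salinity", "current"],
    ["tide", "harbor"],
]
rules = pair_rules(sessions)
expanded = expand_query(["harbor"], rules)
print(expanded)
```

Restricting the rules to term pairs keeps the sketch short; a fuller implementation would likely mine larger itemsets with an Apriori-style pass.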

RepWeb: A Web-Based Search Tool for Repeat-Related Literatures

  • Woo, Tae-Ha;Kim, Young-Uk;Kwon, Je-Keun;Seo, Jung-Min
    • Genomics & Informatics / v.5 no.2 / pp.88-91 / 2007
  • Repetitive sequences such as SINE, LINE, and LTR elements form a major part of eukaryotic genomes. A literature search tool that summarizes the information available on repeat elements would give biologists in the field of genomics a useful aid for analyzing genomic sequence features. We developed a Java program designed to make literature access easier by querying two search engines simultaneously. RepWeb is a web-based search system that provides a user-friendly interface for searching reference data and journals for information related to repeat elements, using Google Scholar and PubMed at the same time. Its interface displays repeat element-related biological information and includes useful functions such as producing a repeat tree, clickable links to PubMed and Google Scholar, exporting, and sorting by date, author, journal, or title.

Financial Education for Children Using the Internet: An Analysis on Interactive Financial Education Web Sites (인터넷을 이용한 어린이 금융교육: 쌍방향 금융교육 웹사이트 현황 분석)

  • Choi Nam Sook;Baek Eunyoung
    • Journal of Family Resource Management and Policy Review / v.8 no.1 / pp.47-60 / 2004
  • Recognizing the tremendous increase in Internet users and the popularity of e-learning through the Internet, this study analyzes interactive financial education web sites for children. Using meta search engines and major search engines, interactive financial education web sites were identified based on three criteria and analyzed in terms of their appropriateness for specific age groups, their coverage of basic knowledge for financial literacy, and their interactive activities. The results showed that financial education web sites for children need to be improved in both quantity and quality. The study also provides a guideline on how to search for appropriate financial education web sites for children when parents want to teach their children about money.

User Perceptions of Uncertainty in the Evaluation of Search Results

  • Kim, Yang-Woo
    • International Journal of Contents / v.8 no.1 / pp.100-107 / 2012
  • While considerable research suggests that users' uncertainty gradually decreases as they proceed through the information seeking process, other work argues that it can arise at any stage of that process. Reflecting the latter view, this study examined user perceptions of uncertainty in the final stage of the information seeking process, the evaluation of search results. Considering the significance of Web search engines for academic study, it investigated the relevance decision stage of scholarly researchers in the field of science who use Web search engines for their academic work. Based on the analysis of the users' uncertainty, the study offers implications for improving information systems and Web content design.

Design of Advanced HITS Algorithm by Suitability for Importance-Evaluation of Web-Documents (웹 문서 중요도 평가를 위한 적합도 향상 HITS 알고리즘 설계)

  • 김분희;한상용;김영찬
    • The Journal of Society for e-Business Studies / v.8 no.2 / pp.23-31 / 2003
  • Link-based search engines generate rankings from the link information of related web documents. HITS (Hyperlink-Induced Topic Search), a representative ranking algorithm that exploits the link structure of web documents, evaluates the importance of related pages from link information and presents it as a ranking. The problem with HITS is that it considers only the link frequency within documents and depends on the set of web documents given as input. In this paper, we design a search agent based on an improved HITS algorithm that takes into account the suitability between the query and the search results within the set of documents returned by a link-based web search engine. The agent thereby compensates for the locality of the search performance and results.
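
For orientation, the baseline HITS iteration that the paper sets out to improve can be sketched as follows. This is the standard hub/authority power iteration, not the authors' suitability-enhanced variant, and the toy link graph and the name `toy_links` are assumptions made for illustration.

```python
import math

def hits(links, iterations=50):
    """Run the HITS power iteration on a small link graph.

    links maps each page to the list of pages it links to.
    Returns (hub, authority) score dicts, each L2-normalized.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages that link in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum the authority scores of the pages linked to.
        hub = {p: sum(auth[d] for d in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical four-page graph: a and b both point to c, and c points to d.
toy_links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
hub, auth = hits(toy_links)
best_authority = max(auth, key=auth.get)
print(best_authority)
```

Because the scores depend only on link counts inside the input set, the sketch also shows the limitation the abstract points out: nothing in the iteration looks at how well a page matches the query.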

A Document Collection Method for More Accurate Search Engine (정확도 높은 검색 엔진을 위한 문서 수집 방법)

  • Ha, Eun-Yong;Gwon, Hui-Yong;Hwang, Ho-Yeong
    • The KIPS Transactions:PartA / v.10A no.5 / pp.469-478 / 2003
  • Internet search engines use web robots that visit servers connected to the Internet, periodically or non-periodically. The robots extract and classify the collected data by their own methods and build the databases on which web search engines are based. This procedure is repeated very frequently on the Web. Many search engine sites run this process strategically in order to become popular Internet portal sites that give users ways to find information on the web. A web search engine contacts thousands upon thousands of web servers, maintains its existing databases, and navigates to collect data from newly connected web servers. These jobs, however, are decided and carried out by the search engines alone: they run web robots to collect data without any knowledge of the state of the web servers. Each search engine issues many requests and receives responses from web servers, which is one cause of increased Internet traffic. If each web server notified web robots with a summary of its public documents, and each robot then collected only the corresponding documents using that summary, unnecessary Internet traffic would be eliminated, the accuracy of the data held by search engines would improve, and the processing overhead of web-related jobs on both web servers and search engines would fall. In this paper, we design and implement a monitoring system on the web server that monitors the state of documents on the server, summarizes the changes to modified documents, and sends the summary to web robots that want documents from the server. We also design and implement an efficient web robot for the search engine that uses the notified summary to fetch the corresponding documents from the web servers, extract index terms, and update its databases.
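
The notification scheme described above can be illustrated with a small in-memory sketch: a server-side monitor reports which documents were added, modified, or deleted since its last summary, and the robot fetches only those. The class names, the summary layout, and the use of SHA-256 digests for change detection are assumptions; the abstract does not say how the authors detect or encode changes.

```python
import hashlib

class DocumentMonitor:
    """Server side: tracks document digests and reports what changed."""

    def __init__(self):
        self._digests = {}

    def summarize_changes(self, documents):
        """Return paths added, modified, or deleted since the previous call."""
        current = {path: hashlib.sha256(body.encode("utf-8")).hexdigest()
                   for path, body in documents.items()}
        summary = {
            "added": sorted(p for p in current if p not in self._digests),
            "modified": sorted(p for p in current
                               if p in self._digests and self._digests[p] != current[p]),
            "deleted": sorted(p for p in self._digests if p not in current),
        }
        self._digests = current
        return summary

class Robot:
    """Crawler side: updates its index using only the notified summary."""

    def __init__(self):
        self.index = {}
        self.fetch_count = 0

    def apply_summary(self, summary, fetch):
        for path in summary["added"] + summary["modified"]:
            self.index[path] = fetch(path)   # fetch stands in for an HTTP request
            self.fetch_count += 1
        for path in summary["deleted"]:
            self.index.pop(path, None)

# Hypothetical site contents, kept in a dict instead of on a real server.
site = {"/a.html": "alpha", "/b.html": "beta"}
monitor, robot = DocumentMonitor(), Robot()
robot.apply_summary(monitor.summarize_changes(site), site.__getitem__)

site["/b.html"] = "beta v2"
site["/c.html"] = "gamma"
del site["/a.html"]
second = monitor.summarize_changes(site)
robot.apply_summary(second, site.__getitem__)
print(second, robot.fetch_count)
```

In the second round the robot issues two fetches instead of re-downloading the whole site, which is the traffic saving the abstract argues for.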

Design and Implementation of Web Crawler utilizing Unstructured data

  • Tanvir, Ahmed Md.;Chung, Mokdong
    • Journal of Korea Multimedia Society / v.22 no.3 / pp.374-385 / 2019
  • A Web crawler is a program commonly used by search engines to discover new content on the Internet. The use of crawlers has made the web easier for users. In this paper, we collect data from web pages by structuring unstructured data. Our system can select the words near a given keyword across more than one document in an unstructured way. Neighboring data for the keyword were collected through word2vec. The system aims to filter data at the acquisition level and to support a large taxonomy. The main problem in text taxonomy is how to improve classification accuracy. To improve accuracy, we propose a new TF-IDF weighting method; specifically, we modified the TF algorithm to calculate the accuracy of unstructured data. Finally, our system proposes a competent web page crawling algorithm, derived from TF-IDF and the RL Web search algorithm, to enhance the efficiency of searching for relevant information. The paper also examines the working nature of crawlers and crawling algorithms in search engines for efficient information retrieval.
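
As background for the weighting method mentioned above, the textbook TF-IDF weight can be sketched as below. This is the plain formulation (relative term frequency times the log of inverse document frequency), not the modified weighting the authors propose, whose details the abstract does not give. The sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))          # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized pages (assumed data, for illustration only).
docs = [
    "web crawler downloads web pages".split(),
    "search engine ranks pages".split(),
    "crawler feeds search engine".split(),
]
weights = tf_idf(docs)
top_term = max(weights[0], key=weights[0].get)
print(top_term)
```

A term that occurs in every document gets weight zero under this formulation, which is one reason variants of the weighting are often proposed.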

A Study on the Organizing Directory for Internet Directory Search Engines (인터넷 검색엔진의 디렉토리 구성에 관한 연구)

  • 신동민
    • Journal of the Korean Society for information Management / v.18 no.2 / pp.143-164 / 2001
  • The purpose of this study is to suggest guidelines for organizing and maintaining the subject directories of search engines so that they return effective search results for web documents. The study reviews and analyzes several directory search engines to identify their problems, and reviews the literature on classification theory and previous studies. As a result, it suggests guidelines for preparing a systematic subject directory scheme suited to Internet directory search engines covering general and/or special subject fields.

Implementation of a Parallel Web Crawler for the Odysseus Large-Scale Search Engine (오디세우스 대용량 검색 엔진을 위한 병렬 웹 크롤러의 구현)

  • Shin, Eun-Jeong;Kim, Yi-Reun;Heo, Jun-Seok;Whang, Kyu-Young
    • Journal of KIISE:Computing Practices and Letters / v.14 no.6 / pp.567-581 / 2008
  • As the web grows explosively, search engines are becoming increasingly important as the primary means of retrieving information from the Internet. A search engine periodically downloads web pages and stores them in a database to provide readers with up-to-date search results. The web crawler is the program that downloads and stores web pages for this purpose. A large-scale search engine uses a parallel web crawler to retrieve the collection of web pages while maximizing the download rate. However, the service architecture and experimental analysis of parallel web crawlers have not been fully discussed in the literature. In this paper, we propose an architecture for a parallel web crawler and discuss implementation issues in detail. The proposed crawler is based on the coordinator/agent model, which uses multiple machines to download web pages in parallel: multiple agent machines collect web pages and a single coordinator machine manages them. The parallel web crawler consists of three components: a crawling module for collecting web pages, a converting module for transforming the web pages into a database-friendly format, and a ranking module for rating web pages based on their relative importance. We explain each component and its implementation methods in detail. Finally, we conduct extensive experiments to analyze the effectiveness of the parallel web crawler. The experimental results show the merit of our architecture: the proposed crawler scales with the number of web pages to crawl and with the number of machines used.
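
The coordinator/agent model described above can be sketched on a single machine with threads standing in for agent machines. This is only an illustration of the division of labor (one coordinator owning the frontier, several agents downloading), not the Odysseus implementation; the in-memory `FAKE_WEB`, the class and function names, and the use of threads are all assumptions, and the converting and ranking modules are left out.

```python
import queue
import threading

# Hypothetical in-memory "web": URL -> list of outgoing links.
FAKE_WEB = {
    "u0": ["u1", "u2"],
    "u1": ["u2", "u3"],
    "u2": ["u0"],
    "u3": [],
}

class Coordinator:
    """Owns the frontier and the set of URLs already handed out."""

    def __init__(self, seeds):
        self.frontier = queue.Queue()
        self.seen = set()
        self.pages = {}
        self.lock = threading.Lock()
        self.report_links(seeds)

    def report_links(self, urls):
        """Queue every URL that has not been assigned before."""
        with self.lock:
            for url in urls:
                if url not in self.seen:
                    self.seen.add(url)
                    self.frontier.put(url)

    def store(self, url, links):
        with self.lock:
            self.pages[url] = links

def agent(coordinator):
    """Download assigned URLs and report discovered links to the coordinator."""
    while True:
        url = coordinator.frontier.get()
        if url is None:                      # sentinel: no more work
            coordinator.frontier.task_done()
            return
        links = FAKE_WEB.get(url, [])        # stands in for an HTTP download
        coordinator.store(url, links)
        coordinator.report_links(links)      # new URLs are queued before task_done
        coordinator.frontier.task_done()

def crawl(seeds, n_agents=3):
    coordinator = Coordinator(seeds)
    workers = [threading.Thread(target=agent, args=(coordinator,))
               for _ in range(n_agents)]
    for worker in workers:
        worker.start()
    coordinator.frontier.join()              # wait until every queued URL is processed
    for _ in workers:
        coordinator.frontier.put(None)       # one shutdown sentinel per agent
    for worker in workers:
        worker.join()
    return coordinator.pages

crawled = crawl(["u0"])
print(sorted(crawled))
```

Keeping the seen-set in the coordinator means no two agents download the same URL, at the cost of making the coordinator a single point of contention.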