• Title/Summary/Keyword: retrieval engine

A Study on the Crawling and Classification Strategy for Local Website (로컬 웹사이트의 탐색전략과 웹사이트 유형분석에 관한 연구)

  • Hwang In-Soo
    • Journal of Information Technology Applications and Management / v.13 no.2 / pp.55-65 / 2006
  • Since the World-Wide Web (WWW) has become a major channel for information delivery, information overload has also become a serious problem for Internet users. Therefore, effective information searching is critical to the success of Internet services. We present an integrated search engine for finding relevant web pages within a particular Internet domain. It supports local search on web sites. The spider obtains all of the web pages from the web sites by following web links, and it operates autonomously without any human supervision. We developed a state transition diagram to control navigation and to analyze the link structure of each web site. We have implemented an integrated local search engine, and the results show that higher user satisfaction is obtained. From the user evaluation, we also find that higher precision is obtained.

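The link-following behavior described in the abstract above can be illustrated with a minimal sketch. It is not the authors' spider and does not model their state transition diagram; it only shows a breadth-first crawl that stays within one host. The function `crawl_local`, the `fetch` callable, and the in-memory `PAGES` site are hypothetical names introduced for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_local(start_url, fetch):
    """Breadth-first crawl that never leaves the host of start_url.

    `fetch` is a hypothetical callable mapping a URL to HTML text (or None
    when the page is unavailable). Returns visited URLs in visiting order.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any fragment part.
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


# Toy in-memory "web site" standing in for real HTTP requests (assumption).
PAGES = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="http://other.test/x">ext</a>',
    "http://example.test/a.html": '<a href="b.html">B</a> <a href="/">home</a>',
    "http://example.test/b.html": "<p>leaf page</p>",
}

if __name__ == "__main__":
    print(crawl_local("http://example.test/", PAGES.get))
```

Passing `PAGES.get` as `fetch` keeps the sketch runnable offline; a real spider would replace it with an HTTP client and add politeness rules such as rate limiting.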

An analysis of user behaviors on the search engine results pages based on the demographic characteristics

  • Bitirim, Yiltan;Ertugrul, Duygu Celik
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.2840-2861 / 2020
  • The purpose of this survey-based study is to analyze search engine users' behaviors on Search Engine Results Pages (SERPs) based on three demographic characteristics: gender, age, and program of study. A questionnaire was designed with 12 closed-ended questions. The questions other than those on demographic characteristics concerned "tab", "advertisement", "spelling suggestion", "related query suggestion", "instant search suggestion", "video result", "image result", "pagination", and the number of results clicked. The collected data were analyzed with descriptive as well as inferential statistics. 84.2% of the study population was reached. Some of the major results are as follows: in each demographic category (i.e. female, male, under-20, 20-24, above-24, English computer engineering, Turkish computer engineering, software engineering), most respondents click tab, spelling suggestion, related query suggestion, instant search suggestion, video result, image result, and pagination rarely or more often. More than 50.0% of the female category click advertisements rarely; however, in the other categories, 50.0% or more never click advertisements. In every demographic category, between 78.0% and 85.4% click 10 or fewer results. This study would be the first attempt with its complete content and design. Search engine providers and researchers would gain knowledge of user behaviors on the SERPs based on the demographic characteristics.

Users' Understanding of Search Engine Advertisements

  • Lewandowski, Dirk
    • Journal of Information Science Theory and Practice / v.5 no.4 / pp.6-25 / 2017
  • In this paper, a large-scale study on users' understanding of search-based advertising is presented. It is based on (1) a survey, (2) a task-based user study, and (3) an online experiment. Data were collected from 1,000 users representative of the German online population. Findings show that users generally lack an understanding of Google's business model and the workings of search-based advertising. 42% of users self-report that they either do not know that it is possible to pay Google for preferred listings for one's company on the SERPs or do not know how to distinguish between organic results and ads. In the task-based user study, we found that only 1.3 percent of participants were able to mark all areas correctly. 9.6 percent had all their identifications correct but did not mark all results they were required to mark. For none of the screenshots given were more than 35% of users able to mark all areas correctly. In the experiment, we found that users who are not able to distinguish between the two results types choose ads around twice as often as users who can recognize the ads. The implications are that models of search engine advertising and of information seeking need to be amended, and that there is a severe need for regulating search-based advertising.

A Search-Result Clustering Method based on Word Clustering for Effective Browsing of the Paper Retrieval Results (논문 검색 결과의 효과적인 브라우징을 위한 단어 군집화 기반의 결과 내 군집화 기법)

  • Bae, Kyoung-Man;Hwang, Jae-Won;Ko, Young-Joong;Kim, Jong-Hoon
    • Journal of KIISE:Software and Applications / v.37 no.3 / pp.214-221 / 2010
  • The search-results clustering problem is defined as the automatic and on-line grouping of similar documents in search results returned from a search engine. In this paper, we propose a new search-results clustering algorithm specialized for a paper search service. Our system consists of two algorithmic phases: Category Hierarchy Generation System (CHGS) and Paper Clustering System (PCS). In CHGS, we first build up the category hierarchy, called the Field Thesaurus, for each research field using an existing research category hierarchy (KOSEF's research category hierarchy) and the keyword expansion of the field thesaurus by a word clustering method using the K-means algorithm. Then, in PCS, the proposed algorithm determines the category of each paper using top-down and bottom-up methods. The proposed system can be used in the application areas for retrieval services in a specialized field such as a paper search service.

Retrieval algorithm for Web Document using XML DOM (XML DOM을 이용한 웹문서 검색 알고리즘)

  • 김노환;정충교
    • Journal of the Korea Computer Industry Society / v.2 no.6 / pp.775-782 / 2001
  • Until recently, Web retrieval engines have presented documents to users according to the number and frequency of the query keywords in each document, on the assumption that the more keywords a document contains, the more relevant it is. This method of searching is adequate for ordinary documents such as HTML Web data, which carry no structural information. However, Web data expressed in XML contains structural information, and modeling of graphic forms is also available. Therefore, in the case of XML, this method is problematic, since it depends only on keyword frequency. We consider that this problem can be resolved by a form of query similar to SQL. This form of query lets us retrieve exactly the data we want quickly and clearly by taking full advantage of the structure of XML, overcoming the shortcomings of frequency-based engines. In this paper, we aim to design a model of an information retrieval system for XML data using XML DOM and consider the related algorithm.

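The contrast drawn in the abstract above, keyword frequency versus an SQL-like query over XML structure, can be sketched with Python's standard `xml.dom.minidom`. This is only an illustration under assumptions, not the authors' algorithm: the `CATALOG` document and the helpers `text_of`, `keyword_frequency`, and `select` are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data: "Lee" is an author of one book and merely
# mentioned in a note of the other.
CATALOG = """<catalog>
  <book><title>XML Retrieval</title><author>Kim</author><note>Cites Lee on XML</note></book>
  <book><title>Database Systems</title><author>Lee</author><note>XML XML XML appendix</note></book>
</catalog>"""


def text_of(node):
    """Concatenate all text beneath a DOM node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(text_of(child))
    return "".join(parts)


def keyword_frequency(doc, element, keyword):
    """Frequency-based view: count keyword occurrences per element, ignoring structure."""
    return [text_of(e).count(keyword) for e in doc.getElementsByTagName(element)]


def select(doc, element, field, where_field, where_value):
    """Structure-aware view, loosely like:
    SELECT field FROM element WHERE where_field = where_value
    """
    results = []
    for e in doc.getElementsByTagName(element):
        conditions = e.getElementsByTagName(where_field)
        if any(text_of(c).strip() == where_value for c in conditions):
            results.extend(text_of(f).strip() for f in e.getElementsByTagName(field))
    return results


doc = parseString(CATALOG)

if __name__ == "__main__":
    print(keyword_frequency(doc, "book", "Lee"))
    print(select(doc, "book", "title", "author", "Lee"))
```

In this toy data the keyword count for "Lee" is the same for both books, so frequency alone cannot tell the author from a passing mention, while the structural query returns only the book whose `author` element equals "Lee".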

User Satisfaction related Perception of the Web Portal for Scholarly Information: Focused on the Academic Version of NAVER Search Engine (학술정보포털에 대한 이용자만족 관련 인식에 관한 연구 - NAVER 전문정보의 학술자료 검색 기능을 중심으로 -)

  • Kim, Yang-Woo
    • Journal of the Korean Society for Library and Information Science / v.51 no.2 / pp.255-279 / 2017
  • Taking a qualitative approach, this study investigated users' perceptions associated with their satisfaction in the process of using the scholarly resource search functions of the academic version of the NAVER search engine. For this study, data were collected from a group of undergraduate students who conducted academic information searches in their own major disciplines using the Web portal. Based on the data, students' satisfaction and dissatisfaction, along with the reasons for their perceptions, were analyzed. The results presented users' perceptions under various evaluation criteria in three major domains: system interfaces, retrieval mechanisms, and search results. Based on the results, the study makes the following suggestions: 1) enhancing the system interfaces and HELP guidance, given users' limited knowledge of basic system terminology; 2) improving the retrieval mechanisms so that they better understand the context of the search terms entered by users; and 3) providing user education, given users' insufficient knowledge of the retrieval mechanisms and search functions.

Ontology based Retrieval System for Cultural Assets Using Hybrid Text-Sketch Queries (혼합형 질의 방법에 의한 온톨로지 기반 유물 검색 시스템)

  • Cheon Hyeon-Jae;Baek Seung-Jae;Lee Hong-Chul
    • Journal of the Korea Society of Computer and Information / v.10 no.5 s.37 / pp.17-26 / 2005
  • With the rapid growth of information, research on efficient information retrieval is increasing. Most retrieval systems for domestic cultural assets on the web have adopted a keyword-based search method. Those systems require users to know exact information about cultural assets, such as the name or keywords. However, it is not easy to search for cultural assets with little information or with only a memory of the shape. In this paper, we propose a retrieval system for cultural assets that uses both ontology-based and sketch-based search methods to solve the problems of existing systems. Our retrieval system allows users to use both text and sketches in a query, regardless of the type of information they have about the cultural assets, and to search within results using the ontology.

A Study on the Implementation of Information Extraction Agency for Ship Sale and Purchase using Content Based Retrieval (내용기반 검색을 이용한 선박매매 정보추출 에이전트의 구현에 관한 연구)

  • Ha, Chang-Seung;Jung, Lee-Sang
    • Journal of the Korea Society of Computer and Information / v.12 no.1 s.45 / pp.43-50 / 2007
  • Delay in the process of Information Extraction (IE) is largely due to the inability to correctly recognize the user's information requirements for particular search factors. In particular, if wrapper rules are used in a search engine, the search generally fails to classify internet documents properly and efficiently, since applying the same wrapper rules lacks extensibility across the various types of existing internet documents. When buying or selling a ship, if the price range, type, place of delivery, inspection site, and other information relevant to the sale were available through the internet for proper retrieval, the sale could more readily succeed by using an ontology relating to sales or purchase information and by selectively searching for the desired information through the content-based retrieval system. This system proposes to improve the various wrapper systems existing across different internet sites and to eliminate unnecessary information tagged on existing internet documents in order to create a more advanced information retrieval system.

An Efficient Inverted Index Technique based on RDBMS for XML Documents (XML 문서에 대한 RDBMS에 기반을 둔 효율적인 역색인 기법)

  • 서치영;이상원;김형주
    • Journal of KIISE:Databases / v.30 no.1 / pp.27-40 / 2003
  • The inverted index widely used in the existing information retrieval field should be extended for XML documents to support containment queries in XML information retrieval systems. In this paper, we consider the two methods suggested in previous work for storing the inverted index and processing containment queries for XML documents: using an RDBMS or using an inverted list engine. Extending the inverted index as in the previous work has two drawbacks. One is that using an RDBMS performs much worse than using an inverted list engine. The other is that when containment queries are processed in an RDBMS, the number of join operations increases as the path length of a query increases, and each join always takes place between large tables. In this paper, we extend the inverted index in a different way to solve these problems and show the effectiveness of using an RDBMS.
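
To make the idea of processing containment queries with an inverted index stored in an RDBMS concrete, here is a minimal sketch using Python's standard `sqlite3`. It illustrates only the general baseline the abstract refers to, not the extended index the paper proposes. The table layout (`elements`, `texts`), the position-numbering scheme, and the functions `index_document`, `build_index`, and `docs_containing` are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample documents, keyed by document number.
DOCS = {
    1: "<book><title>XML indexing</title><body>relational engines</body></book>",
    2: "<book><title>Query processing</title><body>XML joins</body></book>",
}


def index_document(conn, docno, xml_text):
    """Number tags and words in document order and store them as postings."""
    counter = 0

    def add_words(text, level):
        nonlocal counter
        for word in (text or "").split():
            counter += 1
            conn.execute("INSERT INTO texts VALUES (?, ?, ?, ?)",
                         (word.lower(), docno, counter, level))

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter                      # position of the opening tag
        add_words(elem.text, level + 1)
        for child in elem:
            visit(child, level + 1)
            add_words(child.tail, level + 1)
        counter += 1                         # position of the closing tag
        conn.execute("INSERT INTO elements VALUES (?, ?, ?, ?, ?)",
                     (elem.tag, docno, start, counter, level))

    visit(ET.fromstring(xml_text), 0)


def build_index(docs):
    """Create an in-memory database holding the two posting tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE elements (term TEXT, docno INTEGER, "
                 "start_pos INTEGER, end_pos INTEGER, level INTEGER)")
    conn.execute("CREATE TABLE texts (term TEXT, docno INTEGER, "
                 "wordno INTEGER, level INTEGER)")
    for docno, xml_text in docs.items():
        index_document(conn, docno, xml_text)
    return conn


def docs_containing(conn, element, word):
    """Containment query: documents where `element` contains `word`."""
    sql = ("SELECT DISTINCT e.docno FROM elements e JOIN texts t "
           "ON e.docno = t.docno AND e.start_pos < t.wordno AND t.wordno < e.end_pos "
           "WHERE e.term = ? AND t.term = ? ORDER BY e.docno")
    return [row[0] for row in conn.execute(sql, (element, word.lower()))]


conn = build_index(DOCS)

if __name__ == "__main__":
    print(docs_containing(conn, "title", "xml"))
```

Containment is reduced to an interval test: a word lies inside an element when its position falls between the element's start and end positions. In this layout a longer path such as book/title/word would need one more self-join on `elements` per step, which is the join growth the abstract names as a drawback.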

Implement on Search Machine using Open Source Framework (오픈 소스 프레임워크를 활용한 검색엔진 구현)

  • Song, Hyun-Ok;Kim, A-Yong;Jung, Hoe-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.3 / pp.552-557 / 2015
  • With the development of IT and the increased use of smart devices, a large amount of data is produced and consumed on the Internet. For this reason, the importance of information retrieval technology is growing, but the technology is difficult to approach because it requires a lot of background knowledge. However, with the emergence of Lucene, a search engine can be implemented even with little background knowledge of search technology. In this paper, we propose implementing a search engine using a framework developed on top of Lucene. The proposed framework uses Hadoop, Nutch, Solr, and Zookeeper so that the search engine supports distributed processing, distributed storage, and high availability in a server environment.