• Title/Summary/Keyword: Web documents


RDF and OWL Storage and Query Processing based on Relational Database (관계형 데이타베이스 기반의 RDF와 OWL의 저장 및 질의처리)

  • Jeong Hoyoung;Kim Jungmin;Jung Junwon;Kim Jongnam;Im Donghyuk;Kim Hyoung-Joo
    • Journal of KIISE:Computing Practices and Letters / v.11 no.5 / pp.451-457 / 2005
  • Despite advances in computing, the flood of electronic documents makes it increasingly difficult to find appropriate information. It is therefore more important to obtain meaningful information than to process data quickly. In this context, the Semantic Web enables intelligent processing by adding semantic metadata to web documents. As the Semantic Web grows, knowledge resources as well as web resources are becoming more important. In this paper, we propose an OWL storage system aimed at intelligent processing through semantic metadata added to web documents, together with a system for OWL-QL query processing.
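
The abstract does not give the storage schema, so the following is only a minimal sketch of the general idea of keeping RDF/OWL statements in a relational database. It assumes a single three-column `triples` table in SQLite and answers a class-membership query with a recursive SQL query over `rdfs:subClassOf`. The table layout, the function names, and the `ex:` identifiers are all illustrative assumptions, not the authors' design.

```python
import sqlite3


def build_store(triples):
    """Load (subject, predicate, object) triples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def instances_of(conn, cls):
    """Return subjects typed as cls or as any transitive subclass of cls."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(c) AS (
            SELECT ?
            UNION
            SELECT s FROM triples, sub
            WHERE p = 'rdfs:subClassOf' AND o = sub.c
        )
        SELECT DISTINCT s FROM triples
        WHERE p = 'rdf:type' AND o IN (SELECT c FROM sub)
        ORDER BY s
        """,
        (cls,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical example data.
store = build_store([
    ("ex:Paper", "rdfs:subClassOf", "ex:Document"),
    ("ex:WebPage", "rdfs:subClassOf", "ex:Document"),
    ("ex:doc1", "rdf:type", "ex:Paper"),
    ("ex:doc2", "rdf:type", "ex:WebPage"),
    ("ex:doc3", "rdf:type", "ex:Person"),
])
print(instances_of(store, "ex:Document"))
```

The recursive query stands in for one small piece of inference (subclass closure); a full OWL-QL processor would need considerably more than this.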

A Web-document Recommending System using the Korean Thesaurus (한국어 시소러스를 이용한 웹 문서 추천 에이전트)

  • Seo, Min-Rye;Lee, Song-Wook;Seo, Jung-Yun
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.1 / pp.103-109 / 2009
  • We build a web document recommending agent system that offers a certain number of web documents to each user by monitoring and learning the user's web browsing behavior. We also propose a query expansion method using the Korean thesaurus. For each query used to search for new web documents, a candidate set is generated from the Korean thesaurus. Using TF-IDF and mutual information, we extract from the candidate set the words most strongly correlated with the query, and then expand the query. With query expansion, we can recommend many web documents of potential interest to users. We thus conclude that the system with query expansion is more effective than a baseline system for recommending web documents.
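
The candidate-ranking step can be illustrated with a short sketch. It assumes the thesaurus candidates are already given as a list, and it ranks them by pointwise mutual information with the query term, computed from document-level co-occurrence. The TF-IDF part of the method is omitted, and the function name, parameters, and sample data are hypothetical.

```python
import math
from collections import Counter


def expand_query(query, candidates, docs, top_k=2):
    """Rank thesaurus candidates by pointwise mutual information with the query."""
    n = len(docs)
    doc_sets = [set(d.split()) for d in docs]
    # Document frequency of every word in the collection.
    df = Counter(word for words in doc_sets for word in words)
    scored = []
    for cand in candidates:
        both = sum(1 for words in doc_sets if query in words and cand in words)
        if both == 0:
            continue  # never co-occurs with the query, so not a useful expansion
        pmi = math.log((both * n) / (df[query] * df[cand]))
        scored.append((pmi, cand))
    # Highest PMI first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [query] + [cand for _, cand in scored[:top_k]]


docs = [
    "car engine repair",
    "automobile engine repair car",
    "car race",
    "fruit apple",
    "vehicle road",
]
print(expand_query("car", ["automobile", "vehicle", "apple"], docs))
```

Candidates that never co-occur with the query are dropped rather than scored, which keeps the logarithm defined.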

An Adaptive Cache Replacement Policy for Web Proxy Servers (웹 프락시 서버를 위한 적응형 캐시 교체 정책)

  • Choi, Seung-Lak;Kim, Mi-Young;Park, Chang-Sup;Cho, Dae-Hyun;Lee, Yoon-Joon
    • Journal of KIISE:Computer Systems and Theory / v.29 no.6 / pp.346-353 / 2002
  • The explosive increase in World Wide Web usage has incurred a significant amount of network traffic and server load. To overcome these problems, web proxy caching replicates frequently requested documents in a web proxy closer to the users. Cache utilization depends on the replacement policy, which tries to keep the documents that will be requested frequently in the near future. Temporal locality and the Zipf frequency distribution, which are commonly observed in web proxy workloads, are considered important properties for predicting the popularity of documents. In this paper, we propose a novel cache replacement policy, called Adaptive LFU (ALFU), which incorporates 1) the Zipf frequency distribution by utilizing LFU and 2) temporal locality adaptively, by efficiently measuring how much the popularity of documents decreases as time passes. We evaluate the performance of ALFU by comparing it with other policies via trace-driven simulation. Experimental results show that ALFU outperforms the other policies.
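
The abstract does not specify how ALFU measures popularity reduction, so the sketch below is not ALFU itself. It shows a plain LFU cache with one assumed aging rule: every `decay_every` requests, all frequency counts are halved, so documents that were popular long ago gradually lose priority. The class name, the halving rule, and the parameters are assumptions made for illustration.

```python
class DecayingLFUCache:
    """LFU cache whose counts are halved periodically (an assumed aging rule)."""

    def __init__(self, capacity, decay_every=4):
        self.capacity = capacity
        self.decay_every = decay_every
        self.counts = {}   # url -> aged request count
        self.requests = 0  # total requests seen so far

    def request(self, url):
        """Record a request; return True on a cache hit, False on a miss."""
        self.requests += 1
        if self.requests % self.decay_every == 0:
            # Age every cached document so that old popularity fades.
            for key in self.counts:
                self.counts[key] /= 2
        hit = url in self.counts
        if not hit and len(self.counts) >= self.capacity:
            # Evict the document with the lowest aged count.
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[url] = self.counts.get(url, 0) + 1
        return hit


cache = DecayingLFUCache(capacity=2, decay_every=4)
results = [cache.request(url) for url in ["a", "a", "b", "c"]]
print(results, sorted(cache.counts))
```

In this trace the fourth request triggers aging, after which "b" has the lowest count and is evicted to make room for "c".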

Modified Spreading Activation Network for Intelligent Profile Construction in Research Agent System (리서치 에이전트시스템에서의 지능적 프로파일 구축을 위한 개선된 확산 활성화 네트워크)

  • 조영임;김유신
    • Journal of Korea Multimedia Society / v.6 no.6 / pp.1111-1119 / 2003
  • Research in science and engineering needs the latest information from Internet resources, but searching and filtering web documents from the huge volume of Internet resources is a complex and repetitive procedure. In this paper, we propose the PREA system, which can organize research paper databases and search for World Wide Web documents that the user is interested in. It observes the usage of the local paper databases and of the presented web documents, and then constructs a profile intelligently. To build the profile, we use a modified spreading activation network (MSAN) so that PREA can search and filter web documents in real time according to the semantic meaning of the user's interests. The system is constructed in a multi-agent manner so that the agents can cooperate effectively. The results show the effectiveness of our system in searching web documents compared with a commercial search engine.


A Research for Web Documents Genre Classification using STW (STW를 이용한 웹 문서 장르 분류에 관한 연구)

  • Ko, Byeong-Kyu;Oh, Kun-Seok;Kim, Pan-Koo
    • Journal of Information Technology and Architecture / v.9 no.4 / pp.413-422 / 2012
  • Many researchers have studied how to let machines understand the meaning of human natural language, using text-based, page-rank-based, and other approaches. In particular, URL and HTML tag information in web documents is attracting attention again as a means of automatically analyzing huge numbers of web documents. In this paper, we propose an STW (Semantic Term Weight) approach based on the syntactic and linguistic structure of web documents in order to classify their genres. For the evaluation, we analyzed more than 1,000 documents from the 20-Genre-collection corpus to train an SVM-based classifier. Afterwards, we tested on the KI-04 corpus to evaluate the performance of the proposed method. We measured accuracy in two experiments, one using STW and one without. The proposed STW-based approach showed accuracy approximately 10.2% higher than the approach without STW.

Ontology-based User Customized Search Service Considering User Intention (온톨로지 기반의 사용자 의도를 고려한 맞춤형 검색 서비스)

  • Kim, Sukyoung;Kim, Gunwoo
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.129-143 / 2012
  • Recently, the rapid progress of standardized web technologies and the proliferation of web users around the world have brought an explosive increase in the production and consumption of information documents on the web. In addition, most companies produce, share, and manage a huge number of information documents that are needed to run their businesses. They also collect, store, and manage, at their own discretion, many web documents published on the web. Along with this increase in the information documents that companies must manage, the need for a solution that locates information documents more accurately among a huge number of information sources has increased. To satisfy this need for accurate search, the market for search engine solutions is steadily expanding. The most important of the many functions provided by a search engine is locating accurate information documents in huge information sources. The major metric for evaluating the accuracy of a search engine is relevance, which consists of two measures, precision and recall. Precision is a measure of exactness, that is, what percentage of the information returned as answers is actually relevant, whereas recall is a measure of completeness, that is, what percentage of the relevant answers is actually retrieved. These two measures can be weighted differently according to the application domain. If we need to search information exhaustively, as with patent documents and research papers, it is better to increase recall. On the other hand, when the amount of information is small, it is better to increase precision. Most existing web search engines use a keyword search method that returns web documents containing keywords that correspond to the search words entered by a user. This method has the virtue of locating all such web documents quickly, even when many search words are entered. 
However, this method has the fundamental limitation of not considering the search intention of the user, and thereby retrieves irrelevant results as well as relevant ones. Thus, it takes additional time and effort to sort the relevant results out from everything returned by the search engine. That is, the keyword search method can increase recall, but it makes it difficult to locate the web documents that a user actually wants to find, because it provides no means of understanding the user's intention and reflecting it in the search process. This research therefore suggests a new method that combines an ontology-based search solution with the core search functions provided by existing search engine solutions. The method enables a search engine to provide optimal search results by inferring the search intention of the user. To that end, we build an ontology that contains the concepts of a specific domain and the relationships among them. The ontology is used to infer synonyms of the search keywords entered by a user, so that the user's search intention is reflected in the search process more actively than in existing search engines. Based on the proposed method, we implement a prototype search system and test it in the patent domain, where we experiment with searching for relevant documents associated with a patent. The experiment shows that our system increases both recall and precision, and improves search productivity through an improved user interface that lets a user interact with the search system effectively. In future research, we will study ways to validate the performance of our prototype system by comparing it with other search engine solutions, and we will extend the application domain to other information search settings such as portals.
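
The two measures and the effect of synonym expansion can be made concrete with a small sketch. It assumes the ontology is reduced to a plain synonym dictionary and that search is simple keyword matching; the function names and the toy patent-like documents are hypothetical, and the numbers illustrate the definitions only, not the paper's results.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def expand(keywords, synonyms):
    """Add the synonyms that the (assumed) ontology lists for each keyword."""
    expanded = set(keywords)
    for keyword in keywords:
        expanded |= set(synonyms.get(keyword, ()))
    return expanded


def search(docs, keywords):
    """Return ids of documents that contain at least one keyword."""
    wanted = set(keywords)
    return {doc_id for doc_id, text in docs.items() if set(text.split()) & wanted}


docs = {
    1: "battery cell patent",
    2: "accumulator design patent",
    3: "solar panel",
}
relevant = {1, 2}
synonyms = {"battery": ["accumulator"]}

plain = search(docs, {"battery"})
expanded = search(docs, expand({"battery"}, synonyms))
print(precision_recall(plain, relevant))
print(precision_recall(expanded, relevant))
```

In this toy collection, plain keyword search misses document 2, while the expanded query retrieves both relevant documents without adding an irrelevant one.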

Web Information Extraction using HTML Tag Pattern (HTML 태그페턴을 이용한 웹정보추출시스템)

  • Park, Byung-Kwon
    • Proceedings of the Korea Association of Information Systems Conference / 2005.05a / pp.79-92 / 2005
  • To query the vast number of web pages available on the Internet, it is necessary to extract the information encoded in the web pages and convert it into structured data (e.g. relational data for SQL) or semistructured data (e.g. XML data for XQuery). In this paper, we propose a new web information extraction system, PIES, which converts web information into XML documents. PIES is based on a user-specified target schema and HTML tag pattern descriptions. The web information is extracted according to the pattern descriptions and validated against the target schema. We designed a new language to describe extraction rules, and a new regular expression to describe HTML tag patterns. We implemented PIES and applied it to the US patent web site to evaluate its correctness. It successfully extracted thousands of US patent records and converted them into XML documents.
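
The extract-then-validate flow can be sketched as follows. This is not the PIES rule language, which the abstract does not show; it substitutes ordinary Python regular expressions for the tag patterns and treats "every declared field must be found" as a stand-in for schema validation. The function name, the patterns, and the sample HTML are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def extract_to_xml(html, patterns, root_tag="record"):
    """Apply one tag pattern per field and emit the matches as an XML string."""
    root = ET.Element(root_tag)
    for field, pattern in patterns.items():
        match = re.search(pattern, html, flags=re.S)
        if match is None:
            # Stand-in for schema validation: every declared field is required.
            raise ValueError(f"required field missing: {field}")
        ET.SubElement(root, field).text = match.group(1).strip()
    return ET.tostring(root, encoding="unicode")


html = "<html><title>Widget</title><td class='no'> 1234567 </td></html>"
patterns = {
    "title": r"<title>(.*?)</title>",
    "number": r"<td class='no'>(.*?)</td>",
}
print(extract_to_xml(html, patterns))
```

Regular expressions over raw HTML are fragile on real pages; they are used here only because the abstract describes the patterns as regular-expression based.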


Development of e-Catalog manager in Web-based e-Catalog System (웹 기반 e-catalog 시스템에서의 e-catalog 관리자 개발)

  • 장민제;박세형;하성도
    • Proceedings of the Korean Society of Precision Engineering Conference / 2003.06a / pp.885-889 / 2003
  • The e-catalog system consists of an e-catalog database, an e-catalog manager, and a web server, and provides an e-catalog web service by displaying e-catalog documents that contain web 3D images, product specifications, and manuals. Various web contents, such as 3D images of products (which offer basic viewpoint/movement handles and function simulations), product specifications, product manuals, and product features, can be integrated into e-catalog documents in XML format through image manipulation and database connection by using the e-catalog manager tool. By reducing the time and cost of publishing and managing an e-catalog web service, the system is expected to strengthen the competitiveness of companies in their e-business activities.


The Study on the RRS Designs in the Web-based Libraries (웹 의존형 라이브러리의 RRS 디자인에 관한 연구)

  • Kim, Sun-Ho
    • Journal of the Korean Society for Library and Information Science / v.36 no.2 / pp.231-241 / 2002
  • In Web-based libraries, RRS (Ready Reference Sites) provide users with various web documents, and users prefer to find web documents easily, rapidly, and accurately. With this preference in mind, Web designers or librarians working with the RRS should design the access path to be simple and convenient. Concerning the design of the RRS in the Web-based library, 216 kinds of alternatives are identified as the result of the study. When Web designers or librarians improve or design their RRS, some of these alternatives might be helpful as primary sources.

A Document Collection Method for More Accurate Search Engine (정확도 높은 검색 엔진을 위한 문서 수집 방법)

  • Ha, Eun-Yong;Gwon, Hui-Yong;Hwang, Ho-Yeong
    • The KIPS Transactions:PartA / v.10A no.5 / pp.469-478 / 2003
  • Internet search engines that use web robots visit servers connected to the Internet periodically or non-periodically. They extract and classify the collected data according to their own methods and construct the databases that are the basis of web information search. These procedures are repeated very frequently on the Web. Many search engine sites operate this processing strategically in order to become popular Internet portal sites that provide users with ways to find information on the web. A web search engine contacts many thousands of web servers, maintains its existing databases, and navigates to obtain data about newly connected web servers. However, these jobs are decided and conducted by the search engines alone. They run web robots to collect data from web servers without knowledge of the state of those servers. Each search engine issues many requests to web servers and receives their responses, which is one cause of increased Internet traffic. If each web server notifies web robots with a summary of its public documents, and each web robot then uses this summary to collect only the corresponding documents, unnecessary Internet traffic is eliminated and the accuracy of the data held by search engines becomes higher. The processing overhead of web-related jobs on both web servers and search engines also becomes lower. In this paper, we design and implement a monitoring system on the web server that monitors the state of documents on the server, summarizes the changes to modified documents, and sends the summary information to web robots that want to get documents from the server. We also design and implement an efficient web robot for the search engine, which uses the notified summary to fetch the corresponding documents from the web servers, extract index terms, and update its databases.
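
The summary-driven collection described above can be sketched in a few lines. The sketch assumes the server-side monitor keeps a snapshot mapping each document path to its last-modified time and that the robot re-fetches only what the summary lists. The function names, the snapshot format, and the sample paths are assumptions, since the abstract does not describe the actual message format.

```python
def summarize_changes(previous, current):
    """Compare two {path: last_modified} snapshots taken by the server monitor."""
    return {
        "added": sorted(p for p in current if p not in previous),
        "modified": sorted(
            p for p in current if p in previous and current[p] != previous[p]
        ),
        "deleted": sorted(p for p in previous if p not in current),
    }


def urls_to_fetch(summary):
    """The robot re-fetches only new or changed documents."""
    return summary["added"] + summary["modified"]


# Hypothetical snapshots of one web server at two points in time.
before = {"/index.html": 100, "/news.html": 100, "/old.html": 90}
after = {"/index.html": 100, "/news.html": 140, "/new.html": 150}

summary = summarize_changes(before, after)
print(summary)
print(urls_to_fetch(summary))
```

Here the robot fetches two documents instead of re-crawling all three current pages, and it learns that `/old.html` can be dropped from its index without issuing a request for it.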