• Title/Summary/Keyword: web pages


Heterogeneous Web Information Integration System based on Entity Identification

  • Shin, Hyung-Wook;Yang, Hyung-Jeong;Kim, Soo-Hyung;Lee, Guee-Sang;Kim, Kyoung-Yun;Kim, Sun-Hee;Ngoc, Do Luu
    • International Journal of Contents
    • /
    • v.8 no.4
    • /
    • pp.21-29
    • /
    • 2012
  • It is not easy for users to effectively obtain information that is semantically related but scattered across the Web. To obtain qualitatively better information from web pages, it is necessary to integrate information that is heterogeneous but semantically related. In this study, we propose a method that provides XML-based metadata to users by integrating multiple heterogeneous Web pages. The metadata generated by the proposed system is obtained by integrating heterogeneous information into a single page, using ontology-based entity identification. A wheelchair information integration system for disabled people was implemented to verify the efficiency of the proposed method. The implemented system provides an integrated web page, built from multiple web pages, in the form of an XML document.
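
The ontology-based merge the abstract describes can be sketched in a few lines. The field names, the toy "ontology" mapping, and the wheelchair records below are hypothetical placeholders, not the paper's actual schema.

```python
# Minimal sketch (hypothetical fields): merging two heterogeneous records
# that entity identification has matched to the same entity, then emitting
# the result as XML metadata.
import xml.etree.ElementTree as ET

# An "ontology" here is just a mapping from source-specific field names
# to shared concepts; the paper's actual ontology is far richer.
ONTOLOGY = {
    "product_name": "name", "title": "name",
    "cost": "price", "price_won": "price",
    "max_load": "capacity",
}

def integrate(records):
    """Merge records matched to one entity into a single XML element."""
    merged = {}
    for rec in records:
        for field, value in rec.items():
            concept = ONTOLOGY.get(field, field)
            merged.setdefault(concept, value)   # first source wins
    entity = ET.Element("entity")
    for concept, value in sorted(merged.items()):
        ET.SubElement(entity, concept).text = str(value)
    return ET.tostring(entity, encoding="unicode")

page_a = {"product_name": "Wheelchair X1", "cost": "450000"}
page_b = {"title": "Wheelchair X1", "max_load": "120kg"}
print(integrate([page_a, page_b]))
```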

The Set Expansion System Using the Mutual Importance Measurement Method to Automatically Build up Named Entity Domain Dictionaries (영역별 개체명 사전 자동 구축을 위한 상호 중요도 계산 기법 기반의 집합 확장 시스템)

  • Bae, Sang-Joon;Ko, Young-Joong
    • Korean Journal of Cognitive Science
    • /
    • v.19 no.4
    • /
    • pp.443-458
    • /
    • 2008
  • Since Web pages contain a great deal of information, the Web has become an important resource for information extraction. In this paper, we propose a set expansion system that can automatically extract named entities from the Web. The proposed method consists of three steps. First, Web pages that may include many named entities of a domain are collected using several seed words of the domain. Then, pattern rules are extracted using the seed words and the collected Web pages, and named entity candidates are selected by applying the extracted pattern rules to the Web pages. To distinguish real named entities, we develop a new mutual importance measurement method that estimates the importance of the named entity candidates. We conducted experiments on 3 Korean domains and 8 English domains. The proposed method obtained 78.72% MAP in Korean and 96.48% MAP in English. In particular, the performance on the English domains was better than that of Google Sets.

  • PDF
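
The mutual importance step can be sketched as a HITS-style mutual reinforcement between patterns and candidates; the paper's exact formula may differ, and the pattern/candidate data below is made up.

```python
# Hedged sketch: a candidate is important if extracted by important
# patterns, and a pattern is important if it extracts important
# candidates. `matches[p]` lists the candidates pattern p extracts.
def mutual_importance(matches, iterations=20):
    patterns = list(matches)
    candidates = sorted({c for cs in matches.values() for c in cs})
    p_score = {p: 1.0 for p in patterns}
    c_score = {c: 1.0 for c in candidates}
    for _ in range(iterations):
        for c in candidates:
            c_score[c] = sum(p_score[p] for p in patterns if c in matches[p])
        for p in patterns:
            p_score[p] = sum(c_score[c] for c in matches[p])
        # normalize to keep the scores bounded
        for score in (p_score, c_score):
            total = sum(score.values()) or 1.0
            for k in score:
                score[k] /= total
    return c_score

matches = {
    "<li>X</li>": ["Seoul", "Busan", "Daegu"],
    "cities such as X": ["Seoul", "Busan"],
    "X said": ["Seoul", "John"],   # noisy pattern
}
scores = mutual_importance(matches)
print(sorted(scores, key=scores.get, reverse=True))
```

Candidates backed by several reliable patterns (here "Seoul", "Busan") end up ranked above those found only by a noisy pattern ("John").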

Web Impact Factor and Link Analysis of Indian Council of Agricultural Research (ICAR) Organizations

  • Kumar, Kutty
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.8 no.1
    • /
    • pp.5-23
    • /
    • 2018
  • There have been extensive studies on webometrics, particularly on the impact of websites and the web impact factor. The present study analyzed the websites of ICAR organizations according to webometric indicators. It examines the 92 ICAR organizational websites in India, identifies the number of web pages and link pages, and calculates the overall Web Impact Factor (WIF) and the absolute WIF. All websites were analyzed and data extracted using the Google search engine. The study suggests that Web Impact Factors can be calculated as a way of comparing the attractiveness of web sites or domains on the Web.
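
The WIF calculation described above can be sketched with one common webometric definition (link pages divided by web pages); the paper's "overall" and "absolute" variants differ in which links are counted, and the site counts below are hypothetical.

```python
# Hedged sketch: Web Impact Factor = number of pages linking to a site
# divided by the number of pages in the site.
def web_impact_factor(link_pages, web_pages):
    if web_pages == 0:
        raise ValueError("site has no indexed pages")
    return link_pages / web_pages

# Hypothetical (link_pages, web_pages) counts for two organizational sites:
sites = {"icar.org.in": (1200, 300), "iari.res.in": (450, 900)}
for domain, (links, pages) in sites.items():
    print(f"{domain}: WIF = {web_impact_factor(links, pages):.2f}")
```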

Hierarchical Web Structuring Using Integer Programming

  • Lee Wookey;Kim Seung;Kim Hando;Kang Suk-Ho
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.51-67
    • /
    • 2004
  • The World Wide Web is nearly ubiquitous, and the tremendously growing amount of Web information strongly calls for a structuring framework by which an overview visualization of a Web site can be provided as a visual surrogate for users. We take the viewpoint that a Web site is a directed graph, where the nodes correspond to Web pages and the arcs correspond to the hypertext links between them. Our goal in this paper is not to derive a naive shortest path or a fast access method, but to generate an optimal structure based on context-centric weights. We model a Web site formally so that an integer programming model can be formulated. Even under changes such as modification of the query terms, the optimized Web site structure can be maintained by sensitivity analysis.

  • PDF
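
The paper solves an exact integer program; the sketch below is only a greedy stand-in for the underlying idea of selecting a maximum-weight hierarchy (one parent per page) from the weighted link graph, with made-up arc weights.

```python
# Hedged sketch: grow a maximum-weight tree from the home page,
# repeatedly attaching the heaviest arc that reaches a new page.
def build_hierarchy(arcs, root):
    """arcs: {(src, dst): weight}. Returns {page: parent}."""
    pages = {root} | {p for arc in arcs for p in arc}
    parent, placed = {}, {root}
    while len(placed) < len(pages):
        best = max(
            ((w, s, d) for (s, d), w in arcs.items()
             if s in placed and d not in placed),
            default=None)
        if best is None:          # some pages unreachable from the root
            break
        w, s, d = best
        parent[d] = s
        placed.add(d)
    return parent

arcs = {("home", "news"): 5, ("home", "about"): 2,
        ("news", "sports"): 4, ("about", "sports"): 3}
print(build_hierarchy(arcs, "home"))
```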

An Effective Keyword Extraction Method Based on Web Page Structure Analysis for Video Retrieval in WWW (웹 페이지 구조 분석을 통한 효과적인 동영상 검색용 키워드 추출 방법)

  • Lee, Jong-Won;Choi, Gi-Seok;Jang, Ju-Yeon;Nang, Jong-Ho
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.35 no.3
    • /
    • pp.103-110
    • /
    • 2008
  • This paper proposes an effective keyword extraction method for Web videos. The proposed method classifies Web video pages into one of four types, based on an analysis of page structure: the number of videos and the layout of the page. A keyword extraction algorithm suited to each page type is then applied. An experiment with 1,087 Web pages containing a total of 2,462 videos showed that the recall of the proposed extraction method is 18% higher than that of ImageRover [2]. The proposed method could therefore be used to build a powerful video search system for the WWW.
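
The type-specific dispatch can be sketched as follows; the paper distinguishes four types with more elaborate rules, so the two-way classification and the extraction heuristics here are hypothetical simplifications.

```python
# Hedged sketch: classify a page by how many videos it embeds, then
# dispatch to a type-specific keyword extractor.
import re

def classify(html):
    n_videos = len(re.findall(r"<(?:video|embed|object)\b", html))
    return "single" if n_videos == 1 else "multiple"

def keywords_single(html):
    # one video: the page <title> usually describes it
    m = re.search(r"<title>(.*?)</title>", html, re.S)
    return m.group(1).split() if m else []

def keywords_multiple(html):
    # many videos: anchor texts tend to label the individual clips
    return [w for text in re.findall(r"<a[^>]*>(.*?)</a>", html, re.S)
            for w in text.split()]

EXTRACTORS = {"single": keywords_single, "multiple": keywords_multiple}

def extract(html):
    return EXTRACTORS[classify(html)](html)

page = "<title>Cat video</title><video src='a.mp4'></video>"
print(extract(page))
```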

User modeling based on fuzzy category and interest for web usage mining

  • Lee, Si-Hun;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.5 no.1
    • /
    • pp.88-93
    • /
    • 2005
  • Web usage mining is a research field that searches for potentially useful and valuable information in web log files. A web log file is a simple list of the pages that users visit, so it is not easy to analyze a user's current interests from it. This paper presents a web usage mining method for finding users' current interests based on fuzzy categories. We consider not only how many times a user visits pages but also when the visits occur. We describe a user's current interest as a fuzzy interest degree over categories. Based on fuzzy categories and fuzzy interest degrees, we also propose a method for clustering users according to their interests for user modeling. For user clustering, we define a category vector space. Experiments show that our method properly reflects the time factor of users' web visits as well as the number of visits.
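
A fuzzy interest degree that weights visits by both count and recency can be sketched as below; the paper's membership functions are not given here, so the exponential decay and half-life are our own simplification.

```python
# Hedged sketch: per-category interest degree in [0, 1], where each
# visit contributes less the older it is.
import math

def interest_degrees(visits, now, half_life=7.0):
    """visits: list of (category, day_of_visit)."""
    raw = {}
    for category, day in visits:
        age = now - day
        # recent visits contribute more than old ones
        raw[category] = raw.get(category, 0.0) + \
            math.exp(-math.log(2) * age / half_life)
    top = max(raw.values())
    return {c: v / top for c, v in raw.items()}   # normalize to [0, 1]

# Same visit count per category, but "news" visits are recent:
visits = [("sports", 1), ("sports", 2), ("news", 29), ("news", 30)]
print(interest_degrees(visits, now=30))
```

With equal visit counts, the recently visited category gets the higher degree, which is exactly the time factor the abstract emphasizes.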

Understanding Web Designer's Knowledge Structure for WWW (웹 페이지 설계자의 WWW에 대한 지식구조)

  • 곽철완
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.35 no.2
    • /
    • pp.171-185
    • /
    • 2001
  • The purpose of this study was to identify web designers' knowledge organization of the WWW. Linked web sites were investigated in public library web pages. The independent variable was the supervising office of the public library; the dependent variables were the linked web sites of types of public libraries, supervising offices of public libraries, and employment information. Results showed that special library and supervising office web sites were linked differently by public library web pages depending on the supervising office. This difference resulted from geographical factors and the types of information on the web sites.

  • PDF

The Dynamic Interface Representation of Web Sites using EMFG (EMFG를 이용한 웹사이트의 동적 인터페이스 표현)

  • Kim, Eun-Sook;Yeo, Jeong-Mo
    • The KIPS Transactions:PartD
    • /
    • v.15D no.5
    • /
    • pp.691-698
    • /
    • 2008
  • Web designers generally use a storyboard, a site map, a flow chart, or a combination of these to represent web sites. However, these methods make it difficult to represent the entire architecture of a web site and are not well suited to describing the detailed flow of web pages. Recent work has used EMFG (Extended Mark Flow Graph) to solve these problems to some degree. However, the conventional EMFG representation covers only the static parts of a web site and thus cannot represent its dynamic interface. Internet use has become pervasive in daily life, and we can hardly imagine work, study, or business without it. Moreover, recent web sites have not only more complex and varied architectures but also web pages with dynamic interfaces. We therefore propose a method of representing such web sites using EMFG, covering, for example, pages that vary with time and page status or contents that vary with mouse operations. We expect our work to help in the design and maintenance of web sites.

Topic directed Web Spidering using Reinforcement Learning (강화학습을 이용한 주제별 웹 탐색)

  • Lim, Soo-Yeon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.4
    • /
    • pp.395-399
    • /
    • 2005
  • In this paper, we present the HIGH-Q learning algorithm, based on reinforcement learning, for faster and more exact topic-directed web spidering. The goal of reinforcement learning is to maximize the rewards received from the environment; a reinforcement learning agent learns by interacting with the external environment through trial and error. We performed experiments comparing the proposed reinforcement learning method with breadth-first search for finding web pages. As a result, the reinforcement learning method using discounted future rewards searched a smaller number of pages to find the result pages.
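
Topic-directed spidering with discounted future rewards can be sketched as plain Q-learning over a tiny link graph; the paper's HIGH-Q algorithm and its state features are richer than this toy, and the graph and rewards below are made up.

```python
# Hedged sketch: following a link is an action; on-topic pages pay
# reward 1. Discounted backups make links that LEAD to topic pages
# valuable, which is what steers the spider.
import random

LINKS = {"home": ["news", "ads"], "news": ["topic"], "ads": ["ads2"],
         "topic": [], "ads2": []}
REWARD = {"topic": 1.0}

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    Q = {(p, l): 0.0 for p, ls in LINKS.items() for l in ls}
    rng = random.Random(0)
    for _ in range(episodes):
        page = "home"
        while LINKS[page]:
            links = LINKS[page]
            if rng.random() < eps:                       # explore
                nxt = rng.choice(links)
            else:                                        # exploit
                nxt = max(links, key=lambda l: Q[(page, l)])
            future = max((Q[(nxt, l)] for l in LINKS[nxt]), default=0.0)
            Q[(page, nxt)] += alpha * (
                REWARD.get(nxt, 0.0) + gamma * future - Q[(page, nxt)])
            page = nxt
    return Q

Q = train()
print(Q[("home", "news")], Q[("home", "ads")])
```

After training, the link toward the topic page has a Q-value near the discounted reward (about 0.9), while the off-topic link stays near zero.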

Revisiting PageRank Computation: Norm-leak and Solution (페이지랭크 알고리즘의 재검토 : 놈-누수 현상과 해결 방법)

  • Kim, Sung-Jin;Lee, Sang-Ho
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.11 no.3
    • /
    • pp.268-274
    • /
    • 2005
  • Since the introduction of the PageRank technique, it has been known to rank web pages effectively. In spite of its usefulness, we found a computational drawback, which we call the norm-leak, in which PageRank values become smaller than they should be in some cases. We present an improved PageRank algorithm that computes the PageRank values of web pages correctly, together with an efficient implementation. Experimental results using over 67 million real web pages are also presented.
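
The leak comes from pages without out-links, whose rank mass disappears at each iteration unless it is put back. The sketch below uses one standard fix, redistributing dangling mass uniformly, which may differ from the paper's exact correction.

```python
# Hedged sketch: power-iteration PageRank. "Dangling" pages (no
# out-links) leak norm; redistributing their rank keeps the total at 1.
def pagerank(out_links, d=0.85, iters=50):
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # mass sitting on dangling pages this iteration
        leaked = sum(rank[p] for p in pages if not out_links[p])
        new = {p: (1 - d) / n + d * leaked / n for p in pages}
        for p in pages:
            for q in out_links[p]:
                new[q] += d * rank[p] / len(out_links[p])
        rank = new
    return rank

web = {"a": ["b"], "b": ["a", "c"], "c": []}   # c is dangling
r = pagerank(web)
print(r, sum(r.values()))  # the total stays at ~1.0
```

Dropping the `leaked` term reproduces the leak: the vector's norm shrinks each iteration and every value ends up smaller than it should be.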