• Title/Summary/Keyword: web pages

An Empirical Study on Changes of Web Pages (웹 문서 변화에 관한 실험적 연구)

  • Kim Sung Jin;Lee Sang Ho
    • Journal of KIISE:Databases / v.32 no.2 / pp.151-160 / 2005
  • Because web pages are created, destroyed, and updated frequently, web databases must be refreshed to hold up-to-date copies. To keep web databases fresh effectively, we need to understand how real web pages change. Previous research on web page change has focused only on content modification and has not taken the creation and destruction of pages into account. This paper investigates web page changes, which include content modification, page creation, and page destruction. We introduce three metrics, namely DR (Download Rate), MR (Modification Rate), and CAV (Coefficient of Age Variation), to represent the change of web pages. We monitored three million web pages collected from famous and random sites every other day for one hundred days. With the Download Rate and the Modification Rate, we found that download success and modification depend on a page's past changes, and we propose two estimation formulae that predict download success and modification. With the Coefficient of Age Variation, we show that web pages do not change periodically.
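
As a rough illustration of the first two metrics named in the abstract above, the following Python sketch computes a download rate and a modification rate for one page from its crawl history. The abstract does not define DR or MR, so the definitions used here (successful downloads over crawl attempts, and content changes over consecutive successful downloads) are assumptions, as are the names `CrawlRecord`, `download_rate`, and `modification_rate`. CAV is left out because the abstract gives no basis for reconstructing it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CrawlRecord:
    """One crawl attempt for a page (hypothetical structure)."""
    checksum: Optional[str]  # content hash, or None if the download failed


def download_rate(history: Sequence[CrawlRecord]) -> float:
    """Assumed DR: successful downloads divided by crawl attempts."""
    if not history:
        return 0.0
    ok = sum(1 for r in history if r.checksum is not None)
    return ok / len(history)


def modification_rate(history: Sequence[CrawlRecord]) -> float:
    """Assumed MR: share of consecutive successful downloads whose content differs."""
    sums = [r.checksum for r in history if r.checksum is not None]
    if len(sums) < 2:
        return 0.0
    changed = sum(1 for a, b in zip(sums, sums[1:]) if a != b)
    return changed / (len(sums) - 1)


# Four crawl attempts: one failure, then one content change.
history = [CrawlRecord("a"), CrawlRecord(None), CrawlRecord("a"), CrawlRecord("b")]
print(download_rate(history))      # 3 of 4 attempts succeeded
print(modification_rate(history))  # 1 change across 2 consecutive pairs
```

Comparing checksums rather than full page bodies keeps the history small; a real crawler might instead compare normalized text so that trivial markup changes are not counted as modifications.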

A Study of Internet using Citation Analysis (인용분석을 이용한 인터넷 정보의 연구)

  • 곽철완
    • Journal of the Korean BIBLIA Society for library and Information Science / v.10 no.1 / pp.213-222 / 1999
  • The purpose of this study is to identify important web pages in a particular area. The basic premise is that citation analysis can show a similar relationship among different web pages. Web pages about 'weather' were found using a search engine. Each web page was examined for hyperlinks from other web pages and/or to other web pages. Through this process, seven web pages were found to be linked by many web pages. These seven web pages were analyzed by co-citation analysis. The result shows that the selected web pages are linked according to the characteristics of the information they provide.
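
To make the co-citation step concrete, here is a minimal Python sketch that counts, for every pair of target pages, how many source pages link to both. This is the standard definition of a co-citation count applied to hyperlinks, not necessarily the exact procedure of the study; the page names and the function `cocitation_counts` are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def cocitation_counts(outlinks: Dict[str, Set[str]]) -> Counter:
    """Count, for each pair of target pages, how many source pages link to both."""
    counts: Counter = Counter()
    for targets in outlinks.values():
        # Sorting gives each unordered pair one canonical key.
        for a, b in combinations(sorted(targets), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical link data: source page -> set of pages it links to.
outlinks = {
    "p1": {"weatherA", "weatherB"},
    "p2": {"weatherA", "weatherB", "weatherC"},
    "p3": {"weatherB", "weatherC"},
}
counts = cocitation_counts(outlinks)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are pages that other authors tend to link together, which is the signal a co-citation analysis then clusters or maps.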

Web Page Evaluation based on Implicit User Reactions and Neural Networks

  • Lee, Dong-Hoon;Kim, Jae-Kwang;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.2 / pp.181-186 / 2012
  • This paper proposes a method for evaluating web pages that considers implicit user reactions to them. Users usually spend more time and react more, for example by clicking, dragging, and scrolling, while reading pages that interest them. Based on this observation, we propose a web page evaluation method that observes implicit user reactions. The system uses Ajax to observe user reactions and neural networks to learn the correlation between user reactions and the usefulness of pages. The amount of each type of user reaction is fed to the neural networks. The numbers of characters and images on a page are also used as inputs, because the amount of user behavior tends to increase as page length increases. The experiment was conducted with 113 people and 74 pages. Each page was ranked by users through a questionnaire. The proposed method produces rankings closer to the user rankings than Google does. That is, our system evaluates web pages more closely to the users' viewpoint than Google. Although our experiment is limited, the result shows the strong potential of a new element for web page evaluation. Some approaches evaluate web pages by their contents, and some by structural attributes of pages, particularly links. Web page evaluation is for users, so the best evaluation can be done by users themselves. User feedback is therefore one of the most important factors in web page evaluation. This paper proposes a new method that reflects user feedback on web pages.

Relation between the Image Analysis of Internet Fashion Shopping Site and Consumption Emotion - Focused on T-shirts Web Pages - (인터넷 쇼핑 사이트의 이미지 분석과 소비감성과의 관계 - 티셔츠 웹 페이지를 중심으로 -)

  • Kim, Eun-Jeong;Lee, Kyoung-Hee
    • Journal of the Korean Society of Clothing and Textiles / v.31 no.8 / pp.1273-1285 / 2007
  • The purpose of this study is to understand consumer emotion about T-shirt web pages and to provide a basis for designing them effectively. Seventy-two T-shirt web pages from 62 sites were chosen as stimulus pictures, and the evaluation tools consisted of 21 pairs of image adjectives and 3 questions for evaluating consumption emotion. Data were collected from 480 men and women aged 16 to 27 who live in Busan. The image factors are Aestheticism, Activeness, Stability, and Intimacy. The T-shirt web pages are classified into four types. The image showed meaningful differences across the types of T-shirt web pages in all factors, and the image factors also differed meaningfully according to design elements. In the relation between consumption emotion and the image of T-shirt web pages, Impulse needs, Buying needs, and Recommendation needs are related to the Aestheticism factor and the Stability factor. Consumption emotion is high for type 2 (Refine image) and type 3 (Vivid image). The evaluation of consumption emotion showed meaningful differences for all design elements except the menu.

An Automatic Web Page Classification System Using Meta-Tag (메타 태그를 이용한 자동 웹페이지 분류 시스템)

  • Kim, Sang-Il;Kim, Hwa-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.38B no.4 / pp.291-297 / 2013
  • Recently, the number of web pages, which contain a wide variety of information, has increased drastically with the explosive growth of WWW usage. The need for web page classification has therefore arisen, both to make web pages easier to access and to make it possible to search them by group. Web page classification means classifying the various web pages scattered across the web according to the similarity of the documents or the keywords they contain. Web page classification can be applied to areas such as web page searching, group searching, and e-mail filtering. However, the tremendous number of pages on the web cannot be handled by manual classification. Automatic web page classification, in turn, has an accuracy problem: it cannot distinguish web pages written in different forms without classification errors. In this paper, we propose an automatic web page classification system that uses meta-tags obtained from the web pages in order to solve the problem of inaccurate web page retrieval.
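
The abstract above does not describe how the meta-tags are used, so the following Python sketch is only one plausible reading: it extracts the `keywords` meta-tag with the standard-library `html.parser` module and assigns the category whose vocabulary overlaps most with those keywords. The keyword-overlap rule, the category vocabularies, and the names `MetaTagParser`, `meta_keywords`, and `classify` are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import Dict, Set


class MetaTagParser(HTMLParser):
    """Collects <meta name=... content=...> pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        name, content = d.get("name"), d.get("content")
        if name and content:
            self.meta[name.lower()] = content


def meta_keywords(html: str) -> Set[str]:
    """Return the comma-separated 'keywords' meta-tag as a lowercase set."""
    parser = MetaTagParser()
    parser.feed(html)
    raw = parser.meta.get("keywords", "")
    return {k.strip().lower() for k in raw.split(",") if k.strip()}


def classify(html: str, categories: Dict[str, Set[str]]) -> str:
    """Pick the category sharing the most keywords; 'unknown' if none overlap."""
    words = meta_keywords(html)
    best, best_score = "unknown", 0
    for label, vocab in categories.items():
        score = len(words & vocab)
        if score > best_score:
            best, best_score = label, score
    return best


# Hypothetical page and category vocabularies.
page = '<html><head><meta name="keywords" content="Forecast, Rain, Temperature"></head></html>'
categories = {"weather": {"forecast", "rain"}, "sports": {"football", "score"}}
print(classify(page, categories))
```

Pages without a `keywords` meta-tag fall through to `"unknown"`, which is where a content-based classifier would have to take over.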

Implementation of a Web Robot and Statistics on the Korean Web (웹 로봇 구현 및 한국 웹 통계보고)

  • Kim, Sung-Jin;Lee, Sang-Ho
    • The KIPS Transactions:PartC / v.10C no.4 / pp.509-518 / 2003
  • A web robot is a program that downloads and stores web pages. Implementation issues for developing web robots have been studied widely and various web statistics are reported in the literature. First, this paper describes the overall architecture of our robot and implementation decisions on several important issues. Second, we show empirical statistics on approximately 74 million Korean web pages. Third, we monitored 1,424 Korean web sites to observe the changes of web pages. We identify what factors of web pages could affect the changes. The factors may be used for the selection of web pages to be updated incrementally.

Estimation of Web Page Change Behavior (웹 문서 변경 예측)

  • Kim, Sung-Jin
    • Journal of Internet Computing and Services / v.8 no.4 / pp.149-158 / 2007
  • This paper presents estimation methods that compute the probabilities of how many times web pages will be downloaded and modified, respectively, in future crawls. The methods help web database administrators avoid unnecessarily requesting undownloadable and unmodified web pages in a page group. We postulated that the change behavior of web pages is strongly related to their past change behavior. We gathered the change histories of approximately three million web pages at two-day intervals for 100 days, and estimated the future change behavior of those pages. Our estimation, which was evaluated against the actual change behavior of the pages, worked well.

Implementation of a Parallel Web Crawler for the Odysseus Large-Scale Search Engine (오디세우스 대용량 검색 엔진을 위한 병렬 웹 크롤러의 구현)

  • Shin, Eun-Jeong;Kim, Yi-Reun;Heo, Jun-Seok;Whang, Kyu-Young
    • Journal of KIISE:Computing Practices and Letters / v.14 no.6 / pp.567-581 / 2008
  • As the size of the web grows explosively, search engines are becoming increasingly important as the primary means of retrieving information from the Internet. A search engine periodically downloads web pages and stores them in a database to provide users with up-to-date search results. The web crawler is the program that downloads and stores web pages for this purpose. A large-scale search engine uses a parallel web crawler to retrieve the collection of web pages while maximizing the download rate. However, the service architecture and experimental analysis of parallel web crawlers have not been fully discussed in the literature. In this paper, we propose an architecture for a parallel web crawler and discuss implementation issues in detail. The proposed parallel web crawler is based on the coordinator/agent model, which uses multiple machines to download web pages in parallel. The coordinator/agent model consists of multiple agent machines that collect web pages and a single coordinator machine that manages them. The parallel web crawler consists of three components: a crawling module for collecting web pages, a converting module for transforming the web pages into a database-friendly format, and a ranking module for rating web pages based on their relative importance. We explain each component of the parallel web crawler and its implementation methods in detail. Finally, we conduct extensive experiments to analyze the effectiveness of the parallel web crawler. The experimental results clarify the merit of our architecture: the proposed parallel web crawler scales with the number of web pages to crawl and the number of machines used.

Classifying Malicious Web Pages by Using an Adaptive Support Vector Machine

  • Hwang, Young Sup;Kwon, Jin Baek;Moon, Jae Chan;Cho, Seong Je
    • Journal of Information Processing Systems / v.9 no.3 / pp.395-404 / 2013
  • In order to classify a web page as benign or malicious, we designed 14 basic and 16 extended features. The basic features were selected to represent the essential characteristics of a web page. The system heuristically combines two basic features into one extended feature in order to distinguish benign and malicious pages effectively. A support vector machine can be trained to classify pages successfully by using these features. Because more and more malicious web pages are appearing, and they change so rapidly, classifiers trained on old data may misclassify some new pages. To overcome this problem, we selected an adaptive support vector machine (aSVM) as the classifier. The aSVM can learn training data and can quickly learn additional training data based on the support vectors it obtained during its previous learning session. Experimental results verified that the aSVM can classify malicious web pages adaptively.

Design and Implementation of Customer Personalized System Using Web Log and Purchase Database

  • Lee Jae-Hoon;Chung Hyun-Sook;Lee Sung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.6 no.1 / pp.21-26 / 2006
  • In this paper, we propose a customer personalization system that presents users with web pages customized to their individual preferences. It analyzes the actions of users who visit the shopping mall and preferentially supplies the information they need. When users actually buy items, it forecasts their access patterns on the web site and the items they are likely to purchase next, and improves their web pages on the basis of their individual preferences. It infers the relations among web documents and among items by using the web server's log data and the purchase information in the database. For this inference, it employs the Apriori algorithm, a method for finding association rules. It infers related web pages by considering the user's access pattern and time from the web log, and infers the user's purchase pattern from the purchase information in the database. On the basis of these relations, it appends the related web pages as links on the user's web pages and displays the inferred goods on the user's web pages.
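
The Apriori step mentioned in the abstract above can be sketched in a few lines of Python. This is a textbook version of the algorithm (level-wise candidate generation with subset pruning), not the authors' implementation; the session data, the support threshold, and the names `apriori` and `confidence` are made up for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its absolute support count."""
    def support(itemset: FrozenSet[str]) -> int:
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while current:
        counted = {c: support(c) for c in current}
        level = {c: n for c, n in counted.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def confidence(frequent: Dict[FrozenSet[str], int],
               antecedent: Set[str], consequent: Set[str]) -> float:
    """Confidence of the rule antecedent -> consequent, from support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical sessions: the set of pages each visitor viewed.
sessions = [
    {"home", "shirts", "cart"},
    {"home", "shirts"},
    {"home", "cart"},
    {"shirts", "cart"},
]
frequent = apriori(sessions, min_support=2)
print(confidence(frequent, {"home"}, {"shirts"}))  # support(home, shirts) / support(home)
```

A rule such as home -> shirts with high confidence is the kind of relation that could justify adding a link to the shirts page on a user's personalized page; the same function works on purchase baskets instead of page views.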