• Title/Summary/Keyword: change of web pages

An Empirical Study on Changes of Web Pages (웹 문서 변화에 관한 실험적 연구)

  • Kim Sung Jin;Lee Sang Ho
    • Journal of KIISE:Databases / v.32 no.2 / pp.151-160 / 2005
  • As web pages are created, destroyed, and updated frequently, web databases should be refreshed to keep their copies of web pages up to date. To keep web databases fresh effectively, we need to understand how real web pages change. Previous research on web page change has focused only on content modification and has not taken the creation and destruction of web pages into account. This paper investigates web page changes, which include content modification, page creation, and page destruction. We introduce three metrics, namely DR (Download Rate), MR (Modification Rate), and CAV (Coefficient of Age Variation), to represent the change of web pages. We monitored three million web pages collected from famous and random sites every other day for one hundred days. With the Download Rate and the Modification Rate, we learned that download success and modification depend on the past changes of the pages, and we propose two estimation formulae that predict download success and modification. With the Coefficient of Age Variation, we show that web pages do not change periodically.
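
The abstract names the three metrics but does not define them, so the following is only a rough sketch under stated assumptions: DR is taken to be the fraction of crawl attempts that succeeded, MR the fraction of consecutive successful downloads whose content differed, and CAV an ordinary coefficient of variation (standard deviation divided by mean) over the lengths of unchanged periods. The function names and input shapes are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def download_rate(successes):
    """Assumed DR: fraction of crawl attempts in which the page was downloaded."""
    return sum(successes) / len(successes)


def modification_rate(checksums):
    """Assumed MR: fraction of consecutive successful downloads that differed.

    `checksums` holds one content hash per crawl, or None for a failed download.
    """
    seen = [c for c in checksums if c is not None]
    if len(seen) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(seen, seen[1:]) if a != b)
    return changes / (len(seen) - 1)


def coefficient_of_variation(ages):
    """Assumed CAV: population std / mean of the unchanged-period lengths.

    A value of 0 means the page changed at perfectly regular intervals.
    """
    return pstdev(ages) / mean(ages)
```

Under these assumptions, a page that changes every four days has a coefficient of 0, while irregular change intervals push the value upward.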

Estimation of Web Page Change Behavior (웹 문서 변경 예측)

  • Kim, Sung-Jin
    • Journal of Internet Computing and Services / v.8 no.4 / pp.149-158 / 2007
  • This paper presents estimation methods that compute the probabilities of how many times web pages will be downloaded and modified, respectively, in future crawls. The methods help web database administrators avoid unnecessarily requesting undownloadable and unmodified web pages in a page group. We postulated that the change behavior of web pages is strongly related to their past change behavior. We gathered the change histories of approximately three million web pages at two-day intervals for 100 days and estimated the future change behavior of those pages. Our estimation, evaluated against the actual change behavior of the pages, worked well.

Classifying Malicious Web Pages by Using an Adaptive Support Vector Machine

  • Hwang, Young Sup;Kwon, Jin Baek;Moon, Jae Chan;Cho, Seong Je
    • Journal of Information Processing Systems / v.9 no.3 / pp.395-404 / 2013
  • In order to classify a web page as being benign or malicious, we designed 14 basic and 16 extended features. The basic features that we implemented were selected to represent the essential characteristics of a web page. The system heuristically combines two basic features into one extended feature in order to effectively distinguish benign and malicious pages. The support vector machine can be trained to successfully classify pages by using these features. Because more and more malicious web pages are appearing, and they change so rapidly, classifiers that are trained by old data may misclassify some new pages. To overcome this problem, we selected an adaptive support vector machine (aSVM) as a classifier. The aSVM can learn training data and can quickly learn additional training data based on the support vectors it obtained during its previous learning session. Experimental results verified that the aSVM can classify malicious web pages adaptively.

Design and Implementation of an HTML Pages Modification Detector for Meta-search Engines (메타 검색엔진을 위한 HTML 문서 변경 탐지기의 설계 및 구현)

  • Park, Sang-Wi;O, Jeong-Seok;Lee, Sang-Ho
    • The KIPS Transactions:PartD / v.9D no.3 / pp.345-354 / 2002
  • HTML pages on the web change at any time. Such changes can degrade the functionality of meta-search engines, which provide users with integrated results from search engines. To solve this problem, we propose an HTML page modification detector. It utilizes information on element positions in HTML pages and a modified Jaak Vilo algorithm. The detector uses patterns that represent the structure of HTML expressions occurring repeatedly in HTML pages. An experiment is carried out to verify the correctness of the modification detector.

Web Change Detection System Using the Semantic Web (시맨틱 웹을 이용한 웹 변경 탐지 시스템)

  • Cho Boo-Hyun;Min Young-Kun;Lee Bog-Ju
    • The KIPS Transactions:PartB / v.13B no.1 s.104 / pp.21-26 / 2006
  • The semantic web is an emerging paradigm in information retrieval and Web-based systems. This paper deals with a Web change detection system that employs the semantic web and an ontology. While existing Web change detection systems detect syntactic change, the proposed system focuses on detecting semantic change. The system reports a change only when a web page has changed semantically. To achieve this, the system employs a domain-specific ontology (e.g., information on computer science professionals in this paper). The Web pages before and after the change are converted according to the ontology, and then the comparison is performed. The experimental results show that semantic-based change detection is more useful than syntax-based change detection.

An Efficient Approach for Single-Pass Mining of Web Traversal Sequences (단일 스캔을 통한 웹 방문 패턴의 탐색 기법)

  • Kim, Nak-Min;Jeong, Byeong-Soo;Ahmed, Chowdhury Farhan
    • Journal of KIISE:Databases / v.37 no.5 / pp.221-227 / 2010
  • Web access sequence mining can discover the frequently accessed web pages pursued by users. Utility-based web access sequence mining handles non-binary occurrences of web pages and extracts more useful knowledge from web logs. However, the existing utility-based web access sequence mining approach considers web access sequences from the very beginning of web logs and is therefore not suitable for mining data streams, where the volume of data is huge and unbounded. Nor can it adaptively find recent changes of knowledge in data streams. The existing approach has many other limitations, such as considering only forward references of web access sequences, suffering from the level-wise candidate generation-and-test methodology, and needing several database scans. In this paper, we propose a new approach for high utility web access sequence mining over data streams with a sliding window method. Our approach can not only handle large-scale data but also efficiently discover recently generated information from data streams. Moreover, it overcomes the other limitations of the existing algorithm over data streams. Extensive performance analyses show that our approach is very efficient and outperforms the existing algorithm.

Security Check Scheduling for Detecting Malicious Web Sites (악성사이트 검출을 위한 안전진단 스케줄링)

  • Choi, Jae Yeong;Kim, Sung Ki;Min, Byoung Joon
    • KIPS Transactions on Computer and Communication Systems / v.2 no.9 / pp.405-412 / 2013
  • The current web has evolved into a mashed-up format as implementation and usage patterns have changed. Web services and user experiences have improved; however, security threats have also increased as web contents that are not yet verified are combined. To mitigate the threats incurred as an adverse effect of web development, we need to check the security of the combined web contents. In this paper, we propose a scheduling method to detect malicious web pages, both inside the site and outside it through extended links, for the secure operation of a web site. The scheduling method considers several aspects of each page, including connection popularity, suspiciousness, and check elapse time, to decide the order of security checks on the numerous web pages connected by links. We verified the effectiveness of security checks that follow the scheduling method, which uses the priority given to each page.
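
The abstract lists the factors behind each page's priority (connection popularity, suspiciousness, and check elapse time) but not how they are combined. The sketch below assumes a simple weighted sum over values normalized to the range 0 to 1; the weights, the `priority` and `check_order` names, and the sample pages are all hypothetical and do not reproduce the paper's method.

```python
def priority(popularity, suspiciousness, elapsed, weights=(1.0, 1.0, 1.0)):
    """Assumed scoring rule: weighted sum of the three factors, each in [0, 1]."""
    w_pop, w_sus, w_age = weights
    return w_pop * popularity + w_sus * suspiciousness + w_age * elapsed


def check_order(pages, weights=(1.0, 1.0, 1.0)):
    """Return page URLs ordered from highest to lowest check priority.

    `pages` maps a URL to a (popularity, suspiciousness, elapsed) tuple.
    """
    return sorted(
        pages,
        key=lambda url: priority(*pages[url], weights=weights),
        reverse=True,
    )


# Hypothetical pages: one popular, one suspicious and long unchecked, one quiet.
pages = {
    "/home": (0.9, 0.1, 0.2),
    "/ext/widget": (0.2, 0.9, 0.9),
    "/about": (0.1, 0.1, 0.1),
}
print(check_order(pages))
```

With equal weights, the suspicious and long-unchecked external page is scheduled first; raising the popularity weight would move heavily linked pages ahead of it.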

Design and Implementation of Web-based Factory Monitoring System for Complement MES (제조실행시스템의 기능 보완을 위한 웹 기반 공장 모니터링시스템의 설계 및 구현)

  • Kim, Yun-Gi;Gang, Mun-Seol;Kim, Byeong-Gi
    • The KIPS Transactions:PartD / v.9D no.4 / pp.667-676 / 2002
  • The digital environment represented by the Internet is rapidly displacing the ways in which industry conducts and carries out business, and it is bringing great change to life as a whole. E-Transformation, which aims to maximize productivity and management efficiency by improving the existing business processes of traditional manufacturers through Internet and Web technologies and the information superhighway, is spreading vigorously. In this paper, we design and implement a Web-based factory monitoring system whose purpose is to raise management efficiency by monitoring, from anywhere, the integrated current operating status of discrete factories at home and abroad. That is, we standardize the system structure, its functions, and the Web-based management data, design the system using UML (Unified Modeling Language), and implement the web functions using ASP (Active Server Pages). The implemented Web-based factory monitoring system was applied to two factories (K1, K2) of the tire operation division of K CO., LTD., and the results showed it to be very efficient for grasping the operating situation of the whole factory comprehensively.

Asynchronous Web Crawling Algorithm (링크 분석을 통한 비동기 웹 페이지 크롤링 알고리즘)

  • Won, Dong-Hyun;Park, Hyuk-Gyu;Kang, Yun-Jeong;Lee, Min-Hye
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.364-366 / 2022
  • The web uses asynchronous methods to serve together various kinds of information that have different processing speeds. The asynchronous method has the advantage of being able to respond to other events before a task is completed, but a typical crawler, which collects the information present on a web page at the time of the visit, has difficulty collecting information provided asynchronously. In addition, asynchronous web pages often keep the same web address even when the page content changes, which makes them difficult to crawl. In this paper, we propose a web crawling algorithm that accounts for asynchronous page movement by analyzing links in the web. With the proposed algorithm, it was possible to collect dictionary information on TTA terms, which is provided asynchronously.

A Study on Protecting for forgery modification of User-input on Webpage (웹 페이지에서 사용자 입력 값 변조 방지에 관한 연구)

  • Yu, Chang-Hun;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.24 no.4 / pp.635-643 / 2014
  • Most web-based services are provided through a web browser. A web browser receives a text-based web page from the server and translates the received data for the user to view. There are a myriad of add-ons to web browsers that extend browser features. These add-ons may access web pages and make changes to the data, which makes web services delivered via web browsers vulnerable to security threats. A web browser stores web page data in memory in the DOM structure. One method that prevents modifications to web page data applies hash values to certain parts of the DOM structure. However, a certain characteristic of web pages renders this method ineffective at times. Specifically, user-input data is not predetermined, so the hash value cannot be calculated prior to user input, and modification of the data cannot be prevented. This paper proposes a method that both detects and inhibits any attempt to change user-input data. The proposed method stores the user's input from the keyboard and compares it with the data transmitted from the web browser to detect any anomalies.
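
The comparison step described in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: keystrokes are modeled as a list of characters with `"\b"` standing for backspace, and the names `record_keystrokes` and `is_tampered` are hypothetical.

```python
import hmac


def record_keystrokes(keys):
    """Rebuild the text the user typed from a sequence of key events.

    Assumes each event is a single character and "\b" means backspace.
    """
    buffer = []
    for key in keys:
        if key == "\b":
            if buffer:
                buffer.pop()
        else:
            buffer.append(key)
    return "".join(buffer)


def is_tampered(typed, transmitted):
    """Report True when the transmitted value differs from what was typed."""
    return not hmac.compare_digest(typed.encode(), transmitted.encode())


# The user types "1000", deletes two digits, then types "5".
typed = record_keystrokes(list("1000\b\b5"))
print(typed, is_tampered(typed, "105"), is_tampered(typed, "9105"))
```

Here a value altered after typing, for example by a browser add-on rewriting a form field, no longer matches the recorded keystrokes and is flagged.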