• Title/Summary/Keyword: Web Page Analysis

A Study of Main Contents Extraction from Web News Pages based on XPath Analysis

  • Sun, Bok-Keun
    • Journal of the Korea Society of Computer and Information / v.20 no.7 / pp.1-7 / 2015
  • Although data on the internet can be used in various fields, such as a data source for IR (Information Retrieval), data mining, and knowledge information services, it contains a lot of unnecessary information. Removing this unnecessary data is a problem that must be solved before studying knowledge-based information services built on web page data. In this paper, we solve the problem by implementing XTractor (XPath Extractor). Since XPath is used to navigate the elements and attributes of an XML document, XTractor carries out an XPath analysis. XTractor extracts the main text through HTML parsing, XPath grouping, and detection of the XPath that contains the main data. In experiments on a large amount of data, the recognition rate and precision were 97.9% and 93.9%, respectively, with only a few failure cases, confirming that the main text of news pages can be extracted properly.
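
The three steps named in the abstract (HTML parsing, XPath grouping, and detecting the XPath that holds the main data) can be illustrated with a minimal Python sketch. This is not XTractor itself. It assumes that text nodes are grouped by an index-free tag path and that the group with the most text is the main content; neither rule is stated in the abstract. The names `PathTextCollector` and `extract_main_text` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # tags with no closing tag
SKIP_TAGS = {"script", "style"}                            # never part of the article text


class PathTextCollector(HTMLParser):
    """Collects text nodes keyed by an index-free, XPath-like tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []                   # currently open tags, outermost first
        self.groups = defaultdict(list)   # path -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the innermost matching open tag; tolerate sloppy HTML.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not (SKIP_TAGS & set(self.stack)):
            self.groups["/" + "/".join(self.stack)].append(text)


def extract_main_text(html):
    """Return (path, text) for the path group holding the most text (assumed rule)."""
    collector = PathTextCollector()
    collector.feed(html)
    if not collector.groups:
        return "", ""
    best = max(collector.groups,
               key=lambda path: sum(len(t) for t in collector.groups[path]))
    return best, " ".join(collector.groups[best])
```

Because positional indices are dropped, sibling paragraphs such as `/html/body/div/p` fall into one group, which is what lets a multi-paragraph article outweigh navigation links and footers in this sketch.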

Analysis of Health Counseling by the Internet - on the Home page of Korean Rheumatology Health Professions Society (인터넷을 이용한 관절염 환자의 건강상담 내용분석 -대한류마티스 건강전문학회 홈페이지를 대상으로-)

  • Lee, Eun-Ok;Lee, Young-Sook
    • Journal of muscle and joint health / v.7 no.1 / pp.40-52 / 2000
  • With recent developments in computer technology and communication systems, advances in medical informatics allow us to use a wide range of medical information regardless of time or place. Many home pages on the web provide medical counseling and hospital information. On May 11th, 1999, the Korean Rheumatology Health Professions Society (KRHPS) launched a home page offering rheumatologic health information, a question-and-answer service, and other features. This study was undertaken to examine the content and purpose of health counseling on the web. The data were collected from 173 questioners who visited the question-and-answer section of the KRHPS home page from May 11th, 1999 through November 10th, 1999. Most of the questioners asked about their own health problems or those of their families. Over two thirds of them had already received a medical diagnosis. Rheumatoid arthritis was the most frequent diagnosis; other diseases, such as osteoarthritis, ankylosing spondylitis, and Still's disease, were also on the list. Most of the questioners wanted to learn about treatment strategies, to consult about their symptoms, or to obtain a diagnosis. Many also wanted detailed explanations of their diseases or information about the hospital. These findings suggest that health counseling on the web may be used to supplement the lack of direct medical interviews with doctors. It is also expected to guide patients in the right direction.

Intelligent Malicious Web-page Detection System based on Real Analysis Environment (리얼 분석환경 기반 지능형 악성 웹페이지 탐지 시스템)

  • Song, Jongseok;Lee, Kyeongsuk;Kim, Wooseung;Oh, Ikkyoon;Kim, Yongmin
    • Journal of KIISE / v.45 no.1 / pp.1-8 / 2018
  • Recently, the distribution of malicious code over the Internet has become one of the most serious cyber threats. Techniques for distributing malicious code while bypassing detection have also been evolving, and research has focused on how to detect and analyze them. However, obfuscated malicious JavaScript is almost impossible to detect because existing systems for detecting web pages that distribute malicious code are signature-based; a further limitation is that they require constant updates of the detection patterns. To overcome these limitations, we propose an intelligent detection system for web pages that distribute malicious code, which uses a real browser to effectively analyze and detect intelligent malicious-code distribution sites.

An Implementation of System for Detecting and Filtering Malicious URLs (악성 URL 탐지 및 필터링 시스템 구현)

  • Chang, Hye-Young;Kim, Min-Jae;Kim, Dong-Jin;Lee, Jin-Young;Kim, Hong-Kun;Cho, Seong-Je
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.405-414 / 2010
  • According to SecurityFocus statistics from 2008, client-side attacks through Microsoft Internet Explorer have increased by more than 50%. In this paper, we have implemented a behavior-based malicious web page detection system and a blacklist-based malicious web page filtering system. To do this, we first collected the target URLs efficiently by constructing a crawling system. The malicious URL detection system, which runs on a dedicated server, actively visits and renders the collected web pages in a virtual machine environment. To determine whether each web page is malicious, the system checks the state changes of the virtual machine after rendering the page. If abnormal state changes are detected, we conclude that the rendered web page is malicious and insert it into the blacklist of malicious web pages. The malicious URL filtering system, which runs on the web client machine, filters malicious web pages based on the blacklist when a user visits web sites. We have enhanced the performance of the detection system by automatically handling message boxes during URL analysis. Experimental results show that game sites contain up to three times more malicious pages than other sites, and that many attacks involve file creation and registry key modification.
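
The detect-then-filter flow described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: machine state is modeled as plain sets of file paths and registry keys captured before and after rendering, any newly appearing item counts as an abnormal change, and URLs are matched by host plus path. The names `state_changes`, `is_malicious`, and `BlacklistFilter` are hypothetical.

```python
from urllib.parse import urlsplit


def state_changes(before, after):
    """Return, per category, the items present after rendering but not before."""
    return {kind: after.get(kind, set()) - before.get(kind, set()) for kind in after}


def is_malicious(before, after):
    """Assumed rule: any new file or registry key is an abnormal state change."""
    return any(state_changes(before, after).values())


class BlacklistFilter:
    """Client-side filter that blocks URLs recorded by the detection step."""

    def __init__(self):
        self.blocked = set()

    def add(self, url):
        self.blocked.add(self._key(url))

    def allows(self, url):
        return self._key(url) not in self.blocked

    @staticmethod
    def _key(url):
        # Assumed normalization: lowercase host plus path, ignoring scheme and query.
        parts = urlsplit(url)
        return (parts.hostname or "") + parts.path


# Usage: snapshot the (hypothetical) VM state around a page visit.
before = {"files": {"C:/a.txt"}, "registry": set()}
after = {"files": {"C:/a.txt", "C:/evil.exe"}, "registry": {"HKCU/Run/evil"}}

blacklist = BlacklistFilter()
if is_malicious(before, after):
    blacklist.add("http://Game.example/play")
```

A real detector would need to ignore benign changes such as browser cache files; the sketch omits that whitelist step.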

Multimedia UCC Services as a Web 2.0 and Consumer Participation (웹2.0의 동영상 UCC 서비스현황과 소비자 참여)

  • Kim, Yeon-Jeong;Park, Sun-Young
    • Journal of Families and Better Life / v.26 no.1 / pp.95-105 / 2008
  • This paper identifies the current status and key determinants of participation in multimedia UCC as a Web 2.0 paradigm. The significant factors consist of the IT digital convergence environment and the younger generation's values and attitudes toward connecting to the web (human relations, visual expressionism, arousal, etc.). The paper analyzes the status of multimedia UCC services and the current participation level of UCC users. The research analyzed customer clickstream data (inter-temporary page views and unique users) from a small-to-mid-sized multimedia UCC company, together with secondary data (ww.rankey.com) on page views and unique users, to examine participation tendencies by age and sex in terms of total multimedia UCC participation. For the younger generation, who are familiar with new internet services, the web is both an important information-seeking medium and a one-person medium through which they can connect to new web networks as prosumers. In UCC-centered internet business, web-based customers take on the role of prosumer, both generating web content and consuming it through networking.

A Study of Internet using Citation Analysis (인용분석을 이용한 인터넷 정보의 연구)

  • 곽철완
    • Journal of the Korean BIBLIA Society for library and Information Science / v.10 no.1 / pp.213-222 / 1999
  • The purpose of this study is to identify important web pages in a particular area. The basic premise is that citation analysis can reveal similarity relationships among different web pages. Web pages about 'weather' were found using a search engine. Each web page was examined for hyperlinks from and/or to other web pages. This process identified seven web pages that were linked to by many other web pages. These seven web pages were analyzed by co-citation analysis. The results show that the selected web pages are linked according to the characteristics of the information they provide.
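
Co-citation analysis on hyperlinks counts how often two pages are linked to from the same third page. A minimal Python sketch follows, assuming the link data is available as a mapping from each citing page to the set of pages it links to. The function name `cocitation_counts` and the page labels are hypothetical, and the abstract does not say how the study computed its counts.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(outlinks):
    """Count, for each pair of target pages, how many pages link to both.

    outlinks maps a citing page to the collection of pages it links to.
    """
    counts = Counter()
    for targets in outlinks.values():
        # Sort so each unordered pair is always stored as the same tuple.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return counts


# Usage with made-up pages: p1..p3 are citing pages, A..C are cited pages.
links = {
    "p1": {"A", "B"},
    "p2": {"A", "B", "C"},
    "p3": {"B", "C"},
}
pairs = cocitation_counts(links)
```

Pairs with high counts are candidates for pages that provide similar kinds of information, which is the relationship the study looks for.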

Automatic Extraction of Dependencies between Web Components and Database Resources in Java Web Applications

  • Oh, Jaewon;Ahn, Woo Hyun;Kim, Taegong
    • Journal of information and communication convergence engineering / v.17 no.2 / pp.149-160 / 2019
  • Web applications typically interact with databases. Therefore, when maintaining web apps, it is crucial to understand which web components access which database resources. Existing research identifies interactions between Java web components, such as JavaServer Pages and servlets, but does not extract dependencies between the web components and database resources, such as tables and attributes. This paper proposes a dynamic analysis of Java web apps that extracts such dependencies from a Java web app and represents them as a graph. The key responsibility of our analysis method is to identify when web components access database resources. To fulfill this responsibility, our method dynamically observes the database-related objects provided in the Java standard library using the proxy pattern, which can be applied to control access to a desired object. This study also experiments with open source web apps to verify the feasibility of the proposed method.
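
The proxy pattern mentioned in the abstract can be illustrated with a short sketch. The paper targets Java and the database objects of the Java standard library; the example below swaps in Python and `sqlite3` purely for illustration. The `ConnectionProxy` class, the `login.jsp` component label, and the regular expression used to guess table names from SQL are all assumptions, not the paper's method.

```python
import re
import sqlite3

# Crude assumption: a table name follows FROM, INTO, UPDATE, or JOIN.
TABLE_RE = re.compile(r"\b(?:from|into|update|join)\s+([A-Za-z_]\w*)", re.IGNORECASE)


class ConnectionProxy:
    """Stands in for a real connection and records which tables a component touches."""

    def __init__(self, real, component, edges):
        self._real = real            # the wrapped sqlite3 connection
        self._component = component  # label of the web component using it
        self._edges = edges          # shared set of (component, table) edges

    def execute(self, sql, params=()):
        # Observe the access first, then delegate to the real object.
        for table in TABLE_RE.findall(sql):
            self._edges.add((self._component, table.lower()))
        return self._real.execute(sql, params)

    def __getattr__(self, name):
        # Everything not intercepted above is forwarded unchanged.
        return getattr(self._real, name)


# Usage: the edge set is the dependency graph in its simplest form.
edges = set()
real = sqlite3.connect(":memory:")
real.execute("CREATE TABLE users (id INTEGER, name TEXT)")

conn = ConnectionProxy(real, "login.jsp", edges)
conn.execute("INSERT INTO users VALUES (?, ?)", (1, "kim"))
rows = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchall()
```

Because the proxy exposes the same `execute` call as the wrapped object, the calling code does not change, which is what makes the pattern suitable for observing accesses at run time.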

A Study on the Usage Patterns of Medicine Information Through Web Log Analysis (웹로그 분석을 통한 의약품 정보 검색 주제별 이용 패턴에 관한 연구)

  • Cho Kyoung-Won;Woo Young-Woon
    • Proceedings of the Korea Contents Association Conference / 2005.11a / pp.269-274 / 2005
  • A great deal of medicine information is now available on the internet. However, no specific research has yet examined how lay people search for or acquire medicine information on web pages. In this paper, I analyzed the web log files of a company that provides medicine information, using the WiseLog tool. I analyzed three kinds of statistics from the web log files: web page usage by type of user, usage of the web page menus, and usage of the search menu. Based on the results, I proposed ways for companies that provide medicine information on the internet to supplement and improve their services.
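
The study used the WiseLog tool, whose internals the abstract does not describe. As a rough illustration of what a menu-usage statistic involves, the Python sketch below counts requests per top-level path segment. It assumes log lines in the common Apache-style format and treats the first path segment as the menu; both are assumptions, and `menu_usage` is a hypothetical name.

```python
import re
from collections import Counter

# Matches the request part of a common-log-format line, e.g. "GET /drug/123 HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def menu_usage(lines):
    """Count requests per menu, taking the first path segment as the menu name."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that are not well-formed requests
        path = match.group(1).split("?", 1)[0]        # drop the query string
        menu = path.strip("/").split("/", 1)[0] or "(home)"
        counts[menu] += 1
    return counts


# Usage with made-up log lines.
sample_log = [
    '1.2.3.4 - - [01/Nov/2005:10:00:00 +0900] "GET /search?q=aspirin HTTP/1.1" 200 512',
    '1.2.3.4 - - [01/Nov/2005:10:00:05 +0900] "GET /drug/123 HTTP/1.1" 200 100',
    '5.6.7.8 - - [01/Nov/2005:10:01:00 +0900] "GET / HTTP/1.1" 200 900',
    '5.6.7.8 - - [01/Nov/2005:10:01:30 +0900] "GET /search?q=x HTTP/1.0" 200 300',
]
usage = menu_usage(sample_log)
```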

Design and Adaptation for Internet News Data Extraction Middleware(INDEM) System

  • Sun, Bok-Keun
    • Journal of the Korea Society of Computer and Information / v.21 no.4 / pp.55-62 / 2016
  • In this paper, we propose the INDEM (Internet News Data Extraction Middleware) system for removing unnecessary data from internet news. Although data on the internet can be used in various fields, such as a data source for IR (Information Retrieval), data mining, and knowledge information services, it contains a lot of unnecessary information. Removing this unnecessary data is a problem that must be solved before studying knowledge-based information services built on web page data. The INDEM system parses HTML and explores the XPaths to perform the analysis. Users can use INDEM simply by implementing an abstract class that INDEM provides, and thereby obtain the analysis information. Through this process, the INDEM system delivers analysis information, including the main content of the news site, to its users. In this paper, the INDEM system was adapted to both a stand-alone system and a web service system, and it was evaluated on 16 news sites. The results show that the performance of the INDEM system is affected more by the size of the HTML source and the complexity of the HTML grammar used than by the size of the main news content.

Development of Web-based Multimedia Content for a Physical Examination and Health Assessment Course (웹기반의 건강사정 멀티미디어 컨텐츠 개발)

  • Oh Pok-Ja;Kim Il-Ok;Shin Sung-Rae;Jung Hoe-Kyung
    • Journal of Korean Academy of Nursing / v.34 no.6 / pp.994-1003 / 2004
  • Purpose: This study aimed to develop Web-based multimedia content for a Physical Examination and Health Assessment course. Method: The multimedia content was developed based on Jung's teaching and learning structure plan model, using the following 5 processes: 1) Analysis Stage, 2) Planning Stage, 3) Storyboard Framing and Production Stage, 4) Program Operation Stage, and 5) Final Evaluation Stage. Results: The Web-based multimedia content consisted of an intro movie, a main page, and sub pages. The main page had 6 menu bars (Announcement center, Information of professors, Lecture guide, Cyber lecture, Q&A, and Data centers) and a site map that introduced the 15 weeks of lectures. The content was implemented with HTML, JavaScript, Flash, and multimedia technology (audio and video), and consisted of text content, interactive content, animation, and audio and video. Experts in context, computer engineering, and educational technology were consulted during the development of these processes. Conclusions: Web-based multimedia content is expected to offer individualized and tailored learning opportunities that maximize and facilitate the effectiveness of the teaching and learning process. Therefore, multimedia content should be used concurrently with the lecture in Physical Examination and Health Assessment classes as a vital teaching aid to make up for the weaknesses of the face-to-face teaching-learning method.