• Title/Abstract/Keywords: XPath Grouping

3 search results (processing time: 0.015 s)

A Study of Main Contents Extraction from Web News Pages based on XPath Analysis

  • Sun, Bok-Keun
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 20, No. 7 / pp. 1-7 / 2015
  • Although data on the internet can serve many fields, for example as source data for information retrieval (IR), data mining, and knowledge information services, it contains a lot of unnecessary information. Removing this unnecessary data is a problem that must be solved before knowledge-based information services built on web page data can be studied. In this paper, we address the problem by implementing XTractor (XPath Extractor). Since XPath is used to navigate the elements and attributes of an XML document, XTractor carries out its analysis on XPaths. XTractor extracts the main text by parsing the HTML, grouping XPaths, and detecting the XPath that contains the main data. In experiments on a large amount of data, the recognition rate and precision were 97.9% and 93.9%, respectively, apart from a few exceptional cases, confirming that the main text of news pages can be extracted properly.
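The three steps named in the abstract (HTML parsing, XPath grouping, detecting the XPath that holds the main data) can be illustrated with a minimal sketch. This is not XTractor itself: the abstract does not describe its grouping or detection rules, so the sketch assumes index-free XPaths as group keys and a simple "most characters wins" heuristic. The names `XPathGrouper` and `extract_main_text` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not be pushed on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Tags whose text is never article content.
SKIP_TAGS = {"script", "style"}


class XPathGrouper(HTMLParser):
    """Groups text nodes by the index-free XPath of their parent element."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # open tags from the root to the current node
        self.groups = defaultdict(list)  # XPath -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not SKIP_TAGS.intersection(self.stack):
            self.groups["/" + "/".join(self.stack)].append(text)


def extract_main_text(html):
    """Returns (xpath, text) for the group holding the most characters.

    Assumed heuristic for illustration only; the paper's detection rule
    is not given in the abstract.
    """
    parser = XPathGrouper()
    parser.feed(html)
    parser.close()
    if not parser.groups:
        return None, ""
    xpath = max(parser.groups,
                key=lambda p: sum(len(t) for t in parser.groups[p]))
    return xpath, "\n".join(parser.groups[xpath])
```

On a page where navigation links sit under `/html/body/div/a` and article paragraphs under `/html/body/div/p`, the paragraph group carries far more text and is the one returned. Real news pages would likely need additional signals, such as link density or punctuation counts, which are beyond this sketch.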

Design and Adaptation for Internet News Data Extraction Middleware(INDEM) System

  • Sun, Bok-Keun
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 21, No. 4 / pp. 55-62 / 2016
  • In this paper, we propose the INDEM (Internet News Data Extraction Middleware) system for removing unnecessary data from internet news. Although data on the internet can serve many fields, for example as source data for information retrieval (IR), data mining, and knowledge information services, it contains a lot of unnecessary information. Removing this unnecessary data is a problem that must be solved before knowledge-based information services built on web page data can be studied. The INDEM system parses HTML, explores the XPaths, and performs the analysis. Users can use INDEM simply by implementing an abstract class that INDEM provides, and thereby obtain the analysis information. Through this process, INDEM delivers the analysis information, including the main contents of the news site, to its users. The INDEM system was adapted to both a stand-alone system and a web service system and was evaluated on 16 news sites. The results show that INDEM's performance is affected more by the size of the HTML source and the complexity of the HTML grammar used than by the size of the main news data.
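The usage model described above, in which the user implements an abstract class supplied by the middleware and receives the analysis information, resembles the template-method pattern. The following is a sketch of that pattern only. INDEM's actual class names, methods, and result format are not given in the abstract, so `NewsExtractor`, `analyze`, `on_result`, and the result dictionary are all hypothetical.

```python
from abc import ABC, abstractmethod


class NewsExtractor(ABC):
    """Hypothetical stand-in for the abstract class that the middleware provides."""

    def run(self, html):
        # Template method: the middleware performs the analysis, then hands
        # the result to the hook supplied by the user.
        result = self.analyze(html)
        self.on_result(result)
        return result

    def analyze(self, html):
        # Placeholder analysis (an assumption); a real system would parse
        # the HTML and explore its XPaths here.
        return {"source_size": len(html), "main_content": html.strip()}

    @abstractmethod
    def on_result(self, result):
        """User-supplied hook that receives the analysis information."""


class CollectingExtractor(NewsExtractor):
    """Example user class: stores every result it is given."""

    def __init__(self):
        self.received = []

    def on_result(self, result):
        self.received.append(result)
```

With this shape, the same user class could in principle be driven by a stand-alone program or by a web service handler, since both only need to call `run` with the page source.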

The Application and Integration of an Improvement Technique for Layers of NETCONF

  • 이양민;이재기
    • 정보과학회 논문지 (Journal of KIISE) / Vol. 43, No. 2 / pp. 256-268 / 2016
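  • Modern networks consist of diverse, heterogeneous devices installed in a distributed manner, and the NETCONF standard was established to manage them centrally and efficiently. In this paper, we integrate our improvements to each layer of NETCONF into a single system. In the RPC layer, multithreading is used to enable asynchronous communication channels and parallel processing. In the Operation layer, data groups based on the dependencies among device configuration data are used to increase the efficiency of operations. We also present a configuration data modeling technique for the Content layer so that it can interoperate with the Operation layer. Finally, we implemented a GUI program and present the implementation results. In experiments comparing the improved NETCONF with standard NETCONF on query processing rate, query processing speed, and CPU utilization, the improved NETCONF was superior in query processing rate and processing speed, while standard NETCONF was superior in CPU utilization.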
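Two of the ideas in this abstract, grouping configuration data by dependency and processing work in parallel threads, can be sketched together. This is an illustration under stated assumptions, not the paper's design: the abstract does not say how groups are formed, so the sketch treats a group as a connected component of the dependency relation, and `apply_group` is a hypothetical placeholder that only builds a label instead of sending a NETCONF operation.

```python
from concurrent.futures import ThreadPoolExecutor


def group_by_dependency(items, dependencies):
    """Union-find: items linked by any dependency end up in the same group."""
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in dependencies:
        parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


def apply_group(group):
    # Hypothetical placeholder for one operation covering a whole group
    # of interdependent settings.
    return "edit-config: " + ", ".join(group)


def apply_all(items, dependencies, workers=4):
    """Applies independent groups in parallel worker threads."""
    groups = group_by_dependency(items, dependencies)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input groups.
        return list(pool.map(apply_group, groups))
```

Because groups share no dependencies with one another, they can be handled concurrently without ordering constraints between them, while settings inside a group travel together in a single operation.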