• Title/Summary/Keyword: Web Parsing

The Priority Heuristics for Concurrent Parsing of JavaScript (자바스크립트 동시 파싱을 위한 우선순위 휴리스틱)

  • Cha, Myungsu;Park, Hyukwoo;Moon, Soo-Mook
    • KIISE Transactions on Computing Practices / v.23 no.8 / pp.510-515 / 2017
  • Speeding up the loading time of web applications is important. Parsing is one of the loading steps that increases loading time. To address this, an optimization called Concurrent Parsing has been proposed, which performs parsing in parallel on additional threads. However, Concurrent Parsing does not consider the priority order of parsing. In this paper, we propose heuristics that exploit parsing priorities to improve Concurrent Parsing. To determine parsing priority, we empirically investigate the sequence of function calls, classify functions into three categories, and extract function call probabilities. A function with a high call probability is given a high priority, and a function with a low call probability is given a low priority. We evaluate these priority heuristics on real web applications and obtain a 2.6% decrease in loading time on average.
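
The priority rule in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the category names, the probability values, and the `schedule` helper are all hypothetical, since the abstract does not state the three categories or their measured call probabilities. It only shows how a queue ordered by call probability could decide which function a background parser thread picks up next.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical categories and call probabilities (assumed for illustration;
# the abstract does not give the real ones).
CALL_PROBABILITY = {"likely": 0.9, "possible": 0.5, "unlikely": 0.1}


@dataclass(order=True)
class ParseTask:
    priority: float              # negated probability, because heapq is a min-heap
    seq: int                     # source order, used to break ties
    name: str = field(compare=False)


def schedule(functions):
    """Return function names in the order a parser thread would take them.

    `functions` is an iterable of (name, category) pairs in source order.
    """
    heap = []
    for seq, (name, category) in enumerate(functions):
        heapq.heappush(heap, ParseTask(-CALL_PROBABILITY[category], seq, name))
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order
```

For example, `schedule([("a", "unlikely"), ("b", "likely")])` places `b` ahead of `a`, and functions in the same category keep their source order.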

Static Analysis of Web Accessibility Based on Abstract Parsing (요약파싱기법을 사용한 웹 접근성의 정적 분석)

  • Kim, Hyunha;Doh, Kyung-Goo
    • Journal of KIISE / v.41 no.12 / pp.1099-1109 / 2014
  • Web accessibility evaluation tools can be used to determine whether a website meets accessibility guidelines. Many such tools have been developed, but most of them dynamically fetch and analyze pages, and as a result some pages may be omitted due to a lack of access authorization or environment information. In this paper, we propose a static method that analyzes web accessibility via abstract parsing. Our abstract parsing technique understands the syntactic and semantic structure of programs that dynamically generate web pages according to external inputs and parameters. The static method performs its analysis without omitting any pages because it covers all execution paths. We performed an experiment with a PHP-based website to demonstrate that our tool discovers more accessibility errors than a dynamic page accessibility analysis tool.

Ontology Parser Design for Speed Improvement of Ontology Parsing (온톨로지 파싱 속도향상을 위한 온톨로지 파서 설계)

  • Kim, Won-Pil;Kong, Hyun-Jang
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.4 / pp.96-101 / 2010
  • Efficient ontology parsing is a core research topic for the Semantic Web. Ontology parsing and inference underpin meaningful information retrieval, which is the ultimate purpose of the Semantic Web. However, most existing ontology authoring tools do not parse ontologies efficiently. In this study, we therefore design a two-step ontology parser that extracts all the facts contained in an ontology more quickly. The token extractor first collects all the tokens of the ontology, and the triple extractor then extracts the statements from the collected tokens. We confirm that the parser designed in this study parses ontologies faster than existing ontology parsers.
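
The two-step design described here (a token extractor followed by a triple extractor) can be sketched as below. This is only an illustration under stated assumptions: the input is assumed to be a simplified, N-Triples-like text in which each statement is three terms followed by `.`, and the names `extract_tokens` and `extract_triples` are hypothetical. The abstract does not say which ontology syntax or data structures the authors use.

```python
import re

# Step 1 pattern: an <IRI>, a "quoted literal", a statement terminator,
# or a bare term. Alternatives are tried in this order at each position.
TOKEN_RE = re.compile(r'<[^>]*>|"[^"]*"|\.|[^\s.]+')


def extract_tokens(text):
    """Step 1: collect every token of the (simplified) ontology text."""
    return TOKEN_RE.findall(text)


def extract_triples(tokens):
    """Step 2: group the collected tokens into (subject, predicate, object)."""
    triples = []
    current = []
    for token in tokens:
        if token == ".":
            # Keep only well-formed statements with exactly three terms.
            if len(current) == 3:
                triples.append(tuple(current))
            current = []
        else:
            current.append(token)
    return triples
```

Calling `extract_triples(extract_tokens(text))` runs both steps in sequence. Separating them means the text is scanned once and the statement grouping works only on the token list.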

Improving spaCy dependency annotation and PoS tagging web service using independent NER services

  • Colic, Nico;Rinaldi, Fabio
    • Genomics & Informatics / v.17 no.2 / pp.21.1-21.6 / 2019
  • Dependency parsing is used as a component in many text analysis pipelines. However, its performance, especially in specialized domains, suffers from the presence of complex terminology. Our hypothesis is that including named entity annotations can improve the speed and quality of dependency parses. As part of BLAH5, we built a web service that delivers improved dependency parses by taking into account named entity annotations obtained from third-party services. Our evaluation shows improved results and better speed.

Design and Implemetation of EasyWeb that searching and sharing to Informations (정보 검색 및 공유가 가능한 EasyWeb 설계 및 구현)

  • Gang, Sang-Eun;Kim, Taek-Hwan;Kang, Min-Young;Joo, Ok-Chan;Kim, Jin-Mook
    • Proceedings of the Korea Information Processing Society Conference / 2011.11a / pp.1411-1413 / 2011
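  • Existing browsers that offer convenient Internet search operate passively, acting only in response to user requests. Moreover, despite efforts to satisfy advanced search requirements such as RSS, it remains difficult for them to act proactively on the user's needs. In this study, we therefore design and implement EasyWeb, an effective web browser that supports proactive information retrieval and delivery such as RSS and conforms to the HTML 2.0 standard. Unlike existing browsers, the proposed EasyWeb browser can parse HTML and XML in accordance with the standard specifications. It can also proactively collect and provide information according to the user's needs. We expect that the implementation results of EasyWeb will help identify future directions for web browsers.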

PDFindexer: Distributed PDF Indexing system using MapReduce

  • Murtazaev, JAziz;Kihm, Jang-Su;Oh, Sangyoon
    • International Journal of Internet, Broadcasting and Communication / v.4 no.1 / pp.13-17 / 2012
  • Indexing converts a raw document collection into an easily searchable representation. Web search engines such as Google and Yahoo provide subsecond response times, which is made possible by efficient indexing of web pages across the entire Web. Indexing becomes challenging as the scale grows. Parallel techniques, such as the MapReduce framework, can assist in efficient large-scale indexing. In this paper, we propose PDFindexer, a system for indexing scientific papers in PDF using the MapReduce programming model. Unlike Web search engines, our target domain is scientific papers, which have a predefined structure, such as title, abstract, sections, and references. Our proposed system parses scientific papers in PDF, recreates their structure, and performs efficient distributed indexing with the MapReduce framework on a cluster of nodes. We provide an overview of the system, its components, and the interactions among them. We discuss some issues related to the design of the system and the use of MapReduce in parsing and indexing a large document collection.

A Study of Main Contents Extraction from Web News Pages based on XPath Analysis

  • Sun, Bok-Keun
    • Journal of the Korea Society of Computer and Information / v.20 no.7 / pp.1-7 / 2015
  • Although data on the Internet can be used in various fields, such as a data source for information retrieval (IR), data mining, and knowledge information services, it contains a lot of unnecessary information. Removing this unnecessary data is a problem that must be solved before studying knowledge-based information services built on web page data. In this paper, we solve the problem by implementing XTractor (XPath Extractor). Since XPath is used to navigate the elements and attributes of an XML document, XTractor carries out an XPath analysis. XTractor extracts the main text by HTML parsing, XPath grouping, and detecting the XPath that contains the main data. In experiments on a large amount of data, the recognition and precision rates were 97.9% and 93.9%, respectively, except for a few cases, confirming that the main text of news pages can be extracted properly.
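
The three steps named in this abstract (HTML parsing, XPath grouping, detecting the XPath that holds the main data) can be sketched roughly as follows. This is not XTractor itself. It assumes, for illustration only, that paths are built from tag names without positional indices and that the main content is the path group with the most text. The class and function names are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Elements whose text is never article content.
SKIPPED_TAGS = {"script", "style"}


class PathTextCollector(HTMLParser):
    """Parses HTML and groups text nodes by the tag path leading to them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text_by_path = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1] not in SKIPPED_TAGS:
            self.text_by_path["/" + "/".join(self.stack)].append(text)


def extract_main_text(html):
    """Return (path, text) for the path group holding the most text."""
    collector = PathTextCollector()
    collector.feed(html)
    collector.close()
    groups = collector.text_by_path
    if not groups:
        return "", ""
    best = max(groups, key=lambda path: sum(len(t) for t in groups[path]))
    return best, " ".join(groups[best])
```

On a page with a short navigation bar of links and an article body made of several `<p>` elements, the `/html/body/div/p` group would typically win because it accumulates far more text than the link group. A real extractor would need a more robust scoring rule than raw text length.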

An Implementation of the Speech-Library and Conversion Web-Services of the Web-Page for Speech-Recognition (음성인식을 위한 웹페이지 변환 웹서비스와 음성라이브러리 구현)

  • Oh, Jee-Young;Kim, Yoon-Joong
    • Proceedings of the Korea Contents Association Conference / 2006.11a / pp.478-482 / 2006
  • This paper implements a speech library and web services that convert web pages for speech recognition. The system consists of a web services consumer and web services providers. The consumer has two libraries, the speech library and the proxy library. The speech library extracts speech data from the user's speech and searches the link table for the URL mapped to that speech. The proxy library calls the two web services and receives their results. The providers consist of the Parsing Web Service and the Speech-Recognition Web Service. The Parsing Web Service adds an ActiveX control and reconstructs the web page for use with speech recognition. The speech recognizer is a web service provider implemented in a previous study. The experiments show that the system reconstructs web pages and creates the link table. We also confirmed that it returns the web page to the user by searching the link table for the URL mapped to the result of the speech-recognition web service.

Development of Collaborative Script Analysis Platform Based on Web for Information Retrieval Related to Story (스토리 정보의 검색을 위한 웹 기반의 협업적 스크립트 분석 플랫폼 개발)

  • Park, Seung-Bo;Kim, Hyun-Sik;Baek, Yeong-Tae;You, Eun-Soon
    • Journal of the Korea Society of Computer and Information / v.19 no.9 / pp.93-101 / 2014
  • Movie stories can be retrieved efficiently by analyzing the script, which is the blueprint of a movie. Although movie scripts are written in the structured Final Draft format, the scripts published on websites are mostly broken, so it is hard to restore the type of each sentence without analyzing its content. It is therefore necessary to develop and provide web-based script analysis software so that users can collaboratively and freely check and correct errors in the results after the script has been parsed automatically. In this paper, we propose the structure of a web-based collaborative script analysis platform that enables users to modify and filter type errors in the script in order to accumulate high-quality film data, and we evaluate the performance of the implementation. In the experiment, the accuracy of automatic parsing was 64.95%, and after five steps of collaborative modification, in which most errors were corrected, parsing accuracy reached 99.58%.

FastIO: High Speed Launching of Smart TV Apps (FastIO: 스마트 TV 앱의 고속 구동 기법)

  • Lee, Cheolhee;Hwang, Taeho;Won, Youjip;Lee, Seongjin
    • Journal of KIISE / v.43 no.7 / pp.725-735 / 2016
  • Smart TVs use WebKit as a web browser engine to provide content such as web surfing, VOD watching, and games. WebKit uses web resources, such as HTML, CSS, JavaScript, and images, to run applications. At the start of an application, WebKit loads the resources into memory and creates the DOM tree and the render tree, which is a time-consuming process. However, the DOM tree and render tree created by a smart TV application do not change over time because the application uses web resources stored on disk. If the DOM tree and render tree can be stored and reused, the loading time of an application can be reduced. In this paper, we propose FastIO, a technique that selectively adds persistency to dynamically allocated memory. FastIO reduces overall application loading time by eliminating the steps of loading resources from storage, parsing the HTML documents, and creating the DOM tree and render tree. A comparison of application resource loading times indicates that the web browser with FastIO is 7.9x, 44.8x, and 2.9x faster than the legacy web browser in SSD, Ramdisk, and eMMC environments, respectively.