Search | Korea Science

An Efficient Algorithm for Detecting Tables in HTML Documents (HTML 문서의 테이블 식별을 위한 효율적인 알고리즘)

Kim Yeon-Seok;Lee Kyong-Ho
- Journal of Korea Multimedia Society / v.7 no.10 / pp.1339-1353 / 2004
<TABLE> tags in HTML documents are widely used for laying out Web documents as well as for describing genuine tables with relational information. As a prerequisite for information extraction from the Web, this paper presents an efficient method for sophisticated table detection. The proposed method consists of two phases: preprocessing and attribute-value relations extraction. In the preprocessing phase, where tables that are clearly genuine or non-genuine are filtered out, appropriate rules are devised based on a careful examination of the general characteristics of <TABLE> tags. The remaining tables are classified in the attribute-value relations extraction phase. Specifically, a value area is extracted and checked for syntactic coherency. Furthermore, for a table whose value area is unsuitable for the syntactic coherency check, the method looks for semantic coherency between the attribute area and the value area. Experimental results with 11,477 <TABLE> tags from 1,393 HTML documents show that the method performs better than previous works, achieving an average precision of 97.54% and an average recall of 99.22%.
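The syntactic coherency check on a value area described in the abstract above could look roughly like the following. This is a minimal sketch, not the authors' algorithm: the cell categories, the per-column scoring, and the 0.8 threshold are all assumptions made for illustration.

```python
import re
from collections import Counter


def cell_type(text):
    """Map a cell string to a coarse syntactic type (hypothetical categories)."""
    s = text.strip()
    if not s:
        return "empty"
    if re.fullmatch(r"[-+]?[\d,]*\.?\d+%?", s):
        return "number"
    if re.fullmatch(r"\d{1,4}[-/.]\d{1,2}[-/.]\d{1,4}", s):
        return "date"
    return "text"


def column_coherency(rows):
    """Average, over columns, of the share of cells having the column's dominant type.

    Assumes `rows` is a non-empty, rectangular list of lists of strings.
    """
    scores = []
    for column in zip(*rows):
        types = [cell_type(cell) for cell in column]
        dominant_count = Counter(types).most_common(1)[0][1]
        scores.append(dominant_count / len(types))
    return sum(scores) / len(scores)


def looks_like_genuine_table(rows, threshold=0.8):
    """Hypothetical decision rule: coherent columns suggest a genuine table."""
    return column_coherency(rows) >= threshold


value_area = [
    ["Seoul", "9,776,000", "2004-10-01"],
    ["Busan", "3,512,000", "2004-10-02"],
    ["Daegu", "2,465,000", "2004-10-03"],
]
layout_area = [
    ["Home", "About us"],
    ["2004-10-01", ""],
]
print(looks_like_genuine_table(value_area))
print(looks_like_genuine_table(layout_area))
```

A layout table whose cells are all plain text would also pass this check, so on its own it is not a sufficient test; the abstract pairs it with preprocessing rules and a semantic coherency check.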

Detecting Tables in HTML Documents (HTML 문서의 테이블 식별)

Kim Yeon-Seok;Lee Kyong-Ho
- Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.163-165 / 2004
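The <TABLE> tag in HTML is used to express the layout of Web documents as well as to describe tables of related information. As part of the effort to extract useful information from the Web, this paper proposes an efficient method for identifying tables in HTML documents. The proposed method consists of two phases: preprocessing and attribute-value relation extraction. In the preprocessing phase, rules reflecting the general characteristics of <TABLE> tags used for genuine tables or for layout are applied to pick out the <TABLE> tags that can be clearly identified as genuine or non-genuine. In the attribute-value relation extraction phase, the table area is divided into an attribute area and a value area, and a syntactic coherency check is performed on the value area. In addition, when the value area is too small for the syntactic coherency check, the semantic coherency between the attribute and value areas is checked. To evaluate the proposed method, experiments were conducted on 11,477 <TABLE> tags extracted from 1,393 HTML documents; the method achieved an average precision of 97.54% and an average recall of 99.22%, outperforming previous work.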

Data Hiding for HTML Files Using Character Coding Table and Index Coding Table

Chou, Yung-Chen;Hsu, Ping-Kun;Lin, Iuon-Chang
- KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.11 / pp.2913-2927 / 2013
This paper presents a data hiding scheme for HTML files. Web pages are a very popular medium for broadcasting information and knowledge, and they are well suited to secret message delivery because different HTML codings can render the same screen in any of the popular browsers. The proposed method uses the HTML special space codes and sentence segmentation to conceal secret messages in an HTML file. The experimental results show that the stego HTML file generated by the proposed method is imperceptible. Also, the proposed method can conceal one more secret bit in every between-word location.
https://doi.org/10.3837/tiis.2013.11.021
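The general idea of hiding bits in between-word locations can be illustrated with a toy encoder. This sketch is not the paper's character coding table or index coding table: it simply assumes that a plain space encodes 0 and the numeric character reference `&#32;` (which a browser decodes to the same space character) encodes 1, and that the cover text uses single spaces and contains no `&#32;` of its own.

```python
import html
import re

ZERO = " "      # plain space, assumed to encode bit 0
ONE = "&#32;"   # character reference for the same space, assumed to encode bit 1


def embed(cover, bits):
    """Hide a bit string in the gaps between words of `cover`."""
    words = cover.split(" ")
    if len(bits) > len(words) - 1:
        raise ValueError("not enough between-word locations for the message")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        separator = ONE if i < len(bits) and bits[i] == "1" else ZERO
        out.append(separator + word)
    return "".join(out)


def extract(stego, n_bits):
    """Read back the first `n_bits` hidden bits from the separators."""
    separators = re.findall(r"&#32;| ", stego)
    return "".join("1" if s == ONE else "0" for s in separators[:n_bits])


def visible_text(stego):
    """Text as seen after character references are decoded."""
    return html.unescape(stego)


cover = "Web pages are a popular medium"
stego = embed(cover, "1011")
print(stego)
print(extract(stego, 4))
```

The decoded text is identical to the cover text, which is the property the abstract relies on; the receiver must know the message length, which is passed here as `n_bits`.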

An Efficient Method for Logical Structure Analysis of HTML Tables (HTML 테이블의 논리적 구조분석을 위한 효율적인 방법)

Kim Yeon-Seok;Lee Kyong-Ho
- Journal of Korea Multimedia Society / v.9 no.9 / pp.1231-1246 / 2006
HTML is a format for rendering Web documents visually and uses tables to present relational information. Since HTML is limited in terms of information processing and management by a computer, it is important to transform HTML tables into XML documents, which are able to represent logical structure information. As a prerequisite for extracting information from the Web, this paper presents an efficient method for extracting logical structures from HTML tables and transforming them into XML documents. The proposed method consists of two phases: area segmentation and structure analysis. The area segmentation phase removes noisy areas and extracts attribute and value areas through visual and semantic coherency checks. The hierarchical structure between attribute and value areas is then analyzed and transformed into an XML representation using a proposed table model. Experimental results with 1,180 HTML tables show that the proposed method performs better than the conventional method, achieving an average precision of 86.7%.
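The end result of such a transformation, an XML document in which each value is tied to its attribute, can be sketched with the Python standard library. This is a deliberately simplified illustration rather than the paper's table model: it assumes a well-formed, flat table whose first row is the attribute area and whose remaining rows form the value area, and the element names `table`, `record`, and `value` are made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableReader(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # list of text chunks while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_to_xml(html_text):
    """Pair each value cell with the attribute in the same column."""
    reader = TableReader()
    reader.feed(html_text)
    attribute_row, *value_rows = reader.rows
    root = ET.Element("table")
    for row in value_rows:
        record = ET.SubElement(root, "record")
        for name, value in zip(attribute_row, row):
            cell = ET.SubElement(record, "value", attribute=name)
            cell.text = value
    return ET.tostring(root, encoding="unicode")


sample = (
    "<table>"
    "<tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Seoul</td><td>9776000</td></tr>"
    "<tr><td>Busan</td><td>3512000</td></tr>"
    "</table>"
)
xml_text = table_to_xml(sample)
print(xml_text)
```

Nested headers, spanning cells, and noisy areas, which the abstract's segmentation and structure analysis phases address, are outside the scope of this sketch.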

Design and Implementation of an HTML Converter Supporting Frame for the Wireless Internet (무선 인터넷을 위한 프레임 지원 HTML 변환기의 설계 및 구현)

Han, Jin-Seop;Park, Byung-Joon
- Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.6 / pp.1-10 / 2005
This paper describes the implementation of an HTML converter for wireless Internet access in a Wireless Application Protocol (WAP) environment. The implemented HTML converter consists of the contents conversion module, the conversion rule set, the WML file generation module, and the frame contents reformatting module. Plain text contents are converted to WML contents through one-to-one mapping, referring to the conversion rule set in the contents conversion module. For frame contents, the first frameset sources are parsed and the request messages are reconstructed with all the file names; the converter then reconnects to the web server once per file to receive each document and append it to the first document. Finally, after reformatting in the frame contents reformatting module, frame contents are converted to WML table contents. For image map contents, the image-map-related tags are parsed, and the names of the linked HTML documents are extracted so that they can be replaced with WML contents data and linked to those contents. The proposed conversion method for frame contents provides a better interface for user convenience and interaction than existing converters. Conversion of image maps is a feature not currently supported by other converters.

Extracting Web-Table Information Using Decision Tree and Rule Based Approach (기계학습과 규칙 기반 접근 방법을 결합한 의미 있는 표 구분과 헤드 영역 추출)

Jung, Sung-Won;Park, Dae-Won;Kwon, Hyuk-Chul
- Annual Conference on Human and Language Technology / 2004.10d / pp.5-11 / 2004
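In general, an HTML document consists largely of content and structure. Unlike plain documents, HTML uses tags to add extra information to a document and to make its content clearer. Tags therefore make it easier to distinguish and extract information than in plain documents. Among these tags, this study focuses on tables. A table organizes and conveys facts using rows and columns; it is more useful for organizing information than other structural features and can condense what would otherwise take a long passage of prose. Noting these properties, earlier researchers named the field of extracting information from tables Web Table Mining. This study exploits structural characteristics of tables that earlier researchers overlooked to present a method applicable to Internet documents in general, together with a stepwise model for extracting meaningful information from tables.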

An Open API Proxy Server System for Widget Services (위젯 서비스를 위한 오픈 API 프록시 서버 시스템)

Ahn, Byung-Hyun;Lee, Hyuk-Joon;Choi, Yong-Hoon;Chung, Young-Uk
- Journal of KIISE:Computing Practices and Letters / v.16 no.9 / pp.918-926 / 2010
A widget is a small application that runs a user's favorite services and delivers web content without the user explicitly visiting the web site. Although widgets can be easily implemented with Open APIs, only a few web sites provide them, because supplying Open APIs to widget developers requires refactoring the structure of the web resources. This paper presents an Open API Proxy Server System for widget services. The system consists of two components: an Open API Source Code Generator and an Open API Proxy Server. The Open API Source Code Generator provides a Graphical User Interface (GUI) for users to generate the Open APIs of their choice and sends the Open API source code generation request to the Open API Proxy Server. The Open API Proxy Server, using the HTML Table Processing Library, receives the HTML web page from the web site and extracts useful information from the target HTML table. The proxy server converts the extracted data into the corresponding XML document, which becomes available through the Open API. We verify the operation of the proposed system through experiments with the HTML tables in the example web sites.

Analysis of User Preferences on the Structure of Digital Textbook Contents (디지털교과서 내용 구성에 관한 사용자 선호도 분석)

Kim, Mi-Hye
- The Journal of the Korea Contents Association / v.9 no.12 / pp.900-911 / 2009
This paper analyzes user preferences on the basic structure of digital textbook contents based on the PDF and HTML formats. The analysis uses data from an online survey on user preferences for the representative structures of the PDF- and HTML-based digital textbook contents that are currently used on the Web. Results show that in the PDF format, the structure with TOC (Table of Contents) links on the left of the screen and the main content on the right was preferred by 82% of the respondents. In terms of the viewing method, the one that presents one page of the textbook fitted to the width of the computer screen in a single-page view was regarded as the best. Similarly, in the HTML format, the two-frame structure with TOC links in the left frame and the main content in the right was preferred by 84% of the respondents. However, the structures of the PDF- and HTML-based digital textbook contents employed by most existing Web sites go against the users' preferences. Accordingly, future digital textbook development must consider user preferences so that students can read the contents more easily and conveniently.
https://doi.org/10.5392/JKCA.2009.9.12.900

Development of Convertor supporting Multi-languages for Mobile Network (무선전용 다중 언어의 번역을 지원하는 변환기의 구현)

Choe, Ji-Won;Kim, Gi-Cheon
- The KIPS Transactions:PartC / v.9C no.2 / pp.293-296 / 2002
UP Link is one of the commercial products that convert HTML to HDML in order to show Internet WWW content in mobile environments. When the UP browser accesses HTML pages, the agent in UP Link controls the converter to change the HTML to HDML. I-Mode, which was developed by NTT-Docomo of Japan, has much content owing to its long and stable commercial service. Micro Explorer, which was developed by the Stinger project, also has many additional functions. In this paper, we designed and implemented a WAP converter that can accept C-HTML content and mHTML content. The C-HTML format used by I-Mode is a subset of the HTML format, and the mHTML format used by ME is similar to C-HTML, so content providers can develop C-HTML content more easily than WAP content and content for the other cases. Since C-HTML, mHTML and WML are all used in the mobile environment, their limits on the transmission capacity of one page are also similar. A match table is made first; after that, we apply the conversion algorithm to it. If no matching element is found, we arrange tags that are supported only by WML so that the content displays as well as possible. As a result, we can convert over 90% of the content.
https://doi.org/10.3745/KIPSTC.2002.9C.2.293

Cooperative Robot for Table Balancing Using Q-learning (테이블 균형맞춤 작업이 가능한 Q-학습 기반 협력로봇 개발)

Kim, Yewon;Kang, Bo-Yeong
- The Journal of Korea Robotics Society / v.15 no.4 / pp.404-412 / 2020
Everyday tasks such as moving tables and beds typically involve at least two people, and the balance of the object changes with each person's action. However, many previous studies had robots perform such tasks alone, without factoring in human cooperation. Therefore, in this paper, we propose a cooperative robot for table balancing using Q-learning that enables cooperative work between a human and a robot. The robot's camera captures an image of the table's state, the human's action is recognized from it, and the robot performs the table-balancing action according to the recognized human action, without high-performance equipment. The classification of human action uses a deep learning model, specifically AlexNet, and achieves an accuracy of 96.9% under 10-fold cross-validation. The Q-learning experiment was carried out over 2,000 episodes with 200 trials. The overall results show that the Q function converged stably at this number of episodes. This stable convergence determined the Q-learning policies for the robot actions. A video of the robot cooperating with a human on the table balancing task using the proposed Q-learning can be found at http://ibot.knu.ac.kr/videocooperation.html.
https://doi.org/10.7746/jkros.2020.15.4.404
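For readers unfamiliar with Q-learning, the standard tabular update that such a robot policy would be built on is sketched below. The state names, action names, reward, and the learning-rate and discount values are hypothetical and are not taken from the paper; only the update rule itself is the textbook one.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical table-balancing setting: the state is the observed tilt,
# the actions move the robot's end of the table.
actions = ["raise", "lower", "hold"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

q_update(q, "tilted_left", "raise", reward=1.0, next_state="level", actions=actions)
print(q[("tilted_left", "raise")])
print(epsilon_greedy(q, "tilted_left", actions, epsilon=0.0, rng=random.Random(0)))
```

Repeating this update over many episodes is what drives the Q function toward the kind of stable convergence the abstract reports.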

