Search | Korea Science

Converting HTML Documents to XML Documents through Interactions with Users (사용자와의 상호작용을 통한 HTML문서의 XML 문서로의 변환)

김승원;민준기;정진완
- Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.103-105 / 2002
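HTML, which is used to present data on the Web, is only a language for presentation and cannot express the meaning of the data. XML is a markup language designed to overcome this shortcoming of HTML by expressing both the presentation and the semantics of data. To make proper use of information expressed in HTML, the semantic information of the HTML document must be identified. If an HTML document can be converted into an XML document, the semantic information of the converted document becomes usable. Prior work on converting the HTML document format into the XML document format includes [1], which uses an automatic conversion method. Because a program is limited in its ability to grasp the meaning of an HTML document, that approach has the drawback that the converted XML document has difficulty expressing the document's meaning properly. In this paper, we propose a method in which the user intervenes to some degree in producing the final XML document, so that it properly expresses the meaning of the HTML document. With a small amount of user intervention, the proposed method produces an XML document that better expresses the meaning of the original HTML document.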

A Comparative Study of XML and HTML: Focusing on Their Characteristics and Retrieval Functions (디지털도서관 문서양식으로서의 XML과 HTML의 특성 및 검색 기능 비교 연구)

김현희;장혜원
- Journal of the Korean Society for Information Management / v.16 no.2 / pp.105-134 / 1999
For efficient and precise searches in the Web environment, resources should be coded in a structured way. HTML does not cover semantic structure because of its fixed tagging. XML, which has emerged as an alternative standard markup language, uses custom tags that allow structural searching. This study therefore compares XML with HTML in terms of their characteristics and retrieval functions. To test the retrieval functions of XML- and HTML-based systems, we constructed an experimental XML-based system. The XML-based system has several advantages over the HTML system. However, some improvements are needed to make the XML system more comprehensive and effective. First, XML document search engines with user-friendly interfaces are needed. Second, popular Web browsers such as Explorer and Communicator need to support the XML 1.0 specification completely. Third, an Open DTD format, which would allow information retrieval systems to retrieve documents and compress them into one single format, is needed to control Web documents more efficiently.

GUI-based HTML2XML Wrapper using Inductive Reasoning (학습 추론을 이용한 GUI 기반의 HTML2XML 래퍼)

Jang, Mun-Seong;Jeong, Jae-Mok;Choe, Il-Hwan;Kim, Hyeong-Ju
- Journal of KIISE:Databases / v.29 no.4 / pp.311-320 / 2002
A 'wrapper' is a module that extracts and processes information from a specified data source according to a predefined extraction rule. An 'HTML wrapper for XML' extracts information from a web source in the form of an XML document. Because composing extraction rules is repetitive and tedious, it should be made as easy and fast as possible. This paper presents a method that minimizes this work by integrating GUI-based training with scripting.

Implementation of an XML-Based Editor/Transformer for Large Volume of Similar Documents (XML 기반의 대용량 유사 문서 편집기/변환기 구현)

황인준
- The Journal of Society for e-Business Studies / v.9 no.1 / pp.21-38 / 2004
With its recent popularity, the Web is now considered a huge repository of information. Most documents on the Web have been created using HTML (Hyper Text Markup Language). Even though HTML is simple and easy to learn, it has several features that are obstacles to efficient information retrieval. XML (eXtensible Markup Language) can provide a solution to such problems and has in fact already been used in many applications. XML is a standard markup language for exchanging data on the Web. It can describe a document structure freely by defining its DTD, which enables efficient integration and retrieval of data on the Web. In this paper, we propose a versatile and efficient XML document manager. Its features include (i) a form-based XML editor that enables easy creation of new XML documents, (ii) an automatic document converter that transforms HTML documents with similar structure into XML documents, and (iii) a GUI-based DTD editor.

Design and Implementation of XHTML Code Generator (XHTML 코드 생성기의 설계와 구현)

계승철;전서현
- Proceedings of the Korea Multimedia Society Conference / 2001.11a / pp.24-29 / 2001
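XHTML is a markup language that has the elements of HTML and the syntax of XML. It was released as a combination of the advantages of XML and HTML, and is viewed as an intermediate step from HTML to XML, as a language to replace HTML, or as a markup language for wired and wireless integration. To make XHTML usable, we designed and implemented an XHTML code generator that turns plain text or widely used existing HTML into rule-conforming HTML and then converts it into XHTML with a few simple operations.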
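As a rough illustration of what turning loose HTML into XML-syntax markup involves, the following minimal sketch lowercases tag names, quotes attribute values, self-closes void elements, and closes elements left open. It is not the generator described in the abstract: the names `XhtmlWriter` and `html_to_xhtml` are hypothetical, the `VOID_ELEMENTS` set is a deliberately partial assumption, and HTML's implicit end-tag rules (for example, a new `<li>` closing the previous one) are not handled.

```python
from html import escape
from html.parser import HTMLParser

# Assumed, partial list of elements that take no closing tag in HTML.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class XhtmlWriter(HTMLParser):
    """Re-emit loosely written HTML as well-formed, XML-syntax markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments
        self.open_tags = []  # stack of elements that still need a close tag

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        # A valueless attribute (e.g. "checked") gets its own name as value.
        attr_text = "".join(
            ' {}="{}"'.format(name, escape(name if value is None else value, quote=True))
            for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.out.append("<{}{} />".format(tag, attr_text))
        else:
            self.out.append("<{}{}>".format(tag, attr_text))
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or tag not in self.open_tags:
            return  # ignore stray or redundant end tags
        # Close any elements left open inside the one being closed.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append("</{}>".format(current))
            if current == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def html_to_xhtml(source):
    writer = XhtmlWriter()
    writer.feed(source)
    writer.close()
    # Close whatever is still open at the end of the input.
    while writer.open_tags:
        writer.out.append("</{}>".format(writer.open_tags.pop()))
    return "".join(writer.out)
```

For example, `html_to_xhtml('<P>Hello<BR><B>world</P>')` yields `<p>Hello<br /><b>world</b></p>`, which an XML parser accepts.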

Automatically Converting HTML Documents with Similar Pattern into XML Documents (유사 패턴을 갖는 HTML 문서의 XML 자동 변환)

O, Geum-Yong;Hwang, In-Jun
- The KIPS Transactions:PartD / v.9D no.3 / pp.355-364 / 2002
Recently, the WWW (World Wide Web) has become a source of a large amount of information, and is now recognized not only as an information-sharing tool but also as an information repository. Currently, the majority of documents on the Web are created using HTML (Hypertext Markup Language). Although HTML is simple and easy to learn, its inherent inability to describe document structure makes it difficult to retrieve information effectively. One possible solution is to convert such HTML documents into XML (Extensible Markup Language) documents. XML is a standard markup language for exchanging data on the Web. It can describe a document structure freely by defining its own DTD (Document Type Definition), which makes it possible to integrate, store, and retrieve data on the Web efficiently. In this paper, we propose a converter that automatically converts HTML documents with a similar pattern into XML documents by analyzing the document structure and recognizing its path information.
https://doi.org/10.3745/KIPSTD.2002.9D.3.355

Design and Implementation of XML Web Agent for Data Exchange and Replication between Heterogeneous DBMSs (이기종 DBMS간 데이터 교환과 복제를 위한 XML 웹 에이전트 설계 및 구현)

Yu, Sun-Young;Lee, Chun-Keun;Yim, Jae-Hong
- Journal of Korea Multimedia Society / v.7 no.7 / pp.967-975 / 2004
HTML documents are unstructured because HTML uses a restricted set of tags, which makes it difficult to extract data from them. XML, by contrast, allows user-defined tags, which makes it easy both to store information and to extract data from XML documents. For this reason XML has become a standard data exchange format on the Internet and is well suited to exchanging data between heterogeneous DBMSs (database management systems). In this paper, we design and implement an XML web agent for data replication between heterogeneous DBMSs. The XML web agent system controls the data in a DBMS and generates an XML document from that data. The agent also exchanges or replicates data between heterogeneous DBMSs using XML as the medium.
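The idea of using XML as the medium between databases can be sketched as an export step and an import step. The example below is only an assumption-laden illustration, not the agent from the paper: two in-memory SQLite connections stand in for the heterogeneous DBMSs, the function names `export_table` and `import_table` and the `<table>`/`<row>` layout are hypothetical, and column names are assumed to be valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_table(conn, table):
    """Serialize every row of `table` as a <row> element under a <table> root."""
    # The table name is interpolated into SQL, so pass trusted identifiers only.
    cursor = conn.execute('SELECT * FROM "{}"'.format(table))
    columns = [description[0] for description in cursor.description]
    root = ET.Element("table", name=table)
    for record in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, record):
            cell = ET.SubElement(row, column)
            if value is not None:  # NULL becomes an empty element
                cell.text = str(value)
    return ET.tostring(root, encoding="unicode")


def import_table(conn, xml_text):
    """Replay the <row> elements of an exported document into another database."""
    root = ET.fromstring(xml_text)
    table = root.get("name")
    for row in root.findall("row"):
        columns = ", ".join('"{}"'.format(cell.tag) for cell in row)
        placeholders = ", ".join("?" for _ in row)
        values = [cell.text for cell in row]
        conn.execute(
            'INSERT INTO "{}" ({}) VALUES ({})'.format(table, columns, placeholders),
            values,
        )
    conn.commit()


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    source.execute("INSERT INTO users VALUES (1, 'Kim')")
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    import_table(target, export_table(source, "users"))
    print(target.execute("SELECT * FROM users").fetchall())
```

All values travel as text, so the receiving database is responsible for converting them back to its own column types. A real agent would also need to carry schema and type information in the document.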

XML Conversion of HTML Documents Using Web Schema (웹 스키마를 이용한 HTML 문서의 XML 변환)

오금용;박동문;황인준
- Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.175-177 / 2001
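With the continued growth in Web use, the amount of information has surged, and the Web has consequently taken on an important role not only in information exchange but also in information storage. However, many Web pages today are written as HTML (Hyper Text Markup Language) documents, which leaves much to be desired from an information management standpoint. One way to compensate is to create or convert documents based on XML (eXtensible Markup Language), which is emerging as a structured and functional language. This paper proposes a method for structure-oriented conversion of HTML documents into XML documents that analyzes the HTML document structure and uses a Web schema formed from the analysis results.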

A Study on HTMLtoVoiceXML Converter (HTMLtoVoiceXML 변환기에 관한 연구)

최훈일;장영건
- Proceedings of the Korean Information Science Society Conference / 2001.10c / pp.373-375 / 2001
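Advances in speech technology and the establishment of VoiceXML 1.0 have made it possible to access Web content by voice through mobile terminals and telephones in a standardized way. Almost all Web content is written in HTML, so accessing the large body of existing HTML content by voice requires converting HTML documents into VoiceXML documents. Converting them manually takes considerable time and cost. To solve this problem, this paper presents a design for an HTMLtoVoiceXML converter that automatically converts HTML documents into VoiceXML documents.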

An Efficient Method for Logical Structure Analysis of HTML Tables (HTML 테이블의 논리적 구조분석을 위한 효율적인 방법)

Kim Yeon-Seok;Lee Kyong-Ho
- Journal of Korea Multimedia Society / v.9 no.9 / pp.1231-1246 / 2006
HTML is a format for rendering Web documents visually and uses tables to present relational information. Since HTML has limits in terms of information processing and management by a computer, it is important to transform HTML tables into XML documents, which are able to represent logical structure information. As a prerequisite for extracting information from the Web, this paper presents an efficient method for extracting logical structures from HTML tables and transforming them into XML documents. The proposed method consists of two phases: area segmentation and structure analysis. The area segmentation step removes noisy areas and extracts attribute and value areas through visual and semantic coherency checks. The hierarchical structure between attribute and value areas is then analyzed and transformed into XML representations using a proposed table model. Experimental results with 1,180 HTML tables show that the proposed method performs better than the conventional method, resulting in an average precision of 86.7%.
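The attribute/value distinction can be illustrated with a far simpler case than the one the paper addresses. The sketch below assumes, without any segmentation or coherency analysis, that the first row of a single non-nested table is the attribute area and every later row is a value area. The names `TableReader` and `html_table_to_xml` and the `<record>` element are hypothetical, and header text is assumed to yield valid XML element names.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableReader(HTMLParser):
    """Collect the cell text of a single, non-nested HTML table row by row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self.rows[-1].append("")
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


def html_table_to_xml(html_text):
    reader = TableReader()
    reader.feed(html_text)
    reader.close()
    # Assumption: row 0 holds attribute names, remaining rows hold values.
    header, *body = reader.rows
    names = [cell.strip().lower().replace(" ", "_") for cell in header]
    root = ET.Element("table")
    for cells in body:
        record = ET.SubElement(root, "record")
        for name, cell in zip(names, cells):
            ET.SubElement(record, name).text = cell.strip()
    return ET.tostring(root, encoding="unicode")
```

Given a table whose header row is `Title`, `Year` and whose only data row is `XML Editor`, `2004`, the function returns `<table><record><title>XML Editor</title><year>2004</year></record></table>`. Tables with row headers, spanning cells, or decorative rows break the first-row assumption, which is the kind of case the two-phase method above is meant to handle.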
