Search | Korea Science

Automatic Classification of Web documents According to their Styles (스타일에 따른 웹 문서의 자동 분류)

Lee, Kong-Joo;Lim, Chul-Su;Kim, Jae-Hoon
- The KIPS Transactions:PartB / v.11B no.5 / pp.555-562 / 2004
A genre or style is a view of documents distinct from subject or topic, and it is also a criterion for classifying documents. There have been several studies on detecting the style of textual documents, but only a few of them dealt with web documents. In this paper we suggest sets of features for detecting the styles of web documents. Web documents differ from textual documents in that they contain URLs and HTML tags within their pages. We introduce features specific to web documents, which are extracted from URLs and HTML tags. Experimental results enable us to evaluate their characteristics and performance.
https://doi.org/10.3745/KIPSTB.2004.11B.5.555
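
The abstract above does not list its actual feature sets, so the following is only a minimal sketch of what URL-based and HTML-tag-based features could look like. The function name `web_style_features` and the five features it returns are hypothetical choices for illustration, not the authors' feature set.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts how often each HTML start tag occurs in a page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def web_style_features(url, html):
    """Return a small, illustrative feature dict for one web document.

    The chosen features are assumptions made for this sketch only.
    """
    parsed = urlparse(url)
    # Non-empty path segments, e.g. /news/2004/story.html has three.
    path_parts = [part for part in parsed.path.split("/") if part]
    counter = TagCounter()
    counter.feed(html)
    return {
        "url_depth": len(path_parts),
        "url_has_query": bool(parsed.query),
        "n_links": counter.tags["a"],
        "n_images": counter.tags["img"],
        "n_forms": counter.tags["form"],
    }
```

A feature dict like this could be fed to any standard classifier alongside ordinary text features; which classifier the paper used is not stated in the abstract.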

Design and Implementation of an HTML Pages Modification Detector for Meta-search Engines (메타 검색엔진을 위한 HTML 문서 변경 탐지기의 설계 및 구현)

Park, Sang-Wi;O, Jeong-Seok;Lee, Sang-Ho
- The KIPS Transactions:PartD / v.9D no.3 / pp.345-354 / 2002
HTML pages on the web change at any time. Such changes can degrade the functionality of meta-search engines, which provide users with integrated results from search engines. To solve this problem, we propose an HTML page modification detector. It utilizes information on element positions in HTML pages and a modified Jaak Vilo algorithm. The detector uses patterns that represent the structure of HTML expressions occurring repeatedly in HTML pages. An experiment is carried out to verify the correctness of the modification detector.
https://doi.org/10.3745/KIPSTD.2002.9D.3.345

An Automatic Text Categorization Theories and Techniques for Text Management (문서관리를 위한 자동문서범주화에 대한 이론 및 기법)

Ko, Young-Joong;Seo, Jung-Yun
- Journal of Information Management / v.33 no.2 / pp.19-32 / 2002
With the growth of digital libraries and Internet use, the amount of online text information has increased rapidly, and so has the need for efficient data management and retrieval techniques. An automatic text categorization system assigns text documents to predefined categories, which reduces the manual labor of text categorization. To classify text documents, good features must be selected from the documents, and the documents are then indexed with those features. This paper introduces each step of text categorization and several techniques used in each step.
https://doi.org/10.1633/JIM.2002.33.2.019

A Hypertext Categorization Method using Incrementally Computable Class Link Information (점진적으로 계산되는 분류정보와 링크정보를 이용한 하이퍼텍스트 문서 분류 방법)

Oh, Hyo-Jung;Myaeng, Sung-Hyoun
- Journal of KIISE:Software and Applications / v.29 no.7 / pp.498-509 / 2002
As the WWW grows at an increasing speed, classifiers targeted at hypertext are in high demand. While document categorization is quite mature, the issue of utilizing hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization using hyperlinks. Compared with a recently proposed technique that appears to be the only one of its kind, we obtained up to an 18.5% improvement in effectiveness while reducing the processing time dramatically. We attempt to explain through experiments what factors contribute to the improvement.

EDI processing system for port logistics system design (항만 물류 시스템을 위한 EDI 전자문서 처리 시스템 설계)

Chin, Sung-Geun;Ham, Jong-Wan;Park, Jong-Il;Kim, Jeong-Sig;Jung, Hoe-Kyung
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.462-464 / 2010
Port logistics uses EDI documents for electronic document handling and trade. An EDI system must therefore be able to process these electronic documents, and users should be able to convert document formats or extract the information they want. In existing EDI document conversion systems, the grammar of the mapping script is complex, which makes the systems difficult for operators to use. In this paper, we design an EDI electronic document processing system that processes EDI documents efficiently and replaces the existing mapping grammar with a syntax that is easier for operators to use, improving the efficiency of operation and management.

Sentence Interaction-based Document Similarity Models for News Clustering (뉴스 클러스터링을 위한 문장 간 상호 작용 기반 문서 쌍 유사도 측정 모델들)

Choi, Seonghwan;Son, Donghyun;Lee, Hochang
- Annual Conference on Human and Language Technology / 2020.10a / pp.401-407 / 2020
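In news clustering, the similarity between two documents is one of the important factors that determine the characteristics of a cluster. TF-IDF vector similarity, the traditional word-based approach, fails to reflect semantic similarity between documents, while sequence similarity models, the existing deep-learning-based approach, fail to reflect the long contexts that appear at the document level. In this paper, to build a document-pair similarity model suited to news clustering, we propose four similarity models that measure the similarity of a whole document pair by aggregating the similarity information among the many sentence representations generated from the pair. In contrast to approaches such as HAN (hierarchical attention network), which compress the whole document representation into a single vector, these approaches estimate the similarity of the whole document pair through the direct similarities between the sentences of the two documents. We ran document-pair similarity experiments with the existing approaches, SVM and HAN, and with the four proposed similarity models. Two of the proposed approaches outperformed the existing approaches, and they showed performance similar to a graph-based approach while measuring document similarity more efficiently.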
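
The abstract does not describe the four proposed models, so the sketch below only illustrates the general idea of scoring a document pair from sentence-to-sentence similarities instead of a single document vector. It swaps in plain bag-of-words cosine similarity for the learned sentence representations, and the max-then-average aggregation is an assumption made for illustration; all function names are hypothetical.

```python
import math
import re
from collections import Counter


def sentence_vectors(document):
    """Split on sentence-ending punctuation and build one bag-of-words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    return [Counter(s.lower().split()) for s in sentences]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def document_pair_similarity(doc_a, doc_b):
    """Aggregate sentence-level similarities into one document-pair score.

    Each sentence is matched to its most similar sentence in the other
    document, and the best-match scores are averaged in both directions.
    This aggregation rule is an assumption, not the paper's method.
    """
    vecs_a, vecs_b = sentence_vectors(doc_a), sentence_vectors(doc_b)
    if not vecs_a or not vecs_b:
        return 0.0
    best_a = [max(cosine(a, b) for b in vecs_b) for a in vecs_a]
    best_b = [max(cosine(b, a) for a in vecs_a) for b in vecs_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))
```

Replacing `cosine` over word counts with a similarity over learned sentence embeddings would bring the sketch closer to the deep-learning setting the abstract describes.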

A New Approach to Active Documents and its Application (능동문서에 대한 새로운 접근법과 그 응용)

남철기;배재학;장길상
- Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.347-357 / 2003
The web is an important source of information, and most web applications are based on form documents. HTML-based form documents only play the role of user interfaces; they do not involve the procedures or rules of the business process that form document designers assume. However, form documents imply methods for treating documents, and this embedded procedural knowledge can be utilized actively in the automation of business processes. In this respect, we investigate the activeness of documents from the viewpoint of cognitive science in order to automate business processes based on form documents. Through this, we arrive at a new concept of active documents and its applicability. Our active documents include business rules and declarative knowledge to support the automation of document processing. We also propose a processing framework for active documents. The framework has two phases: build-time and run-time. To demonstrate the usefulness of the proposed framework, a prototype called ActiveForm is designed and implemented for requisition processing. ... them in an inference engine can enhance the intelligence of Internet applications.

Design of Security Mechanism for Electronic Document Repository System (전자문서 보관 시스템을 위한 보안 메커니즘 설계)

Kim, Jeom-Goo;Kim, Sang-Choon
- Convergence Security Journal / v.11 no.3 / pp.99-111 / 2011
The costs of managing and storing paper documents are gradually increasing. In particular, safekeeping paper documents in a warehouse is too expensive. Paper-based document systems are also exposed to several security problems. Therefore, a process for transforming paper documents into electronic ones is much needed. An electronic document repository system is one of the best solutions to the problems of paper-based document systems. It can reduce overall costs and provides several advantages over a paper-based document system. However, electronic document repository systems have no formal methodology for guaranteeing safety. Therefore, we suggest a security mechanism for establishing an electronic document repository system. The suggested security methodology can help in designing a more secure electronic document repository system.

Automatic Conversion of XML Documents to UML Class Diagram (XML문서에서 UML 클래스 다이어그램 자동 변환)

차남정;민미경;이숙희
- Proceedings of the Korea Multimedia Society Conference / 2002.11b / pp.368-372 / 2002
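Much research is currently under way to identify the structure of XML documents, most of it focused on finding the structure of an XML document and expressing it as a DTD or schema. In this paper, we propose a system that extracts the structure from an XML document and automatically converts it into a UML class diagram. The proposed system builds an element-attribute tree from the XML document and uses it to convert the document structure into a UML class diagram easily.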
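
As a rough illustration of the element-attribute idea, the sketch below walks an XML document and records, for each element name, the attributes and child elements seen for it. Treating each element name as a candidate class, with attributes as class attributes and children as associations, is an assumption made here; the abstract does not give the paper's actual mapping rules, and the function name `element_attribute_tree` is hypothetical.

```python
import xml.etree.ElementTree as ET


def element_attribute_tree(xml_text):
    """Collect attributes and child element names per element tag.

    Returns a dict: tag -> {"attributes": set, "children": set}.
    Each tag could serve as one candidate class in a class diagram.
    """
    root = ET.fromstring(xml_text)
    classes = {}

    def visit(element):
        entry = classes.setdefault(
            element.tag, {"attributes": set(), "children": set()}
        )
        # element.attrib is a dict, so update() adds the attribute names.
        entry["attributes"].update(element.attrib)
        for child in element:
            entry["children"].add(child.tag)
            visit(child)

    visit(root)
    return classes
```

For example, a `library` document containing `book` elements with `id` attributes would yield a `library` entry whose children include `book`, and a `book` entry whose attributes include `id`.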

A Conceptual Classification Technique for Electronic Documents Using Probability Vectors (확률 벡터를 사용한 전자 문서의 개념적 분류 기법)

조완섭;김영렬;강원석;강현규
- Proceedings of the Korea Society for Industrial Systems Conference / 1997.11a / pp.53-62 / 1997
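In this paper, we propose a conceptual classification technique for electronic documents. Most existing document classification is based on the terms that appear in a document, so conceptual classification is not possible. The proposed technique uses a Korean thesaurus to classify documents based not only on the terms that appear in them but also on the broader and narrower concepts of those terms. In particular, the proposed method uses probability vectors and therefore also has the advantage of supporting incremental learning.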
