• Title/Summary/Keyword: 문서지

Search Result 2,043, Processing Time 0.021 seconds

An Efficient Method of Document Store and Version Management for XML Repository System (XML 저장 관리 시스템에서 효율적인 버전 관리 및 문서 저장 방안)

  • Jung, Hyun-Joo;Kim, Kweon-Yang;Choi, Jae-Hyuk
    • The Journal of Korean Association of Computer Education
    • /
    • v.6 no.4
    • /
    • pp.11-21
    • /
    • 2003
  • In rapidly changing an information=oriented society, it is essential to control massive document information by electronic file. In relation to these electronic document, it is also important to keep and maintain all kinds of information without any losses. It should be allowed to trace previous contents as well as recently updated contents by controlling updated contents with version. For these, XML is recommendable. In this thesis, we intend to save the document storing space by saving only updated contents with version without saving whole documentation, when document is updated. In case of controlling the history of document update by version, we designed system so as to omit "JOIN operation" if document size is under a certainspecific size. Therefore, we implemented a new XML document repository system which is possible for quick search and efficient XML document saving by reducing perfomance deterioration caused by JOIN operation.

  • PDF

A Study on Focused Crawling of Web Document for Building of Ontology Instances (온톨로지 인스턴스 구축을 위한 주제 중심 웹문서 수집에 관한 연구)

  • Chang, Moon-Soo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.1
    • /
    • pp.86-93
    • /
    • 2008
  • The construction of ontology defines as complicated semantic relations needs precise and expert skills. For the well defined ontology in real applications, plenty of information of instances for ontology classes is very critical. In this study, crawling algorithm which extracts the fittest topic from the Web overflowing over by a great number of documents has been focused and developed. Proposed crawling algorithm made a progress to gather documents at high speed by extracting topic-specific Link using URL patterns. And topic fitness of Link block text has been represented by fuzzy sets which will improve a precision of the focused crawler.

Web Site Construction Using Internet Information Extraction (인터넷 정보 추출을 이용한 웹문서 구조화)

A Text Categorization Method Improved by Removing Noisy Training Documents (오류 학습 문서 제거를 통한 문서 범주화 기법의 성능 향상)

  • Han, Hyoung-Dong;Ko, Young-Joong;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.9
    • /
    • pp.912-919
    • /
    • 2005
  • When we apply binary classification to multi-class classification for text categorization, we use the One-Against-All method generally, However, this One-Against-All method has a problem. That is, documents of a negative set are not labeled by human. Thus, they can include many noisy documents in the training data. In this paper, we propose that the Sliding Window technique and the EM algorithm are applied to binary text classification for solving this problem. We here improve binary text classification through extracting noise documents from the training data by the Sliding Window technique and re-assigning categories of these documents using the EM algorithm.

Medicine Ontology Building based on Semantic Relation and Its Application (의미관계 정보를 이용한 약품 온톨로지의 구축과 활용)

  • Lim Soo-Yeon;Park Seong-Bae;Lee Sang-Jo
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.5
    • /
    • pp.428-437
    • /
    • 2005
  • An ontology consists of a set and definition of concepts that represents the characteristics of a given domain and relationship between the elements. To reduce time-consuming and cost in building ontology, this paper proposes a semiautomatic method to build a domain ontology using the results of text analysis. To do this, we Propose a terminology processing method and use the extracted concepts and semantic relations between them to build ontology. An experiment domain is selected by the pharmacy field and the built ontology is applied to document retrieval. In order to represent usefulness for retrieving a document using the hierarchical relations in ontology, we compared a typical keyword based retrieval method with an ontology based retrieval method, which uses related information in an ontology for a related feedback. As a result, the latter shows the improvement of precision and recall by $4.97\%$ and $0.78\%$ respectively.

Automatic Text Categorization using the Importance of Sentences (문장 중요도를 이용한 자동 문서 범주화)

  • Ko, Young-Joong;Park, Jin-Woo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.6
    • /
    • pp.417-424
    • /
    • 2002
  • Automatic text categorization is a problem of assigning predefined categories to free text documents. In order to classify text documents, we have to extract good features from them. In previous researches, a text document is commonly represented by the frequency of each feature. But there is a difference between important and unimportant sentences in a text document. It has an effect on the importance of features in a text document. In this paper, we measure the importance of sentences in a text document using text summarizing techniques. A text document is represented by features with different weights according to the importance of each sentence. To verify the new method, we constructed Korean news group data set and experiment our method using it. We found that our new method gale a significant improvement over a basis system for our data sets.

An Effective Increment리 Content Clustering Method for the Large Documents in U-learning Environment (U-learning 환경의 대용량 학습문서 판리를 위한 효율적인 점진적 문서)

  • Joo, Kil-Hong;Choi, Jin-Tak
    • Journal of the Korea Computer Industry Society
    • /
    • v.5 no.9
    • /
    • pp.859-872
    • /
    • 2004
  • With the rapid advance of computer and communication techonology, the recent trend of education environment is edveloping in the ubiquitous learning (u-learning) direction that learners select and organize the contents, time and order of learning by themselves. Since the amount of education information through the internet is increasing rapidly and it is managed in document in an effective way is necessary. The document clustering is integrated documents to subject by classifying a set of documents through their similarity among them. Accordingly, the document clustering can be used in exploring and searching a document and it can increased accuracy of search. This paper proposes an efficient incremental clustering method for a set of documents increase gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters which have been identified in advance. In addition, to improve the correctness of the clustering, removing the stop words can be proposed.

  • PDF

An Efficient and Secure Method for Managing Logs of Certified e-Document Authority Using Hash Tree (공인전자문서 보관소에서 생성되는 로그의 효율적이고 안전한 보관방법에 대한 연구)

  • Kang, Shin-Myung;Moon, Jong-Sub
    • Convergence Security Journal
    • /
    • v.9 no.2
    • /
    • pp.23-32
    • /
    • 2009
  • CeDA (Certified e-Document Authority) was adopted in March 2005. It is possible to register/store/send/receive/transfer/revoke e-documents by using trusted third party, CeDA. It is important to store not only e-documents of users but also logs produced by CeDA. Thus all logs must be electronically signed using certificate of CeDA. But management of electronically signed logs is difficult. In this paper, the method which can be applicable to authenticate all logs of CeDA using "Hash Tree" is present.

  • PDF

A Unification Algorithm for DTDs of XML Documents having a Similar Structure (유사 구조를 가지는 XML 문서들의 DTD 통합 알고리즘)

  • 유춘식;우선미;김용성
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.10
    • /
    • pp.1400-1411
    • /
    • 2004
  • There are many cases that many XML documents have different DTDs in spite of having a similar structure and being logically same kind of document. For this reason, It occurs a problem that these XML documents have different database schema and are stored in different databases. So, in this paper, we propose an algorithm that unifies DTDs of these XML documents using the finite automata and the tree structure. The finite automata is suitable for representing repetition operators and connectors of DTD, and is a simple representation method for DTD. By using the finite automata, we are able to reduce the complexity of algorithm. And we apply a proposed algorithm to unify DTDs of science journals.

An Efficient Parallel Information Retrieval System using Document Clustering (문서 클러스터링에 의한 효율적인 병렬 정보검색 시스템)

  • Gang, Yu-Gyeong;Ryu, Gwang-Ryeol;Jeong, Sang-Hwa
    • Journal of KIISE:Software and Applications
    • /
    • v.28 no.2
    • /
    • pp.157-167
    • /
    • 2001
  • 본 논문은 고품질의 정보를 신속하게 제공할 수 있으면서 가격대 성능비가 우수한 병렬 정보 검색 시스템을 제시하고 있다. 본 검색 시스템은 문서 라이브러리를 여러 개의 클러스터로 세분화하고 검색 시 클러스터 단위로 프로세서에 할당함으로써 작업 단위를 적절한 규모로 하였을 뿐만 아니라, 문서의 점수 계산 시 프로세서 간 통신이 전혀 필요치 않게 하였다. 검색은 1차로 클러스터 레벨에서 관련 클러스터들을 찾는 것으로 시작하여 2차로 관련 클러스터 내에서 실제 문서를 찾는 방식으로 이루어진다. 이러한 계층적인 검색 구조로 인하여 1차 검색 후 여과가 가능하므로 전체적인 검색의 부하를 줄일 수 있다. 또한 문서의 클러스터가 가능한 한 유사한 문서군이 되도록 함으로써 불필요한 클러스터가 검색될 가능성을 최소화하여 성능을 높였다. 본 검색 시스템은 분산메모리 MIMD 구조의 다중 트랜스퓨터 시스템에서 구현되었으며, 실험 결과 무작위적으로 클러스터링한 경우에 비해 유사 문서군으로 클러스터링한 접근 방법이 우수함을 확인하였다.

  • PDF