• Title/Summary/Keyword: Patent document processing

Search Result 6, Processing Time 0.022 seconds

A Study of Patent Document Processing by SGML (SGML을 이용한 특허정보처리 연구)

  • Kwon, Young-Sook
    • Journal of Information Management
    • /
    • v.30 no.3
    • /
    • pp.44-54
    • /
    • 1999
  • A description of SGML(Standard Generalized Markup Language) is given together with a detailed description of WIPO Standard ST.32. The benefits of the use of SGML are highlighted-its system Independence and flexibility in building publication systems and full-text databases. A structure of WIPO Standard ST,32 based patent content is defined by DTD(document type definition) written in ST.32, and full-text itself is described with generalized markup depending on DTD. This article explains how to represent a document structure : a hierarchy structure like a entire document, a specific, sub-document, a paragraph, or non-hirarchy structure like a table drawings, or chemical structures. Merits of SGML In patent document processing are also discussed.

  • PDF

Patent Document Similarity Based on Image Analysis Using the SIFT-Algorithm and OCR-Text

  • Park, Jeong Beom;Mandl, Thomas;Kim, Do Wan
    • International Journal of Contents
    • /
    • v.13 no.4
    • /
    • pp.70-79
    • /
    • 2017
  • Images are an important element in patents and many experts use images to analyze a patent or to check differences between patents. However, there is little research on image analysis for patents partly because image processing is an advanced technology and typically patent images consist of visual parts as well as of text and numbers. This study suggests two methods for using image processing; the Scale Invariant Feature Transform(SIFT) algorithm and Optical Character Recognition(OCR). The first method which works with SIFT uses image feature points. Through feature matching, it can be applied to calculate the similarity between documents containing these images. And in the second method, OCR is used to extract text from the images. By using numbers which are extracted from an image, it is possible to extract the corresponding related text within the text passages. Subsequently, document similarity can be calculated based on the extracted text. Through comparing the suggested methods and an existing method based only on text for calculating the similarity, the feasibility is achieved. Additionally, the correlation between both the similarity measures is low which shows that they capture different aspects of the patent content.

Korean Machine Reading Comprehension for Patent Consultation Using BERT (BERT를 이용한 한국어 특허상담 기계독해)

  • Min, Jae-Ok;Park, Jin-Woo;Jo, Yu-Jeong;Lee, Bong-Gun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.4
    • /
    • pp.145-152
    • /
    • 2020
  • MRC (Machine reading comprehension) is the AI NLP task that predict the answer for user's query by understanding of the relevant document and which can be used in automated consult services such as chatbots. Recently, the BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) model, which shows high performance in various fields of natural language processing, have two phases. First phase is Pre-training the big data of each domain. And second phase is fine-tuning the model for solving each NLP tasks as a prediction. In this paper, we have made the Patent MRC dataset and shown that how to build the patent consultation training data for MRC task. And we propose the method to improve the performance of the MRC task using the Pre-trained Patent-BERT model by the patent consultation corpus and the language processing algorithm suitable for the machine learning of the patent counseling data. As a result of experiment, we show that the performance of the method proposed in this paper is improved to answer the patent counseling query.

Design and implementation of an XML Repository System supporting Document Version (버전을 지원하는 XML 저장관리 시스템 설계 및 구현)

  • Son, Chung-Beom;Oh, Kyoung-Keun;Yoo, Jae-Soo
    • The KIPS Transactions:PartD
    • /
    • v.10D no.1
    • /
    • pp.13-22
    • /
    • 2003
  • Recently, as the Importance of the management on internet documents has highly increased, the research of an XML repository system has been actively made to store, retrieve and manage large XML documents. The version management for XML documents is required in the XML applications such as patent documents, software design and system manual that the modified documents have to be managed. In this paper, we propose a data model based on a fragmentation model that supports document versioning. We also design and implement an XML repository system supporting document versioning. It is shown through Performance evaluation that our system outperforms the existing repository system.

Patent Document Classification by Using Hierarchical Attention Network (계층적 주의 네트워크를 활용한 특허 문서 분류)

  • Jang, Hyuncheol;Han, Donghee;Ryu, Teaseon;Jang, Hyungkuk;Lim, HeuiSeok
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.05a
    • /
    • pp.369-372
    • /
    • 2018
  • 최근 지식경영에 있어 특허를 통한 지식재산권 확보는 기업 운영에 큰 영향을 주는 요소이다. 성공적인 특허 확보를 위해서, 먼저 변화하는 특허 분류 제계를 이해하고, 방대한 특허 정보 데이터를 빠르고 신속하게 특허 분류 체계에 따라 분류화 시킬 필요가 있다. 본 연구에서는 머신 러닝 기술 중에서도 계층적 주의 네트워크를 활용하여 특허 자료의 초록을 학습시켜 분류를 할 수 있는 방법을 제안한다. 그리고 본 연구에서는 제안된 계층적 주의 네트워크의 성능을 검증하기 위해 수정된 입력데이터와 다른 워드 임베딩을 활용하여 진행하였다. 이를 통하여 특허 문서 분류에 활용하려는 계층적 주의 네트워크의 성능과 특허 문서 분류 활용화 방안을 보여주고자 한다. 본 연구의 결과는 많은 기업 지식경영에서 실용적으로 활용할 수 있도록 지식경영 연구자, 기업의 관리자 및 실무자에게 유용한 특허분류기법에 관한 이론적 실무적 활용 방안을 제시한다.

A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.79-92
    • /
    • 2015
  • Knowledge map is widely used to represent knowledge in many domains. This paper presents a method of integrating the national R&D data and assists of users to navigate the integrated data via using a knowledge map service. The knowledge map service is built by using a lightweight ontology and a topic modeling method. The national R&D data is integrated with the research project as its center, i.e., the other R&D data such as research papers, patents, and reports are connected with the research project as its outputs. The lightweight ontology is used to represent the simple relationships between the integrated data such as project-outputs relationships, document-author relationships, and document-topic relationships. Knowledge map enables us to infer further relationships such as co-author and co-topic relationships. To extract the relationships between the integrated data, a Relational Data-to-Triples transformer is implemented. Also, a topic modeling approach is introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of knowledge map service. Knowledge map can be divided into two types: one is a knowledge map used in the area of knowledge management to store, manage and process the organizations' data as knowledge, the other is a knowledge map for analyzing and representing knowledge extracted from the science & technology documents. This research focuses on the latter one. In this research, a knowledge map service is introduced for integrating the national R&D data obtained from National Digital Science Library (NDSL) and National Science & Technology Information Service (NTIS), which are two major repository and service of national R&D data servicing in Korea. A lightweight ontology is used to design and build a knowledge map. Using the lightweight ontology enables us to represent and process knowledge as a simple network and it fits in with the knowledge navigation and visualization characteristics of the knowledge map. The lightweight ontology is used to represent the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontologies, researchers are implicitly connected by the national R&D data as the author relationships and the performer relationships. A knowledge map for displaying researchers' network is created, and the researchers' network is created by the co-authoring relationships of the national R&D documents and the co-participation relationships of the national R&D projects. To sum up, a knowledge map-service system based on topic modeling and ontology is introduced for processing knowledge about the national R&D data such as research projects, papers, patent, project reports, and Global Trends Briefing (GTB) data. The system has goals 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide a semantic & topic based information search on the integrated data, and 3) to provide a knowledge map services based on the semantic analysis and knowledge processing. The S&T information such as research papers, research reports, patents and GTB are daily updated from NDSL, and the R&D projects information including their participants and output information are updated from the NTIS. The S&T information and the national R&D information are obtained and integrated to the integrated database. Knowledge base is constructed by transforming the relational data into triples referencing R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and topic keyword/s representing the documents. The topic modeling approach enables us to extract the relationships and topic keyword/s based on the semantics, not based on the simple keyword/s. Lastly, we show an experiment on the construction of the integrated knowledge base using the lightweight ontology and topic modeling, and the knowledge map services created based on the knowledge base are also introduced.