• Title/Summary/Keyword: Document Processing System


An Automatic Extraction of English-Korean Bilingual Terms by Using Word-level Presumptive Alignment (단어 단위의 추정 정렬을 통한 영-한 대역어의 자동 추출)

  • Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.2 no.6 / pp.433-442 / 2013
  • A set of bilingual terms is one of the most important factors in building language-related applications such as machine translation systems and cross-lingual information systems. In this paper, we introduce a new approach that automatically extracts candidate English-Korean bilingual terms by using a bilingual parallel corpus and a basic English-Korean lexicon. This approach can be useful even when the parallel corpus is small. Sentence alignment is performed first on the document-level parallel corpus. Words between a pair of aligned sentences are then aligned by referencing the basic bilingual lexicon. For the words that remain unaligned, several assumptions are applied in order to pair bilingual term candidates across the two languages; the location within a sentence, relations between words, and linguistic information about the two languages are examples of these assumptions. An experimental result shows approximately 71.7% accuracy for the English-Korean bilingual term candidates that are automatically extracted from a bilingual parallel corpus of 1,000 sentence pairs.
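
A minimal sketch (my own illustration, not the authors' code) of the word-level presumptive alignment idea described in the abstract above: anchor alignments with the basic English-Korean lexicon, then presumptively pair the remaining unaligned words as bilingual term candidates. The lexicon and tokens below are toy data.

```python
def extract_term_candidates(en_tokens, ko_tokens, lexicon):
    """lexicon: dict mapping an English word to a set of Korean translations."""
    aligned, used_ko = [], set()

    # 1) Align words that the basic English-Korean lexicon already covers.
    for en in en_tokens:
        for j, ko in enumerate(ko_tokens):
            if j not in used_ko and ko in lexicon.get(en, set()):
                aligned.append((en, ko))
                used_ko.add(j)
                break

    # 2) Presumptively pair the leftover words by their order of appearance,
    #    a crude stand-in for the positional/linguistic assumptions in the paper.
    rest_en = [w for w in en_tokens if w not in dict(aligned)]
    rest_ko = [ko_tokens[j] for j in range(len(ko_tokens)) if j not in used_ko]
    candidates = list(zip(rest_en, rest_ko))
    return aligned, candidates

# Toy example:
lexicon = {"system": {"시스템"}, "translation": {"번역"}}
print(extract_term_candidates(
    ["machine", "translation", "system"],
    ["기계", "번역", "시스템"],
    lexicon))
```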

Web Page Classification System based upon Ontology (온톨로지 기반의 웹 페이지 분류 시스템)

  • Choi Jaehyuk; Seo Haesung; Noh Sanguk; Choi Kyunghee; Jung Gihyun
    • The KIPS Transactions:PartB / v.11B no.6 / pp.723-734 / 2004
  • In this paper, we present an automated Web page classification system based upon ontology. As a first step, to identify the representative terms for a given set of classes, we compute the product of term frequency and document frequency. Second, the information gain of each term prioritizes it according to its usefulness for classification. We compile the selected terms and Web page classes into rules using machine learning algorithms. The compiled rules classify any Web page into the categories defined in a domain ontology. In the experiments, 78 out of 240 terms were identified as representative features for a given set of Web pages. The resulting classification accuracy was, on average, 83.52%.
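
A minimal sketch (not the paper's code) of the two feature-selection steps named above: score terms by the product of term frequency and document frequency, then rank the survivors by information gain over the class labels.

```python
import math
from collections import Counter

def tf_df_scores(docs):
    """docs: list of token lists. Returns {term: term_frequency * document_frequency}."""
    tf = Counter(t for doc in docs for t in doc)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: tf[t] * df[t] for t in tf}

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(term, docs, labels):
    """IG(term) = H(C) - H(C | term present/absent)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_term, without) if part)
    return entropy(labels) - conditional

# Toy example with two classes:
docs = [["ontology", "web"], ["web", "page"], ["sports", "news"]]
labels = ["tech", "tech", "news"]
print(tf_df_scores(docs)["web"], information_gain("web", docs, labels))
```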

Traceability Management Technique for Software Artifacts which Comprise Software Release (소프트웨어 릴리스를 구성하는 산출물들의 추적성 관리 기법)

  • Kim, Dae Yeob; Youn, Cheong
    • KIPS Transactions on Software and Data Engineering / v.2 no.7 / pp.461-470 / 2013
  • The capacity to trace relationships among the various artifacts created at each phase of software system development is essential for software quality management. A software release delivers a set of newly created or changed artifacts to customers. The relationships among the artifacts that comprise a software release must be traced so that customer requests for changes and functional enhancements can be handled effectively, and release management can be realized effectively by integrating configuration management and change management. This paper proposes a technique for supporting change management of artifacts and for tracing the relationships among the artifacts of a software release through an environment that integrates personal workspaces with a configuration management system. In the proposed environment, a visualized version graph and an automated tagging function are used to trace the relationships among artifacts.
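
A minimal sketch (an assumption on my part, not the paper's implementation) of a traceability structure of the kind the abstract describes: versioned artifacts are nodes, trace links are edges, and a release tag groups the artifact versions shipped together.

```python
class Artifact:
    def __init__(self, name, version):
        self.name, self.version = name, version
        self.traces_to = []            # links to the artifact versions this one derives from

    def link(self, other):
        self.traces_to.append(other)

releases = {}                          # release tag -> list of artifact versions

req = Artifact("REQ-42.doc", "1.2")
design = Artifact("design.uml", "2.0")
code = Artifact("payment.java", "3.1")
design.link(req)                       # the design realizes the requirement
code.link(design)                      # the code implements the design

releases["release-1.0"] = [req, design, code]   # the automated tagging step
for a in releases["release-1.0"]:
    print(a.name, a.version, "->", [t.name for t in a.traces_to])
```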

A Feedback Scheme for Synchronization in a Distributed Multimedia Presentation System (분산 멀티미디어 프리젠테이션 시스템에서 동기화를 위한 피드백 기법)

  • Choi, Sook-Young
    • The KIPS Transactions:PartB / v.9B no.1 / pp.47-56 / 2002
  • In a distributed multimedia document system, media objects distributed over a computer network are retrieved from their sources and presented to users according to specified temporal relations. For effective presentation, synchronization has to be supported. Furthermore, since presentation in a distributed environment is influenced by network bandwidth and delay, these factors must be considered for synchronization. This paper proposes a distributed multimedia presentation system that performs presentation effectively in a distributed environment. It also suggests a method to support synchronization in which the network situation and resources are monitored while media objects are transferred from servers to a client; a feedback message about any change is then sent to the server so that the server can adjust its data sending rate to control synchronization. To monitor the network situation, two methods are used together: one manages the buffer level by setting thresholds on the buffer, and the other checks the difference between the time a packet is sent from the server and the time it arrives at the client.
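
A minimal sketch (illustrative only, with assumed threshold values) of the client-side feedback rule just described: watch the buffer level against low/high thresholds, measure the per-packet send-to-arrival difference, and report a rate adjustment to the server.

```python
LOW, HIGH = 5, 20                     # buffer thresholds in packets (assumed values)

def feedback(buffer_level, send_ts, arrive_ts):
    delay = arrive_ts - send_ts       # difference between sending time and arrival time
    if buffer_level < LOW:
        return ("INCREASE_RATE", delay)   # buffer draining: data is arriving too slowly
    if buffer_level > HIGH:
        return ("DECREASE_RATE", delay)   # buffer filling up: data is arriving too fast
    return ("OK", delay)

# Example: a nearly empty client buffer asks the server to raise its sending rate.
print(feedback(buffer_level=3, send_ts=10.00, arrive_ts=10.35))
```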

Development of Assistive Software for color blind to Electronic Documents (전자문서용 색각 장애 보정 소프트웨어 개발)

  • Jang, Young-Gun
    • The KIPS Transactions:PartB / v.10B no.5 / pp.535-542 / 2003
  • This study is concerned with an assistive technology that reduces the confusion experienced by color-blind users when they access electronic documents containing color objects on their computers. In this study, I restrict the assistive technology to the Windows operating system in 256-color mode and implement it so as to minimize the color distortion that occurs in multi-window environments because of the color approximation process. As a basic palette, I use the 216-color web-safe palette that Christine proposed as a standard for the color blind, expand it to 256 colors so that it applies to all computer displays running Microsoft Windows, and implement it as a Windows application. To test its effectiveness, I use a simulator for dichromats; the test results show that the developed color vision deficiency correction S/W is effective in reducing confusion. It is more effective to use the implemented S/W in both the design and the client processes for electronic documents.
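
A minimal sketch (not the developed S/W) of the palette-substitution idea underlying such correction: enumerate the 216 web-safe colors and snap an arbitrary screen color to its nearest palette entry; a corrected 256-color palette would be substituted for the plain web-safe one.

```python
def web_safe_palette():
    """The 216 web-safe colours: every RGB combination of six intensity levels."""
    levels = [0, 51, 102, 153, 204, 255]
    return [(r, g, b) for r in levels for g in levels for b in levels]

def nearest(color, palette):
    """Return the palette entry closest to `color` in RGB distance."""
    r, g, b = color
    return min(palette, key=lambda p: (p[0] - r) ** 2 + (p[1] - g) ** 2 + (p[2] - b) ** 2)

palette = web_safe_palette()              # a corrected palette would replace this
print(nearest((250, 60, 48), palette))    # -> (255, 51, 51)
```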

An Efficient Design Pattern Framework for Automatic Code Generation based on XML (코드 자동 생성을 위한 XML 기반의 효율적인 디자인패턴 구조)

  • Kim, Un-Yong; Kim, Yeong-Cheol; Ju, Bok-Gyu; Choe, Yeong-Geun
    • The KIPS Transactions:PartD / v.8D no.6 / pp.753-760 / 2001
  • Design patterns capture design knowledge for solving extensibility and maintainability issues that are independent of application-specific concerns, but despite the vast interest in design patterns, their specification and application are generally assumed to rely on manual implementation. As a result, much time is spent developing software, not only because it is difficult to analyze and apply patterns consistently, but also because programming faults occur frequently. In this paper, we propose an XML-based notation for describing design patterns and a framework that applies them. We also suggest a source code generation support system and show an example of its application through this notation and the application framework. Based on structured XML documentation, a more stable system can be constructed and compact source code can be generated for the user.
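
A minimal sketch (my own illustration) of the general idea of describing a pattern in XML and generating skeleton code from it; the element names and the emitted Java-style skeleton are assumptions, not the notation proposed in the paper.

```python
import xml.etree.ElementTree as ET

PATTERN_XML = """
<pattern name="Singleton">
  <class name="ConfigManager">
    <method name="getInstance" static="true"/>
  </class>
</pattern>
"""

def generate(xml_text):
    """Walk the XML pattern description and print a class skeleton per <class> element."""
    root = ET.fromstring(xml_text)
    for cls in root.findall("class"):
        lines = [f"public class {cls.get('name')} {{"]
        for m in cls.findall("method"):
            modifier = "static " if m.get("static") == "true" else ""
            lines.append(f"    public {modifier}void {m.get('name')}() {{ }}")
        lines.append("}")
        print("\n".join(lines))

generate(PATTERN_XML)
```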


The Design and Implementation of Web Agents for vCard Service in Mobile Environment (모바일 환경에서 vCard 서비스를 위한 웹 에이전트의 설계 및 구현)

  • Yun, Se-Mi; Jo, Ik-Seong
    • The KIPS Transactions:PartD / v.9D no.3 / pp.477-486 / 2002
  • vCard, the electronic business card, automates the exchange of personal information typically found on a traditional business card. vCard information contains not only simple text but also graphics and multimedia data such as pictures, company logos, and Web addresses. In today's business environment, such information is typically exchanged on business cards. This paper describes the design and implementation of a Web-based vCard agent system for exchanging vCards and for searching other users' vCards in a mobile phone environment. The web agent system connects to a web server that provides the vCard service, searches and edits the vCard information displayed in the mobile phone's web browser, and exchanges vCards with other users over the Internet. Considering the limited storage space of wireless devices, the system also stores the constructed XML documents containing user information on a web server, and it addresses the security problem by exchanging not the personal data or XML itself but an encrypted directory name indicating where the XML document resides.
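
A minimal sketch (placeholder values throughout) of a vCard 3.0 record of the kind such an agent exchanges, plus an opaque exchange token standing in for the encrypted directory name mentioned in the abstract; a hash is used here purely for illustration and is not the paper's encryption scheme.

```python
import hashlib

vcard = "\r\n".join([
    "BEGIN:VCARD",
    "VERSION:3.0",
    "FN:Hong Gildong",
    "ORG:Example Corp.",
    "TEL;TYPE=CELL:+82-10-1234-5678",
    "EMAIL:gildong@example.com",
    "END:VCARD",
])

# Exchange only an opaque name pointing at the stored XML/vCard, never the data itself.
directory_token = hashlib.sha256(b"user:gildong").hexdigest()[:16]

print(vcard)
print("exchange token:", directory_token)
```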

Automatic Object Extraction from Electronic Documents Using Deep Neural Network (심층 신경망을 활용한 전자문서 내 객체의 자동 추출 방법 연구)

  • Jang, Heejin; Chae, Yeonghun; Lee, Sangwon; Jo, Jinyong
    • KIPS Transactions on Software and Data Engineering / v.7 no.11 / pp.411-418 / 2018
  • With the proliferation of artificial intelligence technology, it is becoming important to obtain, store, and utilize scientific data in research and science sectors. A number of methods for extracting meaningful objects such as graphs and tables from research articles have been proposed to eventually obtain scientific data. Existing extraction methods using heuristic approaches are hardly applicable to electronic documents having heterogeneous manuscript formats because they are designed to work properly for some targeted manuscripts. This paper proposes a prototype of an object extraction system which exploits a recent deep-learning technology so as to overcome the inflexibility of the heuristic approaches. We implemented our trained model, based on the Faster R-CNN algorithm, using the Google TensorFlow Object Detection API and also composed an annotated data set from 100 research articles for training and evaluation. Finally, a performance evaluation shows that the proposed system outperforms a comparator adopting heuristic approaches by 5.2%.
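
A minimal inference sketch for the setup the abstract describes, under stated assumptions: a Faster R-CNN detector trained with the TensorFlow Object Detection API and exported as a TF2 SavedModel. The path, image file, label map, and score threshold below are placeholders, not artifacts from the paper.

```python
import numpy as np
import tensorflow as tf
from PIL import Image

detect_fn = tf.saved_model.load("exported_model/saved_model")   # assumed export path
LABELS = {1: "figure", 2: "table"}                               # assumed document object classes

image_np = np.array(Image.open("article_page.png").convert("RGB"))
input_tensor = tf.convert_to_tensor(image_np)[tf.newaxis, ...]   # shape [1, H, W, 3]

detections = detect_fn(input_tensor)
scores = detections["detection_scores"][0].numpy()
boxes = detections["detection_boxes"][0].numpy()                 # normalized [ymin, xmin, ymax, xmax]
classes = detections["detection_classes"][0].numpy().astype(int)

for box, score, cls in zip(boxes, scores, classes):
    if score >= 0.5:                                             # keep confident detections only
        print(LABELS.get(cls, cls), round(float(score), 2), box)
```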

Disease Prediction By Learning Clinical Concept Relations (딥러닝 기반 임상 관계 학습을 통한 질병 예측)

  • Jo, Seung-Hyeon; Lee, Kyung-Soon
    • KIPS Transactions on Software and Data Engineering / v.11 no.1 / pp.35-40 / 2022
  • In this paper, we propose a method of constructing clinical knowledge with clinical concept relations and predicting diseases based on a deep learning model to support clinical decision-making. Clinical terms in UMLS(Unified Medical Language System) and cancer-related medical knowledge are classified into five categories. Medical related documents in Wikipedia are extracted using the classified clinical terms. Clinical concept relations are established by matching the extracted medical related documents with the extracted clinical terms. After deep learning using clinical knowledge, a disease is predicted based on medical terms expressed in a query. Thereafter, medical terms related to the predicted disease are selected as an extended query for clinical document retrieval. To validate our method, we have experimented on TREC Clinical Decision Support (CDS) and TREC Precision Medicine (PM) test collections.
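
A minimal sketch (illustrative only, not the paper's model) of the final step described above: expand the user query with terms related to the predicted disease before running clinical document retrieval. The relation table and query are toy data.

```python
RELATED_TERMS = {                       # assumed clinical-concept relations
    "lung cancer": ["adenocarcinoma", "EGFR", "chemotherapy"],
}

def expand_query(query_terms, predicted_disease):
    """Append the predicted disease and its related terms, keeping order and removing duplicates."""
    expansion = RELATED_TERMS.get(predicted_disease, [])
    return list(dict.fromkeys(query_terms + [predicted_disease] + expansion))

print(expand_query(["persistent", "cough", "weight", "loss"], "lung cancer"))
```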

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan; Jeon, Ho-Cheol; Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.63-77 / 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck; in fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output order, and current search tools cannot retrieve, from the gigantic amount of documents, the documents related to a retrieved document. The most important problem for many current search systems is to increase the quality of search, that is, to provide related documents and to keep the number of unrelated documents in the results as low as possible. For this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of the articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. In this setting, the references contained in academic articles are used to give credit to previous work in the literature and provide a link between the "citing" and "cited" articles; a citation index indexes the citations that an article makes, linking the article with the cited works. Citation indexes were originally designed mainly for information retrieval. The citation links allow navigating the literature in unique ways: papers can be located independently of language and of the words in the title, keywords, or document, and a citation index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article). However, CiteSeer cannot index links between articles that researchers do not make explicitly, because it indexes only the links that researchers make when they cite other articles, and for the same reason CiteSeer does not scale easily. All these problems motivate the design of a more effective search system. This paper presents a method that extracts the subject and predicate of each sentence in a document. A document is converted into a tabular form in which each extracted predicate is checked against its possible subjects and objects. A hierarchical graph of the document is built from this table, and the graphs of individual documents are then integrated. Using the graph of the entire document set, the area of each document is calculated and compared with the integrated documents, and relations among the documents are marked according to this comparison of document areas. The paper also proposes a method for the structural integration of documents that retrieves documents from the graph, which makes it easier for the user to find information. We compared the performance of the proposed approach with the Lucene search engine using ranking formulas. As a result, the F-measure is about 60%, which is about 15% better.
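
A minimal sketch (my own simplification, not the proposed system) of the tabular step and the document-relation idea described above: record which (subject, predicate) pairs occur in each document and relate documents by the overlap of their pair sets, a crude stand-in for the graph-area comparison.

```python
def pair_table(triples):
    """triples: list of (subject, predicate, object) tuples extracted from one document."""
    return {(s, p) for s, p, _ in triples}

def relatedness(doc_a, doc_b):
    """Jaccard overlap of the (subject, predicate) tables of two documents."""
    a, b = pair_table(doc_a), pair_table(doc_b)
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = [("web", "contains", "pages"), ("index", "links", "articles")]
doc2 = [("index", "links", "articles"), ("citation", "credits", "work")]
print(relatedness(doc1, doc2))        # -> 0.333...
```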