• Title/Summary/Keyword: 문서 병합 (document merging)


Sketch Map System using Clustering Method of XML Documents (XML 문서의 클러스터링 기법을 이용한 스케치맵 시스템)

  • Kim, Jung-Sook;Lee, Ya-Ri;Hong, Kyung-Pyo
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.19-30 / 2009
  • Map services that have recently come into the spotlight first present a map and then deliver various mash-up results through its interface. Such services can give users precise information, but the maps themselves are barely reusable. Unlike existing large-scale map systems, the sketch-map system in this paper represents specific spots and routes as XML documents and then clusters the sketch maps. The system is designed to show the optimum route to a destination on a simple outline map, which it does by refining the spots on the map into optimized content. By analyzing, splitting, and clustering the input XML documents, the system produces a valid sketch map. It uses the LCS (Longest Common Subsequence) algorithm to split and merge sketch maps during query processing. A simulation of the system's expected effects is also provided. It shows how maps that share information and knowledge can be assembled into a larger map, and thus illustrates the system's potential role as a new research portal.
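
As a rough illustration of the LCS step named in the abstract, the sketch below computes a longest common subsequence of two routes, each represented as an ordered list of spot names. The route representation, the sample spot names, and the function name `lcs` are assumptions made for illustration; the paper's actual XML schema and merge procedure are not described in this listing.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])

    # Walk back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


# Hypothetical routes: ordered spot names taken from two sketch maps.
route_a = ["station", "bank", "park", "library", "campus"]
route_b = ["station", "cafe", "park", "campus"]
shared = lcs(route_a, route_b)
print(shared)
```

The shared subsequence marks the spots both routes pass in the same order, which is one plausible basis for deciding where two sketch maps can be split or joined.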

An Operation-Based Version Model for Software Diagrams (소프트웨어 다이어그램을 위한 오퍼레이션 기반 버전 모델)

  • No, Jeong-Gyu;U, Chi-Su
    • Journal of KIISE:Software and Applications / v.26 no.4 / pp.521-532 / 1999
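  • Various kinds of diagrams are used to represent the design of a software system. Like other design documents and source code, diagrams go through multiple versions during the design process. However, existing software configuration management tools and object-oriented database management systems that support object versions are not well suited to version management of fine-grained diagrams. This study proposes an operation-based version model for fine-grained software diagrams. The model reflects two facts: diagrams use graphical information as the means of representing software design information, and the structure of a diagram consists of nodes and edges. Versions of a diagram are stored and retrieved efficiently using operation deltas and object visibility. The study also presents a method for merging two versions of a diagram.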
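
To make the idea of operation deltas on a node-and-edge diagram concrete, here is a minimal sketch. The operation names (`add_node`, `del_node`, `add_edge`, `del_edge`), the functions `apply_ops` and `merge_deltas`, and the naive merge rule are all assumptions for illustration. Object visibility and conflict handling, which the abstract mentions or implies, are not modeled, and this is not the authors' algorithm.

```python
def apply_ops(base_nodes, base_edges, ops):
    """Rebuild a diagram version by replaying an operation delta on a base."""
    nodes, edges = set(base_nodes), set(base_edges)
    for kind, arg in ops:
        if kind == "add_node":
            nodes.add(arg)
        elif kind == "del_node":
            nodes.discard(arg)
            # Removing a node also removes every edge that touches it.
            edges = {e for e in edges if arg not in e}
        elif kind == "add_edge":
            edges.add(arg)
        elif kind == "del_edge":
            edges.discard(arg)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return nodes, edges


def merge_deltas(delta_a, delta_b):
    """Naive merge: keep A's operations, then B's operations not already in A."""
    seen = set(delta_a)
    return list(delta_a) + [op for op in delta_b if op not in seen]


# Hypothetical base diagram and two versions derived from it.
base_nodes, base_edges = {"A", "B"}, {("A", "B")}
delta_a = [("add_node", "C"), ("add_edge", ("B", "C"))]
delta_b = [("add_node", "C"), ("del_edge", ("A", "B"))]

merged_nodes, merged_edges = apply_ops(
    base_nodes, base_edges, merge_deltas(delta_a, delta_b)
)
print(sorted(merged_nodes), sorted(merged_edges))
```

Storing each version as a delta against a base keeps versions small, at the cost of replaying operations on retrieval.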

A Study on the Thesaurus Construction Using the Topic Map (토픽맵을 이용한 시소러스의 구조화 연구)

  • Nam, Young-Joon
    • Journal of the Korean Society for Information Management / v.22 no.3 s.57 / pp.37-53 / 2005
  • Terminology management is essential for keeping a thesaurus effective. Because descriptors are created, differentiated, and retired dynamically, managing a thesaurus effectively is a very difficult task, and a mechanism is needed for constructing and maintaining it. First, this study proposes methods for constructing thesaurus management using the basic elements of a topic map: topic, occurrence, and association. Second, it proposes methods for representing basic and specific instances using a systematic mapping algorithm and a merging algorithm. Using a hub document as a standard, the study also gives methods for expanding and substituting descriptors using the topic type. Finally, a new method is developed that applies a fixed concept for double-layer management of terms. Its purpose is to fix the conceptual term, which represents a concept independent of time and space, while allowing the descriptor to be selected freely according to the external information environment.

Reinforcement Post-Processing and Feedback Algorithm for Optimal Combination in Bottom-Up Hierarchical Classification (상향식 계층분류의 최적화 된 병합을 위한 후처리분석과 피드백 알고리즘)

  • Choi, Yun-Jeong;Park, Seung-Soo
    • The KIPS Transactions:PartB / v.17B no.2 / pp.139-148 / 2010
  • This paper presents a reinforcement post-processing method and a feedback algorithm for improving the assignment method in classification. We focus in particular on complex documents, which are generally considered hard to classify. The basic factors in a traditional classification system are the training methodology, the classification model, and the features of the documents. Documents that contain shared features and multiple meanings need to be mined or analyzed more deeply than generally formatted data. To address the problems of such documents, our previous studies proposed a method that expands the classification scheme using an automatically detected decision boundary. The main factor we focus on here is the assignment method, in which a document is simply assigned to the top-ranked category. In this paper, we propose a post-processing method and a feedback algorithm that analyze the relevance of the ranked list. In experiments, we applied the post-processing method and a one-time feedback algorithm to complex documents. The experimental results show that our system can improve accuracy and flexibility without changing the classification algorithm itself.

News Topic Extraction based on Word Similarity (단어 유사도를 이용한 뉴스 토픽 추출)

  • Jin, Dongxu;Lee, Soowon
    • Journal of KIISE / v.44 no.11 / pp.1138-1148 / 2017
  • Topic extraction is a technology that automatically extracts a set of topics from a set of documents, and it has been a major research topic in natural language processing. Representative topic extraction methods include Latent Dirichlet Allocation (LDA) and word clustering-based methods. However, these methods suffer from problems such as repeated topics and mixed topics. The problem of repeated topics is one in which a specific topic is extracted as several topics, while the problem of mixed topics is one in which several topics are mixed in a single extracted topic. To solve these problems, this study proposes a method that extracts topics using an LDA that is robust against the problem of repeated topics, and then corrects the extracted topics by separating and merging them using the similarity between words. In the experiment, the proposed method showed better performance than the conventional LDA method.
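
The merging half of that correction step can be sketched as follows. This is only an illustration under stated assumptions: each topic is reduced to a set of its top words, and Jaccard overlap between word sets stands in for the word-similarity measure, which the abstract does not specify. The function names, the sample topics, and the 0.5 threshold are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_repeated_topics(topics, threshold=0.5):
    """Greedily fold each topic into the first earlier topic it resembles."""
    merged = []
    for words in topics:
        words = set(words)
        for group in merged:
            if jaccard(group, words) >= threshold:
                group |= words  # in-place union keeps the group in `merged`
                break
        else:
            merged.append(words)
    return merged


# Hypothetical top-word sets for three extracted topics; the first two
# describe the same subject and should collapse into one.
topics = [
    {"election", "vote", "party", "candidate"},
    {"election", "vote", "party", "poll"},
    {"soccer", "league", "goal", "match"},
]
result = merge_repeated_topics(topics)
print(len(result))
```

A separating step for mixed topics would work in the opposite direction, splitting a topic whose words fall into weakly related groups; it is omitted here.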

A Content based Web Image Retrieval System using MPEG-7 Visual Descriptors and Textual Information (MPEG-7 시각 정보 기술자와 텍스트 정보를 이용한 내용 기반 웹 이미지 검색 시스템)

  • Park Joo-Hyoun;Nang Jong-Ho
    • Proceedings of the Korean Information Science Society Conference / 2006.06a / pp.232-234 / 2006
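  • With advances in Internet technology and in digital media production devices such as digital cameras, the amount of image data on the WWW has grown rapidly, increasing the demand for efficient retrieval of web images. This paper designs and implements a web image retrieval system that can combine conventional text-based retrieval with visual-information-based retrieval in order to satisfy users' diverse search needs. The proposed system consists of a web image collection and search-information extraction tool, a search server, and a search client. The collection and extraction tool gathers images from the web, selects appropriate keywords using the structure of the web document that contains each image, and extracts MPEG-7 visual descriptors (1) to support visual-information-based retrieval. For fast retrieval, the extracted text information is stored in a commercial database, and the MPEG-7 visual descriptors are indexed using HBI (Hierarchical Bitmap Index) (2), a high-dimensional data indexing method. The search client lets users assign a weight to each search element and includes a relevance feedback process that allows the search to be repeated until the desired results are obtained.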
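
One simple way to combine text-based and visual scores under user-assigned weights is a normalized weighted sum, sketched below. The scoring formula, the field names `text` and `visual`, and the sample scores are assumptions for illustration; the listing does not say how the system actually fuses the two kinds of evidence.

```python
def combined_score(text_score, visual_score, w_text=0.5, w_visual=0.5):
    """Weighted average of a text-match score and a visual-similarity score."""
    total = w_text + w_visual
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_text * text_score + w_visual * visual_score) / total


def rank(images, w_text, w_visual):
    """Order candidate images by combined score, best first."""
    return sorted(
        images,
        key=lambda im: combined_score(im["text"], im["visual"], w_text, w_visual),
        reverse=True,
    )


# Hypothetical candidates with scores already scaled to [0, 1].
images = [
    {"id": "a", "text": 0.9, "visual": 0.2},
    {"id": "b", "text": 0.4, "visual": 0.8},
]
text_heavy = [im["id"] for im in rank(images, w_text=0.8, w_visual=0.2)]
visual_heavy = [im["id"] for im in rank(images, w_text=0.2, w_visual=0.8)]
print(text_heavy, visual_heavy)
```

Re-running `rank` with adjusted weights after each round of results is one way to picture the repeated, feedback-driven search that the client supports.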


Extracting curved text lines using the chain composition and the expanded grouping method (체인 정합과 확장된 그룹핑 방법을 사용한 곡선형 텍스트 라인 추출)

  • Bai, Nguyen Noi;Yoon, Jin-Seon;Song, Young-Jun;Kim, Nam;Kim, Yong-Gi
    • The KIPS Transactions:PartB / v.14B no.6 / pp.453-460 / 2007
  • In this paper, we present a method for extracting text lines from poorly structured documents. The text lines may have different orientations and considerably curved shapes, and a text line may contain a few wide inter-word gaps. Such text lines are found in posters, address blocks, and artistic documents. Our method is based on traditional perceptual grouping, but we develop novel solutions to overcome the problems of insufficient seed points and varied orientations in a single line. We assume that text lines consist of connected components, where each connected component is a set of black pixels within a letter or within several touching letters. In our scheme, connected components that are closer than an iteratively incremented threshold are linked together into a chain. Elongated chains are identified as the seed chains of lines. The seed chains are then extended to the left and the right according to the local orientations, which are reevaluated at each side of a chain whenever it is extended. By this process, all text lines are finally constructed. In our experiment, the proposed method performed well on considerably curved text lines from logos and slogans, achieving 98% for straight-line extraction and 94% for curved-line extraction.

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.25-41 / 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which, in turn, is generating more interest in using such data to create added value. For instance, several attempts are being made to identify social issues by analyzing the related search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media. The technique of "topic analysis" is employed to identify topics and themes from a large number of text documents. As one of the most prevalent applications of topic analysis, the technique of issue tracking investigates changes in the social issues that are identified through topic analysis. Currently, traditional issue tracking is conducted by identifying, in a single pass, the main topics of the documents covering the entire period and then analyzing the occurrence of each topic by period. However, this traditional issue tracking approach has two limitations. First, when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens. Therefore, the traditional approach is difficult to apply in most applications that need to analyze additional periods. Second, issues are not only generated and terminated constantly; one issue can also split into several issues, or multiple issues can be integrated into a single issue. In other words, each issue is characterized by a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. The existing issue tracking methods do not address how these issues are connected to and affect one another.
The purpose of this study is to overcome the two limitations of the existing issue tracking method: one concerning the analysis method and the other concerning the lack of consideration of the changeability of issues. Suppose that topic analysis is performed separately for each of multiple periods. It is then essential to map the issues of different periods in order to trace issue trends. However, it is not easy to discover connections between issues of different periods because the issues derived for each period are heterogeneous. In this study, to overcome these limitations without having to analyze the entire period's documents simultaneously, the analysis is performed independently for each period. In addition, we performed issue mapping to link the identified issues of each period. An integrated approach over the detailed periods was presented, and the issue flow of the entire integrated period was depicted. Thus, the entire issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, and the changeability of the issues was analyzed. The proposed methodology is highly efficient in terms of time and cost, and it also takes the changeability of issues into account. Further, the results of this study can be used to adapt the methodology to a practical situation. By applying the proposed methodology to actual Internet news, its potential practical applications are analyzed. Consequently, the proposed methodology was able to extend the period of the analysis and to follow the progress of each issue's life cycle. Further, this methodology can facilitate a clearer understanding of complex social phenomena using topic analysis.
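
The inter-period mapping step can be pictured with a small sketch. It assumes that each issue is a dictionary of keyword weights produced by a per-period topic analysis and that two issues are linked when the cosine similarity of their keyword vectors reaches a threshold. The similarity measure, the 0.3 threshold, the function names, and the sample issues are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def map_issues(prev, curr, threshold=0.3):
    """Link each issue of one period to similar issues of the next period."""
    links = []
    for p_name, p_vec in prev.items():
        for c_name, c_vec in curr.items():
            score = cosine(p_vec, c_vec)
            if score >= threshold:
                links.append((p_name, c_name, score))
    return links


# Hypothetical issues from two consecutive periods, analyzed independently.
period_1 = {
    "p1_economy": {"rate": 0.6, "bank": 0.4},
    "p1_sports": {"league": 0.7, "goal": 0.3},
}
period_2 = {
    "p2_finance": {"bank": 0.5, "rate": 0.3, "loan": 0.2},
    "p2_weather": {"rain": 1.0},
}
links = map_issues(period_1, period_2)
print([(p, c) for p, c, _ in links])
```

Because only the newly added period needs its own topic analysis, extending the timeline requires mapping the new issues against the previous period rather than reprocessing every document.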

Design and Implementation of Distributed Collaborative Writing System for Engineering Design Process (다자간 협동 공학설계를 위한 DCWA 시스템의 설계 및 구현)

  • 이병걸
    • Journal of Korea Multimedia Society / v.3 no.1 / pp.63-76 / 2000
  • Most work on CSCW (Computer Supported Collaborative Work) systems has targeted the exchange of documents or messages among group members, and support is still lacking for cognitive aspects such as group organization, division and merging of work, and work flow control. The objective of this study is to provide a CSCW environment for engineering design processes such as CAD (Computer Aided Design) and CASE (Computer Aided Software Engineering). The proposed DCWA (Distributed Collaborative Writing Aid) system suggests a mechanism that unifies group organization, work division, and work flow control across the CAD, CASE, and software simulation tools. In particular, CAD relates the group to the work partition by expressing the relations among drawing objects (e.g., binding, attachment, and proportional scaling) that are owned by different group members, and CASE, combined with the simulation tool, supports flexible work flow control. Simulating the prototype before manufacturing a product can reduce development time and cost.


Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative means for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industry because it can be used to extract various issues from text such as news and SNS (Social Network Services) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking.
Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, time of writing, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future work.
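
Given similarity links between the issues of two consecutive periods, the four mutation types named in the abstract (created, merged, divided, deleted) can be read off the link structure. The sketch below is one plausible reading, not the authors' procedure: the function name, the rule that two or more incoming links means "merged" and two or more outgoing links means "divided", and the sample issue labels are assumptions.

```python
from collections import defaultdict


def classify_transitions(prev_issues, curr_issues, links):
    """Label issues by how they connect across two consecutive periods."""
    outgoing = defaultdict(set)  # previous issue -> linked current issues
    incoming = defaultdict(set)  # current issue -> linked previous issues
    for p, c in links:
        outgoing[p].add(c)
        incoming[c].add(p)
    return {
        # A current issue with no predecessor is newly created.
        "created": sorted(c for c in curr_issues if not incoming[c]),
        # A previous issue with no successor has disappeared.
        "deleted": sorted(p for p in prev_issues if not outgoing[p]),
        # Several previous issues flowing into one current issue.
        "merged": sorted(c for c in curr_issues if len(incoming[c]) >= 2),
        # One previous issue flowing into several current issues.
        "divided": sorted(p for p in prev_issues if len(outgoing[p]) >= 2),
    }


# Hypothetical issue labels and links for two consecutive periods.
prev_issues = ["A", "B", "C", "D"]
curr_issues = ["W", "X", "Y", "Z"]
links = [("A", "X"), ("B", "X"), ("C", "Y"), ("C", "Z")]
transitions = classify_transitions(prev_issues, curr_issues, links)
print(transitions)
```

Chaining this classification over every pair of consecutive periods yields the edges and node labels that an issue flow diagram would display.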