• Title/Summary/Keyword: Merging Documents

Merging XML Documents Based on Insertion/Deletion Edit Operations (삽입/삭제 편집연산 기반의 XML 문서 병합)

  • Lee, Suk-Kyoon
    • The KIPS Transactions:PartD, v.16D no.4, pp.497-506, 2009
  • As XML use spreads and collaborative editing becomes common in areas such as office and scientific document editing, an effective method for merging XML documents is needed. In this paper we present a theoretical framework for merging the individual edits that multiple users make to the same source document. Unlike existing approaches, which merge the documents themselves, we represent each user's editing work as a series of edit operations applied to the source document, called an edit script. We merge the edit scripts of the multiple users and apply the merged script to the source document, which achieves the same effect as merging the documents. To do this, assuming edit scripts based on insertion and deletion edit operations, we define notions such as static edit scripts, intervention between edit scripts, and conflict between edit scripts, and then propose the conflict conditions between edit scripts and a method for adjusting edit scripts when they are merged. This approach reduces network overhead in distributed environments and is also effective in version management systems because it preserves the semantics of individual editing works.
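
The idea of merging edit scripts rather than documents can be illustrated with a small sketch. It is not the paper's framework: it assumes a flat list instead of an XML tree, assumes every position refers to the original source (one possible reading of a "static" edit script), and uses a made-up conflict rule (two scripts inserting at the same position). The names `apply_script` and `merge_scripts` are hypothetical.

```python
from typing import List, Tuple

# An operation is ("ins", position, value) or ("del", position, "").
# Assumption: every position refers to the ORIGINAL source list.
Op = Tuple[str, int, str]


def apply_script(source: List[str], script: List[Op]) -> List[str]:
    """Apply insert/delete operations addressed against the original source."""
    deleted = {pos for kind, pos, _ in script if kind == "del"}
    inserts = {}
    for kind, pos, value in script:
        if kind == "ins":
            inserts.setdefault(pos, []).append(value)
    result = []
    for i in range(len(source) + 1):
        # Items inserted at position i go before the original item i.
        result.extend(inserts.get(i, []))
        if i < len(source) and i not in deleted:
            result.append(source[i])
    return result


def merge_scripts(a: List[Op], b: List[Op]) -> List[Op]:
    """Union two scripts; raise if they conflict under the rule assumed here."""
    ins_a = {pos for kind, pos, _ in a if kind == "ins"}
    ins_b = {pos for kind, pos, _ in b if kind == "ins"}
    clash = ins_a & ins_b
    if clash:
        # Illustrative rule only: the relative order of the two inserts is ambiguous.
        raise ValueError(f"conflicting inserts at positions {sorted(clash)}")
    merged = list(a)
    for op in b:
        if op not in merged:  # identical deletions from both users collapse to one
            merged.append(op)
    return merged
```

With source `["a", "b", "c"]`, one user's script `[("del", 1, ""), ("ins", 0, "x")]`, and another's `[("ins", 3, "y"), ("del", 1, "")]`, applying the merged script gives `["x", "a", "c", "y"]`. Only the two short scripts need to travel over the network, not the edited documents.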

Segmentation of region strings using connection-characteristic function (연결특성함수를 이용한 문서화상에서의 영역 분리와 문자열 추출)

  • 김석태;이대원;박찬용;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences, v.22 no.11, pp.2531-2542, 1997
  • This paper describes a method for region segmentation and string extraction in documents that mix text, graphics, and picture images, using the structural characteristics of connected components. Non-text regions are segmented with connection-characteristic functions derived from the structural characteristics of connected components. String extraction proceeds in four steps. First, we organize basic-unit regions whose vertical and horizontal lengths are 1/4 of the average length of the connected components. Second, word blocks are formed by merging basic-unit regions that have smaller values than a given connection-intensity threshold. Third, initial strings are created by linking word blocks with similar block angles. Finally, the whole strings are generated by merging the remaining word blocks whose angles are undecided, if their height and position are similar to those of the initial strings. This method can extract strings that are not horizontal and strings of various character sizes. Through computer experiments with documents of different styles, we have shown the feasibility of our method.

PIX: Partitioned Index for Keyword Search over XML Documents (PIX: XML문서 검색을 위한 색인 분할 기법)

  • Lee Hongrae;Lee Hyungdong;Yoo Sangwon;Kim Hyoung-Joo
    • Journal of KIISE:Databases, v.31 no.6, pp.710-720, 2004
  • As XML documents carry much richer information than plain text, we can perform very elaborate, fine-grained searches that were difficult in the past. However, because finer-grained, element-level search is very costly, processing overhead has become a new challenge. We propose an inverted index structure called PIX, which reduces the number of elements processed by partitioning elements according to their match potential. We choose a base level and partition elements according to whether they can have a common ancestor higher than that level. We also propose a partition merging technique that yields the same results as the unpartitioned case. Our experimental results show that the index partitioning strategy can reduce processing time considerably.

Bilingual document analysis and character segmentation using connected components (연결요소를 이용한 한.영 혼용문서의 구조분석 및 낱자분리)

  • 김민기;권영빈;한상용
    • The Journal of Korean Institute of Communications and Information Sciences, v.22 no.3, pp.410-422, 1997
  • In this paper, we describe a bottom-up document structure analysis method for bilingual Korean-English documents. We propose a character segmentation method based on the layout information of the connected components of each character. In many studies, a document has been analyzed into text blocks and graphics. We analyze a document into four parts: text, table, graphic, and separator. Text is recursively subdivided into text blocks, text lines, words, and characters. To extract the characters in bilingual text, we propose a new method of word separation for Korean or English. Furthermore, we use a character merging and segmentation method in accordance with the properties of Hangul on the Korean word blocks. Experimental results on various documents show that the proposed method is very effective for document structure analysis and character segmentation.

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.20 no.4, pp.25-41, 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which, in turn, is generating more interest in using such data to create added value. For instance, several attempts are being made to identify social issues by analyzing the relevant search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media. The technique of "topic analysis" is employed to identify topics and themes in a large number of text documents. As one of the most prevalent applications of topic analysis, issue tracking investigates changes in the social issues that are identified through topic analysis. Currently, traditional issue tracking is conducted by identifying the main topics of the documents covering an entire period at once and analyzing the occurrence of each topic by period. However, this traditional issue tracking approach has two limitations. First, when a new period is included, topic analysis must be repeated for all the documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens. Therefore, this traditional approach is difficult to apply in most applications that need to perform an analysis on an additional period. Second, issues are not only generated and terminated constantly; one issue can also sometimes split into several issues, or multiple issues can be integrated into a single issue. In other words, each issue is characterized by a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. The existing issue tracking methods do not address the connections and effect relationships between these issues.
The purpose of this study is to overcome the two limitations of the existing issue tracking method: the limitation of the analysis method, and the lack of consideration of the changeability of the issues. Let us assume that we perform a separate topic analysis for each of multiple periods. Then it is essential to map the issues of different periods in order to trace the trend of issues. However, it is not easy to discover connections between issues of different periods because the issues derived for each period are mutually heterogeneous. In this study, to overcome these limitations without having to analyze the entire period's documents simultaneously, the analysis is performed independently for each period. In addition, we perform issue mapping to link the identified issues of each period. An integrated approach over the individual periods is presented, and the issue flow of the entire integrated period is depicted. Thus, the entire process of the issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, and the changeability of the issues is analyzed. The proposed methodology is highly efficient in terms of time and cost, and it sufficiently considers the changeability of the issues. Further, the results of this study can be used to adapt the methodology to a practical situation. By applying the proposed methodology to actual Internet news, its potential practical applications are analyzed. Consequently, the proposed methodology was able to extend the period of the analysis and follow the progress of each issue's life cycle. Further, this methodology can facilitate a clearer understanding of complex social phenomena using topic analysis.

Automatic Text Categorization Using Passage-based Weight Function and Passage Type (문단 단위 가중치 함수와 문단 타입을 이용한 문서 범주화)

  • Joo, Won-Kyun;Kim, Jin-Suk;Choi, Ki-Seok
    • The KIPS Transactions:PartB, v.12B no.6 s.102, pp.703-714, 2005
  • Research in text categorization has been confined to whole-document-level classification, probably due to the lack of full-text test collections. However, the full-length documents available today in large quantities have renewed interest in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. In order to reflect the sub-topic structure of a document, we propose a new passage-level, or passage-based, text categorization model, which segments a test document into several passages, assigns categories to each passage, and merges the passage categories into document categories. Compared with traditional document-level categorization, this model requires two additional steps: passage splitting and category merging. Using four subsets of the Reuters text categorization test collection and a full-text test collection whose documents vary from tens to hundreds of kilobytes, we evaluated the proposed model, especially the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows are best for all test collections tested in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document.
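
The two extra steps named in the abstract, passage splitting and category merging, can be sketched as follows. This is a minimal illustration, not the paper's model: it assumes fixed-size token windows as the passage type and a location-weighted sum as the merging rule, and it takes per-passage category scores as given (any classifier could supply them). The function names and the weighting scheme are hypothetical.

```python
from typing import Dict, List, Optional


def split_into_windows(tokens: List[str], size: int) -> List[List[str]]:
    """Passage splitting: cut the token list into non-overlapping windows."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def merge_passage_scores(
    passage_scores: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> List[str]:
    """Category merging: weighted sum of per-passage scores, best category first.

    weights[i] is an assumed importance for the i-th passage, for example
    a larger value for passages near the start of the document.
    """
    if weights is None:
        weights = [1.0] * len(passage_scores)
    totals: Dict[str, float] = {}
    for scores, weight in zip(passage_scores, weights):
        for category, score in scores.items():
            totals[category] = totals.get(category, 0.0) + weight * score
    # Sort by descending total, breaking ties alphabetically.
    return sorted(totals, key=lambda c: (-totals[c], c))
```

For per-passage scores `[{"sports": 0.9, "politics": 0.1}, {"politics": 0.6}, {"politics": 0.5, "sports": 0.2}]`, uniform weights rank `politics` first, while weights `[2.0, 1.0, 1.0]` that favor the opening passage rank `sports` first. That small change shows how passage location can alter the document-level result.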

FiST: XML Document Filtering by Sequencing Twig Patterns (가지형 패턴의 시퀀스화를 이용한 XML 문서 필터링)

  • Kwon Joon-Ho;Rao Praveen;Moon Bong-Ki;Lee Suk-Ho
    • Journal of KIISE:Databases, v.33 no.4, pp.423-436, 2006
  • In recent years, publish-subscribe (pub-sub) systems based on XML document filtering have received much attention. In a typical pub-sub system, subscribing users specify their interests in profiles expressed in the XPath language, and each new piece of content is matched against the user profiles so that it is delivered only to the interested subscribers. As the number of subscribed users and their profiles can grow very large, the scalability of the system is critical to the success of pub-sub services. In this paper, we propose a novel scalable filtering system called FiST (Filtering by Sequencing Twigs) that transforms twig patterns expressed in XPath and XML documents into sequences using Prüfer's method. As a consequence, instead of matching linear paths of twig patterns individually and merging the matches during post-processing, FiST performs holistic matching of twig patterns with incoming documents. FiST organizes the sequences into a dynamic hash-based index for efficient filtering. We demonstrate that our holistic matching approach yields lower filtering cost and good scalability under various situations.
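
For readers unfamiliar with Prüfer's method, the sketch below builds the classic Prüfer sequence of a labeled tree: repeatedly remove the leaf with the smallest label and record the label of its neighbor. The abstract does not say how FiST labels XML nodes or adapts the construction, so this shows only the textbook version; `prufer_sequence` is a hypothetical name.

```python
import heapq
from typing import List, Tuple


def prufer_sequence(edges: List[Tuple[int, int]], n: int) -> List[int]:
    """Classic Prüfer sequence of a tree on nodes labeled 1..n (n >= 2)."""
    neighbors = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Min-heap of current leaves, so the smallest-labeled leaf is removed first.
    leaves = [v for v in neighbors if len(neighbors[v]) == 1]
    heapq.heapify(leaves)
    sequence = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        parent = neighbors[leaf].pop()  # a leaf has exactly one neighbor
        neighbors[parent].discard(leaf)
        sequence.append(parent)
        if len(neighbors[parent]) == 1:
            heapq.heappush(leaves, parent)  # parent has just become a leaf
    return sequence
```

For the tree with edges `(1,4), (2,4), (3,4), (4,5), (5,6)`, the result is `[4, 4, 4, 5]`. Because a tree maps to a single sequence, a twig pattern and a document can be compared as whole sequences rather than path by path.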

Partitioning and Merging an Index for Efficient XML Keyword Search (효율적 XML키워드 검색을 인덱스 분할 및 합병)

  • Kim, Sung-Jin;Lee, Hyung-Dong;Kim, Hyoung-Joo
    • Journal of KIISE:Databases, v.33 no.7, pp.754-765, 2006
  • In XML keyword search, a search result is defined as a set of the smallest elements (i.e., least common ancestors) containing all query keywords, and the granularity of indexing is an XML element instead of a document. Under the conventional index structure, all least common ancestors produced by combining the elements, each of which contains a query keyword, are considered as a search result. In this paper, to avoid unnecessary operations for producing the least common ancestors and to reduce query processing time, we describe a way to construct a partitioned index composed of several partitions and to produce a search result by merging those partitions if necessary. When a search result is restricted to the least common ancestors whose depths are higher than a given minimum depth, under the proposed partitioned index structure, search systems can reduce the query processing time by considering only combinations of the elements belonging to the same partition. Even when the minimum depth is not given or is unknown, search systems can obtain a search result with the partitioned index in the same query processing time as with a non-partitioned index. Our experiment was conducted with the XML documents provided by the DBLP site and INEX2003, and the partitioned index reduced query processing time substantially when the minimum depth was given.
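
The benefit of partitioning can be shown with a small sketch under stated assumptions. It is not the paper's index structure: it assumes elements are identified by Dewey-style labels (tuples such as `(1, 2, 1)`, where the label length is the element's depth), handles two keywords, and returns all pairwise least common ancestors rather than only the smallest ones. The names `lca`, `partition`, and `deep_lcas` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

Label = Tuple[int, ...]  # Dewey-style label; its length is the element's depth


def lca(x: Label, y: Label) -> Label:
    """The least common ancestor of two labels is their longest common prefix."""
    prefix = []
    for p, q in zip(x, y):
        if p != q:
            break
        prefix.append(p)
    return tuple(prefix)


def partition(postings: List[Label], depth: int) -> Dict[Label, List[Label]]:
    """Group a keyword's elements by their ancestor at the given depth."""
    parts: Dict[Label, List[Label]] = defaultdict(list)
    for label in postings:
        if len(label) >= depth:  # shallower elements cannot yield a deep LCA
            parts[label[:depth]].append(label)
    return parts


def deep_lcas(post_a: List[Label], post_b: List[Label], depth: int) -> Set[Label]:
    """LCAs at depth >= `depth`, combining only elements in the same partition."""
    parts_a, parts_b = partition(post_a, depth), partition(post_b, depth)
    results: Set[Label] = set()
    for key in parts_a.keys() & parts_b.keys():
        for x, y in product(parts_a[key], parts_b[key]):
            results.add(lca(x, y))
    return results
```

With postings `[(1, 1, 2), (1, 2, 1)]` and `[(1, 1, 3), (1, 3, 1)]` and a minimum depth of 2, only one pair shares a partition, so a single combination is evaluated instead of four, and the result is `{(1, 1)}`.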

Current Circumstance and Issues in Interface between Western Medicine and Traditional Korean Medicine in Korea : What are Health Policy Options for a New Integrative Health System? (우리나라 양.한방 보건의료 부문간의 현황과 과제 : 새로운 의료체계로의 전환을 위한 공공정책의 선택)

  • Han, Dong-Woon;Yoon, Tae-Hyung
    • Journal of Society of Preventive Korean Medicine, v.9 no.2, pp.43-58, 2005
  • Internationally, many countries face the demand to reshape their health care systems to cope with rapidly changing circumstances in the health care sector. The recent growth of oriental medicine and complementary and alternative medicine (CAM) in many countries is, to a large extent, due to the growth in the number of oriental medical doctors and physicians who have taken up alternative therapies alongside conventional medicine. To cope with the changing environment, many countries are considering developing integrative health care, which is now used widely in the health care sector. In both the biomedical and CAM sectors (including oriental medicine), attention appears to have shifted away from separating therapeutic modalities into categories such as biomedical or CAM, towards a focus on merging diverse modalities into a 'new' integrative health system. In Korea, one peculiar characteristic of the health care system is that Hanbang medicine (traditional Korean medicine) and (Western) medicine have coexisted as health care providers since the 19th century. Recently, the government of Korea has made many efforts to enhance the role and function of traditional Korean medicine in the health care sector. However, strategies and measures for integrative health care settings that combine traditional Korean medicine and Western medicine have not yet been developed. The research questions of this study are: in Korea, what are the trends and problems in the interface between the traditional Korean medical sector and the Western medical sector; what are the causes of, or factors associated with, the problems; how can the problems be addressed and their causes resolved; and what health policy directions and strategies should the government take to cope with future demand and the burden on the health care sector?
To answer these questions, this study explores the current situation and issues at the interface between traditional Korean medicine and (Western) medicine in various ways, using content analysis of existing data and documents related to traditional Korean medicine and health policy. We also discuss stakeholders' views on the interface in the health care sector. Finally, we consider health policy options for shifting away from separating therapeutic modalities into categories such as 'traditional Korean medicine' or 'Western medicine', towards a focus on merging diverse modalities into a 'new' integrative health system.

Sketch Map System using Clustering Method of XML Documents (XML 문서의 클러스터링 기법을 이용한 스케치맵 시스템)

  • Kim, Jung-Sook;Lee, Ya-Ri;Hong, Kyung-Pyo
    • The Journal of the Korea Contents Association, v.9 no.12, pp.19-30, 2009
  • Map services that have recently come into the spotlight use a map as the first point of access and then provide various mash-up results through the interface. Such a service can provide precise information to users, but the map is barely reusable. Unlike existing large map systems, the sketch-map system of this paper represents a specific spot and route in an XML document and then clusters the sketch-maps. The map service system is designed to show the optimum route to the destination in a simple outline map. It does so by renovating the spot presented by the map into optimum contents. Through the process of analyzing, splitting, and clustering the XML document of an input sketch-map, the service system creates a valid form of a sketch-map. It uses the LCS (Longest Common Subsequence) algorithm for splitting and merging sketch-maps during query processing. In addition, a simulation of the system's expected effects is provided. It shows how maps that share information and knowledge assemble to form a large map, and thus presents the system's ability and role as a new research portal.
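
The LCS algorithm mentioned above is the standard dynamic-programming one. The sketch below applies it to two routes given as lists of spot names; how the paper encodes sketch-map XML as sequences is not stated, so that representation is an assumption, and `lcs` is a hypothetical name.

```python
from typing import List, Sequence


def lcs(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]
```

For the routes `["home", "bridge", "park", "station"]` and `["home", "park", "mall", "station"]`, the shared backbone is `["home", "park", "station"]`. Spots outside that backbone are candidates for splitting off or merging in.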